Introduction

Lung cancer has been the leading cause of cancer death worldwide for several decades. Despite tremendous efforts, long-term survival has not improved significantly over the last 25 years. The 5-year survival rate of lung cancer patients remains only 15%1, but it may increase to as much as 80% if the cancer is detected at an early stage2. According to the 2012 report of the International Agency for Research on Cancer (IARC), lung cancer has the highest incidence of any cancer worldwide (1.8 million cases, 13% of the total). It also ranks first in mortality (1.6 million deaths, 19.4% of the total)3. To reduce this mortality, several studies have sought molecular biomarkers for the early detection of lung cancer at the genomic, epigenomic, proteomic and metabolomic levels4,5,6,7. In the post-genomic era, metabolomics is a powerful tool for profiling differences in metabolites among normal, precancerous and cancerous cells or tissues. It has gained considerable importance owing to recent advances in experimental methodologies and technologies and the ability to process large amounts of data. Metabolomic approaches can therefore permit early diagnosis or real-time monitoring of the effects of a disease8.

Metabolic studies of lung cancer in human tissues and biofluids have been reported over the last few years. Kami et al. reported metabolomic profiling of lung and prostate tumor tissues by Capillary Electrophoresis Mass Spectrometry (CE-MS)9. Rocha et al. studied the metabolic differentiation between tumor and non-involved adjacent lung tissues by High Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS-NMR) spectroscopy10. They observed increased levels of lactate, phosphocholine (PC) and glycerophosphocholine (GPC) in tumors, while glucose, myo-inositol, inosine/adenosine and acetate levels were decreased. Carrola et al. applied Nuclear Magnetic Resonance (NMR) based metabonomics to blood plasma and urine11 to find metabolic signatures of lung cancer. Using a more global profiling approach, Jordan and colleagues reported the NMR analysis of paired tissue and serum samples from 14 subjects with two different lung cancer histological types (adenocarcinoma and squamous cell carcinoma), as well as of serum from 7 healthy individuals12. In another publication, a panel of 8 metabolites was identified for the diagnosis of breast, lung, colon or prostate cancers with high sensitivity and specificity13.

A few targeted metabolic profiling studies of blood plasma/serum have been reported for lung cancer biomarker discovery. Maeda and co-workers reported differences in the plasma amino acid profiles of healthy controls and non-small-cell lung cancer (NSCLC) patients, as assessed by Liquid Chromatography Mass Spectrometry (LC/MS)14. Targeted analysis of lysophosphatidylcholines (lysoPC) showed irregular levels of lysoPC isomers with different fatty acyl positions in the plasma of lung cancer patients as compared to controls15. In another targeted analysis, serum lipid metabolite profiling in lung cancer (n = 58) using Fourier transform ion cyclotron resonance MS has been reported16.

Recent advances in NMR, GC-MS and LC/MS techniques have enabled the use of more global metabolomic approaches for the identification of novel biomarkers for specific diseases7,17,18 as well as new targets for drug discovery and development. Among these techniques, GC-MS has proved particularly useful owing to its high sensitivity and resolution, reproducibility and cost effectiveness. Moreover, in comparison to LC/MS, the availability of a large GC-MS electron impact (EI) spectral library further aids the identification of biomarkers in various pathological conditions19. Few reports based on GC-MS analysis of lung cancer metabolites have been published. Metabolites in the serum and urine of 19 lung cancer patients and 15 patients with other lung diseases were analyzed using GC-MS20. Serum metabolomic analysis by GC-MS was performed on 29 healthy volunteers and 33 lung cancer patients7. A few studies on GC-MS based volatile organic compounds (VOC) as lung cancer biomarkers have also been reported21,22,23,24,25.

In all of the investigations cited above, either a limited number of samples was used or a single healthy control group was used to discriminate lung cancer metabolites. In the present study, we used 384 samples with three control groups (healthy non-smokers, smokers and persons with COPD) in order to identify disease-related metabolites through comprehensive comparison. We previously developed a comprehensive, straightforward, reproducible and efficient sample preparation method, based on a 2D-C18 fractionation approach, which covers a wide range of metabolites for metabolite profiling26. In this investigation, all the samples were analyzed by the 2D-C18 method for the first time to investigate differentiative metabolite patterns between lung cancer and controls, followed by chemometric analyses.

Methods

Solvents and reagents

All solvents used for GC-MS analysis were of analytical grade. Methanol, hexane and ammonium hydroxide were purchased from Tedia (Tedia Way, Fairfield, USA); isopropanol and hydrochloric acid (37%) were purchased from Fisher Scientific (Loughborough, Leicestershire, U.K.); formic acid and myristic-d27 acid were purchased from Sigma-Aldrich (St. Louis, MO, USA). MSTFA (N-Methyl-N-(trimethylsilyl) trifluoroacetamide) and methoxylamine hydrochloride were purchased from Acros Organic (New Jersey, USA). Deionized water (Milli-Q; Millipore, Billerica, MA, USA) was used throughout the study.

Sample collection statistics of patients and controls

This study was approved by the ethical committee of the Jinnah Postgraduate Medical Center (JPMC) and written informed consent was obtained from all the participants. A total of 384 plasma samples from healthy Non-Smokers (NS), Smokers (S), Chronic Obstructive Pulmonary Disease (COPD) patients and Lung Cancer (LC) patients were included in this study. Each group comprised 96 samples; S and NS subjects were in the age range of 30–65 years, while COPD and LC patients were in the range of 35–70 years. Cancer subjects included in this study had pathologically proven LC of common subtypes, including 10 Squamous Cell Lung Cancer (SqLC), 12 Adenocarcinoma Lung Cancer (AdLC), 16 Small Cell Lung Cancer (SmLC) and 10 Non Small Cell Lung Cancer (NSCLC) cases, while 52 were uncategorized Lung Cancer (the type of lung cancer was not diagnosed). The smokers included in this study had been smoking for at least 10 years.

Blood samples from male and female participants were collected at the JPMC, Karachi, Pakistan, after consent. About 8 mL of blood was drawn in the morning from overnight-fasted volunteers into BD Vacutainer tubes (BD, Franklin Lakes, NJ, USA, REF 367856) containing K2-ethylenediaminetetraacetic acid as an anticoagulant. Plasma was separated immediately by centrifugation at 4,500 rpm for 10 min at 4°C. Finally, the plasma was aliquoted and frozen at −80°C. A code was given to each sample. Sample collection descriptions and codes are given in Tables 1 and 2.

Table 1 Experimental subject description - healthy non-smokers and smokers
Table 2 Experimental subject description- lung cancer patients

Sample preparation

Sample preparation was carried out in accordance with our previous protocol26 with some modifications. Samples were processed in 96-well plates. In each plate, a 100 μL aliquot of each plasma sample was mixed with 800 μL of methanol, 20 μL of the internal standard myristic-d27 acid (1 mg/mL stock solution) was added, and the mixture was left on ice for 30 minutes. The precipitated proteins were then removed by centrifugation at 12,000 rpm for 10 min (Eppendorf Centrifuge 5804 C/R). Prior to extraction, the solid phase of the C18 96-well plate (Strata C18-E, 55 μm particle size, 70 Å pore size, 100 mg sorbent/well; Phenomenex, USA) was activated with 2 × 300 μL of methanol and then conditioned with 2 × 300 μL of water. Aliquots (600 μL) of the clear supernatants were loaded onto the plate and drawn through the solid phase under vacuum. After sample loading, the phase was washed with 2 × 200 μL of water and eluted with 600 μL of methanol. The eluates were collected in 96-well collection plates and evaporated under N2 at room temperature. The dry samples were stored at 4°C until analysis. The SPE extractions were performed on a solid phase extraction vacuum manifold (AH0-7502; Phenomenex, USA).

Derivatization and GC-MS analysis

The dried extracts of all the samples were derivatized by adding 50 μL of methoxylamine hydrochloride in pyridine (15 μg/μL), vortexing and leaving them for 2 hr at 35°C. BSTFA with 1% trimethylchlorosilane (TMCS) was then added and the samples were kept at 70°C for 60 min to form trimethylsilyl (TMS) derivatives. GC-MS parameters were the same as those reported in our previous paper26. GC-MS analysis was performed using a 7890A gas chromatograph (Agilent Technologies, USA), equipped with an Agilent Technologies GC sampler 120 (PAL LHX-AG12) autosampler and coupled to an Agilent 7000 Triple Quad system (Agilent Technologies, USA). Separation was carried out on an HP-5MS 30 m × 250 μm (i.d.) fused-silica capillary column (Agilent J&W Scientific, Folsom, CA, USA), chemically bonded with a 5% diphenyl 95% dimethylpolysiloxane cross-linked stationary phase (0.25 μm film thickness), according to our previous report26.

GC-MS data preprocessing and statistical analysis

Metabolite profiles of the blood samples were acquired using the optimized GC-MS assay. Data processing was performed using Agilent Mass Hunter Qualitative Analysis (version B.04.00). Peak integration and deconvolution were performed in Mass Hunter (parameters were the same as previously reported, except for an SNR threshold of 3.0)26. Putative identification of low molecular weight metabolites was established by comparing the mass spectra of the peaks with those available in the NIST mass spectral library (Wiley registry NIST 11) and the Fiehn RTL library. The identification of peaks was based on a 70% similarity index. All the GC-MS spectra were exported in CEF format and uploaded to Mass Profiler Professional (MPP) for peak alignment, normalization, significance testing, fold change and multivariate analysis of both identified and unidentified compounds.

All the available data (full scan mode from m/z 50 to 650 and a retention time window of 6.5 to 35 min) were used, and a minimum absolute abundance of 5,000 counts was applied to filter the data. Alignment parameters were set to a retention time tolerance of 0.05, a match factor of 0.3 and a delta MZ of 0.2. The data were normalized to unit scale and then baseline-corrected to eliminate baseline differences in metabolism between the four groups. Baseline correction treats all compounds equally regardless of their intensity: the mean abundance of each entity is subtracted from its value in each sample. A total of 1,877 entities were found across all samples after alignment. Entities were filtered by frequency (those which appeared in more than 50% of samples in at least one group of samples were chosen), p ≤ 0.001, fold change > 3 and coefficient of variation (CV) < 25%. Statistical significance was assessed by one-way ANOVA, with a probability level of 0.001 as the criterion for significance. Thirty two entities were found to be significantly different between lung cancer and controls. Tukey's honest significant difference (HSD) post hoc test was then applied to identify which entities were responsible for the significant differences among the four groups. Hierarchical clustering was performed using Pearson's uncentered-absolute distance metric with complete linkage. A class prediction model was built by PLS-DA on the 32 filtered entities, using four components, auto scaling and N-fold validation with three folds and ten repeats. Sensitivity and specificity were also measured from the constructed model. 40 samples were randomly selected and validated through the constructed model.
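The four entity filters quoted above (frequency, p-value, fold change and CV) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MPP implementation: the function names, the use of `None` for a non-detected value, the choice to take fold change as the ratio of group means against lung cancer, and the requirement that CV holds in every group are all assumptions, and the abundances are invented.

```python
from statistics import mean, stdev

def detected(values):
    """Abundances actually observed (None marks a sample where the entity is absent)."""
    return [v for v in values if v is not None]

def frequency(values):
    """Fraction of samples in a group in which the entity was detected."""
    return len(detected(values)) / len(values)

def cv_percent(values):
    """Coefficient of variation (%) over the detected values of one group."""
    present = detected(values)
    if len(present) < 2:
        return float("inf")  # CV undefined; treated here as failing the filter
    return 100.0 * stdev(present) / mean(present)

def fold_change(case, control):
    """Ratio of group means, reported as a value >= 1 whatever the direction."""
    a, b = mean(detected(case)), mean(detected(control))
    return max(a, b) / min(a, b)

def keep_entity(groups, p_value, case="LC"):
    """Apply the thresholds quoted in the text to a single entity."""
    frequent = any(frequency(v) > 0.5 for v in groups.values())
    changed = any(fold_change(groups[case], v) > 3.0
                  for name, v in groups.items() if name != case)
    reproducible = all(cv_percent(v) < 25.0 for v in groups.values())
    return frequent and p_value <= 0.001 and changed and reproducible

# Invented abundances for one entity in the four groups (hypothetical data).
example = {
    "NS": [100, 110, 95, 105],
    "S": [120, 115, 130, 125],
    "COPD": [130, None, 140, 135],
    "LC": [400, 420, 390, 410],
}
print(keep_entity(example, p_value=0.0005))
print(keep_entity(example, p_value=0.01))
```

The baseline correction described in the text would be applied before this step by subtracting each entity's mean abundance from its value in every sample.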

Results and discussion

Metabolite profiles of a total of 384 plasma samples from healthy non-smokers, smokers, COPD patients and lung cancer patients (96 samples in each group) were acquired by GC-MS. The 2D-C18 sample preparation method was used for the enrichment of metabolites, based on our previous findings26. Data files were subjected to extensive statistical analysis using MPP software in order to identify statistically distinguishing metabolites as candidate lung cancer biomarkers.

Significance testing and fold change

The purpose of significance testing and fold change analysis is to identify statistically differentiative metabolites by applying appropriate tests and conditions. Thirty two metabolites out of 1,877 were found to be significantly different among the three controls (NS, S and COPD) and lung cancer using one-way ANOVA at a probability level of 0.001 and fold change > 3 (Table 3). Eleven of the 32 low molecular weight metabolites, i.e. lactic acid (CAS # 79-33-4), phosphoric acid (CAS # 7664-38-2), benzoic acid (CAS # 2078-12-8), naphthalene (CAS # 29422-13-7), d-glucose (CAS # 128705-73-7), altrose (CAS # 1990-29-0), palmitic acid (CAS # 64519-82-0), octadecanoic acid (CAS # 1188-75-6), stearic acid (CAS # 57-11-4), 1-propene (CAS # 1000154-23-3) and cholesterol (CAS # 1856-05-9), were putatively identified (level 2 of the Metabolomics Standards Initiative for identification) by comparing the mass spectra of the peaks with those available in the NIST mass spectral library (Wiley registry NIST 11) and the Fiehn RTL library at a 70% similarity index, while the remaining metabolites were not identified at this similarity index (Table 3). The EI/MS spectra of the unidentified compounds are shown in the supplementary information (Fig. S1).
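For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below uses only the Python standard library and invented toy numbers; it does not compute the p-value (which requires the F distribution) and is not the MPP procedure.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy abundances for one entity in three groups (hypothetical data).
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(toy)
print(round(f_stat, 3))  # 13.0
```

A large F indicates that the group means differ by more than the within-group scatter would explain; the entity is then passed to a post hoc test to find which groups differ.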

Table 3 List of metabolites (32 entities) that are distinguished between three controls, healthy non-smokers (NS), smokers (S), chronic obstructive pulmonary disease (COPD) and lung cancer (LC) at p < 0.001 and fold change >3 and CV < 25%

After the ANOVA, Tukey's honest significant difference (HSD) post hoc test was applied in order to find out which entities or metabolites differed significantly among controls and lung cancer. A large number of metabolites were significantly different between lung cancer and the three control groups. Relative to lung cancer, 31 metabolites differed significantly in COPD, 30 in smokers and 27 in healthy non-smokers. Only five metabolites were statistically different between smokers and COPD, showing the close resemblance between these two groups. Healthy non-smokers differed significantly from COPD and smokers in 11 and 12 metabolites, respectively. A summary of Tukey's HSD post hoc test is shown in the supplementary information (Table S1), while the identities of the statistically significant metabolites which differed among the four groups are also provided in the supplementary information (Table S2). The Venn diagrams show the overlap of statistically differentiative metabolites between controls and lung cancer. In the comparison of lung cancer with smokers and COPD, no peaks overlapped across all three groups. 27 out of 32 peaks overlapped between smokers and COPD, showing their close resemblance. However, 29 peaks were unique to the lung cancer group, which created the differences between lung cancer and controls, while only 1 and 2 peaks overlapped between lung cancer and COPD and between lung cancer and smokers, respectively (Fig. 1A). In contrast, the comparison of lung cancer with smokers and healthy non-smokers showed only 1 peak overlapping across all three groups, while 20 peaks overlapped between healthy non-smokers and smokers. In this comparison, 24 peaks were unique to lung cancer, which created a difference between lung cancer and controls, while only 2 and 5 peaks overlapped between lung cancer and smokers and between lung cancer and healthy non-smokers, respectively (Fig. 1B).

Figure 1

Venn diagrams highlighting the overlap of statistically differentiative metabolites observed (A) among smoker, COPD and lung cancer patient samples and (B) among healthy non-smoker, smoker and lung cancer patient samples, obtained by applying Tukey's honest significant difference (HSD) post hoc test.

Clustering

Cluster analysis is a powerful method to organize either entities (compounds) or groups of samples into clusters, based on the similarity of their profiles. Hierarchical clustering was performed to produce a dendrogram for clustering of the sample groups using normalized intensities of the thirty two significant metabolites (Fig. 2). The length of the vertical lines in the dendrogram is a measure of dissimilarity; shorter lines indicate a closer relationship between groups. This approach clustered the four groups (three controls and the lung cancer group) into classes I, II and III (Fig. 2). Two groups, i.e. lung cancer (LC) and COPD, clustered together in class I with a dissimilarity level of only 0.206 (Fig. 2). In class II, three groups, i.e. LC, COPD and smokers (S), were at a dissimilarity level of 0.461 (Fig. 2). Clustering of all four groups in class III at a dissimilarity level of 0.924 (Fig. 2) indicated that healthy non-smokers (NS) are the most dissimilar from the other three groups, i.e. S, COPD and LC. Almost all the LC and COPD patients have a smoking background, which results in the close relationship of these three groups. A heat map of non-averaged samples (visualizing all samples) with normalized intensities of the thirty two significant metabolites is shown in Fig. 3. This figure shows that, considering all the samples of each group, the lung cancer profile is clearly different from those of the three controls. There is also good reproducibility within each group, and most of the significantly differentiative metabolites are highly expressed in lung cancer as compared to controls. Each histological subgroup of lung cancer was also compared with the control groups (Fig. S3 of supplementary material). Squamous cell carcinoma and small cell carcinoma of the lung are strongly related to smoking habit, and this is supported by our clustering analysis of significant metabolites in Fig. S3(A and B). Adenocarcinoma of the lung did not cluster with smokers, as adenocarcinoma is the most common form of lung cancer among people who have rarely or never smoked in their lifetimes (Fig. S3C). Non-small cell lung cancer also did not cluster with smokers; this may be because most of the samples in this class are adenocarcinoma (a type of non-small cell lung cancer) (Fig. S3D).
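To make the clustering settings concrete, the following standard-library Python sketch shows one common definition of an uncentered absolute Pearson distance (one minus the absolute value of the correlation computed without mean-centering) combined with complete linkage. The exact formula used inside MPP is an assumption here, and the four group profiles are invented numbers chosen only to reproduce the same merge order as the dendrogram, not the study data.

```python
from math import sqrt
from itertools import combinations

def uncentered_abs_distance(x, y):
    """1 - |r|, where r is the Pearson correlation computed without mean-centering."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return 1.0 - abs(dot / norm)

def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, height)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def height(pair):
        a, b = pair
        # Complete linkage: distance between the two most dissimilar members
        return max(uncentered_abs_distance(profiles[i], profiles[j])
                   for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=height)
        merges.append((a, b, height((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical mean intensity profiles over three metabolites (invented values).
profiles = {
    "NS": [0.0, 0.2, 1.0],
    "S": [0.5, 1.0, 0.6],
    "COPD": [1.0, 0.8, 0.1],
    "LC": [1.0, 1.0, 0.0],
}
for a, b, h in complete_linkage(profiles):
    print(a, b, round(h, 3))
```

With these invented profiles, COPD and LC merge first, smokers join next and healthy non-smokers join last, with merge heights that increase at each step, which is how the dissimilarity levels of classes I, II and III are read off the dendrogram.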

Figure 2

Comparison of four groups of samples, i.e. healthy non-smokers (NS), smokers (S), Chronic Obstructive Pulmonary Disease (COPD) and Lung Cancer (LC) patients, using normalized intensities of the thirty two significant metabolites.

The dendrogram was produced by applying a hierarchical clustering algorithm (Pearson's uncentered-absolute distance metric, Complete Linkage).

Figure 3

Heat map of all analyzed samples with normalized intensities of the thirty two statistically significant metabolites.

Identified compounds are labeled by their name while unidentified compounds are labeled by their retention time (RT).

Class prediction model and test

A model was built using the thirty two statistically significant metabolites. The Partial Least Squares Discriminant Analysis (PLS-DA) algorithm was used to classify samples into discrete classes. The classes in the input data were randomly divided into three equal parts; two parts were used for training and the remaining part was used for testing. The process was repeated ten times, with a different part used for testing in each iteration. Thus each row was used at least once in training and testing, and a Confusion Matrix was generated. The results of the Confusion Matrix (a matrix which gives the accuracy of prediction of each class) are presented in supplementary information Table S3. Figure 4 shows the PLS-DA scores plots. A clear separation trend was observed between the three controls (healthy non-smokers, smokers and COPD) and the lung cancer samples in the PLS-DA scores plot (Fig. 4). The smokers and COPD groups lie close to each other, as 27 entities were common between them (Fig. 1A). The lung cancer group was clearly different from the control groups, as at least 24 entities in the lung cancer group were significantly different from the controls (Fig. 1), and this is also seen in the heat map (Fig. 3). Sensitivity and specificity were also measured from the constructed model. Sensitivity was calculated as the ratio of true positives (cancer samples which were correctly predicted) to the total number of cancer samples tested, whereas specificity was determined as the ratio of true negatives (control samples which were correctly predicted) to the total number of control samples tested. Sensitivity and specificity were found to be 96.2% and 92.0%, respectively, and the overall accuracy of the model was found to be 93.1%. External validation measures the predictive capability (sensitivity and specificity) of a calculated model. The model was externally validated on an independent, blind test set of 38 plasma samples (8 healthy non-smokers, 10 smokers, 10 COPD and 10 lung cancer patients). The PLS-DA classifier correctly predicted LC in 10 out of 10 patients, healthy non-smokers in 8 out of 8, COPD in 9 out of 10 and smokers in 5 out of 10, resulting in 100% sensitivity and 78.6% specificity. 50% of the smokers were incorrectly predicted by the model as COPD, possibly owing to the smoking history common to both groups. All the sample prediction reports are shown in Figure S2 of the supporting information.
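The external validation figures follow directly from the counts quoted above. The short Python sketch below recomputes them using the definition given in the text, in which a control counts as a true negative only when it is assigned to its own class; the dictionary layout and function name are illustrative choices.

```python
# (correctly predicted, total) per group in the 38-sample blind test set
external = {"LC": (10, 10), "NS": (8, 8), "COPD": (9, 10), "S": (5, 10)}

def sensitivity_specificity(results, case="LC"):
    """Sensitivity over the cancer samples and specificity over all control samples."""
    true_pos, n_cases = results[case]
    true_neg = sum(c for g, (c, _) in results.items() if g != case)
    n_controls = sum(t for g, (_, t) in results.items() if g != case)
    return true_pos / n_cases, true_neg / n_controls

sens, spec = sensitivity_specificity(external)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# sensitivity = 100.0%, specificity = 78.6%
```

The specificity of 78.6% corresponds to 22 correctly predicted controls out of 28, and the five smokers predicted as COPD account for most of the shortfall.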

Figure 4

PLS-DA scores scatter plots discriminating between controls and lung cancer patients based on the profiling data of the thirty two significantly differentiating metabolites.

The red, blue, brown and gray squares indicate healthy volunteers (n = 54), smokers (n = 66), COPD (n = 75) and lung cancer patients (n = 52), respectively.

Pathway analysis

Pathway analysis was performed in MPP software using the thirty two significantly differentiative metabolites. It revealed disturbances in several pathways, including pyruvate metabolism and the citric acid (TCA) cycle, fatty acid, triacylglycerol and ketone body metabolism, bile acid and bile salt metabolism, ATP Binding Cassette (ABC) family protein mediated transport and G-Protein Coupled Receptor (GPCR) downstream signaling.

Pyruvate metabolism and citric acid (TCA) cycle

All cells in our bodies require oxygen and nutrients, and energy is constantly needed to perform cellular functions. Proliferating cells need abundant nutrients for rapid growth, so cancer cells require a plentiful supply of nutrients. Most cancer cells are highly dependent on glucose for energy. Our experimental data showed that the level of glucose differed between lung cancer and control plasma samples: high levels of glucose were found in the plasma samples of lung cancer patients as compared to controls. Warburg reported the conversion of glucose to lactic acid in the presence of oxygen as a specific metabolic abnormality of cancer cells27 (Mishra and Verma, 2010). A high level of lactic acid was also found in the plasma samples of lung cancer patients. The high level of glucose in lung cancer does not indicate a decrease in glycolysis, because lactic acid is also up-regulated in lung cancer. Glycolysis results in the breakdown of glucose, but several reactions in the glycolysis pathway are reversible and participate in the re-synthesis of glucose, so gluconeogenesis may be responsible for the increased levels of glucose in lung cancer. Pathway analysis through MPP shows alteration or disturbance in lactic acid, carbon dioxide and phosphoric acid, which are involved in pyruvate metabolism and the citric acid (TCA) cycle, between controls and lung cancer. This is shown in Fig. S4 of the supplementary material.

Fatty acid triacylglycerol and ketone body metabolism

Alterations in the metabolism of several lipids are often observed in lung cancer samples, including over-expression of fatty acid synthase (FAS). Comparatively high levels of fatty acids, including palmitic acid, octadecanoic acid and stearic acid, and of cholesterol were found in the plasma samples of lung cancer patients as compared to controls. FAS serves to store the energy derived from carbohydrate metabolism. Fatty acids are esterified to phospholipids, such as phosphatidylcholine28. They are activated to acyl-CoA in a 2-step reaction, forming diacylglycerides with glycerol 3-phosphate. These diacylglycerides then react with CDP choline to form phosphatidylcholine. Pathway analysis through MPP shows alteration in phosphoric acid, palmitate, carbon dioxide, glycerol and arachidonic acid, which are involved in fatty acid, triacylglycerol and ketone body metabolism, between controls and lung cancer, as shown in Fig. S5 of the supplementary material. Over-expression of FAS has been observed in many lung cancer studies10,11,29. Experimental studies have indicated that various oncogenic signaling pathways lead to increased FAS expression30,31. Recently, SREBP (Sterol Regulatory Element-Binding Protein, a transcription factor that is a direct target of the PI3K/Akt and MAPK pathways) has been reported to regulate lipid synthesis and uptake through up-regulation of key enzymes of lipogenesis32,33. The high content of glucose may be due to the high energy requirement of lung cancer cells, which results in carbohydrate metabolism and lipogenesis to provide energy in the form of glucose.

GPCR downstream signaling

Abnormal expression of GPCRs and/or their ligands has been observed in cancer cells (lung, gastric, colorectal, pancreatic and prostatic cancers)34,35. Pathway analysis shows an increase in phosphoric acid, glycerol and arachidonic acid levels in lung cancer; these are involved in the GPCR downstream signaling pathway derived from the endocannabinoids anandamide (AEA) and 2-arachidonoyl glycerol (2-AG). The resulting altered pattern of receptor expression is shown in Fig. S6 of the supplementary material. This consequently leads to changes in fatty acid synthesis and glucose utilization36.

ABC family protein mediated transport

ABC transporters are membrane proteins which use the energy from ATP hydrolysis to actively transport a variety of compounds across the membrane, including ions, sugars, amino acids, lipids, toxins and anticancer drugs. ABC transporters are involved in tumor resistance. ABCB1 (MDR1 P-glycoprotein) is involved in lipid transport, which is its main function37. Pathway analysis shows the alteration of phosphoric acid and cholesterol involved in ABC family protein mediated transport, as shown in Fig. S7 of the supplementary material.

Bile acid and bile salt metabolism

Bile acids are steroidal amphipathic molecules derived from the catabolism of cholesterol. The catabolism of cholesterol to bile acids is an important route for the elimination of cholesterol from the body, accounting for approximately 50% of the cholesterol eliminated daily. Bile acids are involved in signal transduction pathways that regulate apoptosis38. Pathway analysis shows the alteration of phosphoric acid and cholesterol, involved in bile acid and bile salt metabolism, as shown in Fig. S8 of the supplementary material.

An acidic environment (decreased pH) is common in cancer cells owing to the production of lactic acid. Our experimental data show high levels of lactic acid, phosphoric acid and benzoic acid in lung cancer patients as compared to controls. The acidic environment of cancer typically results in necrosis or apoptosis through p53 and caspase-3-dependent mechanisms39. Consequently, up-regulation of glycolysis requires resistance to apoptosis or up-regulation of membrane transporters to maintain pH. These changes may result in a malignant phenotype and facilitate local invasion and metastasis formation39.

Concluding remarks

Our study has shown that GC-MS-based metabolite profiling of blood plasma using the 2D-C18 fractionation approach, followed by chemometric analyses, is able to identify biomarker metabolites which significantly differentiate lung cancer from three control groups (healthy non-smokers, smokers and COPD) with high sensitivity (96.2%) and specificity (92.05%). Two groups, i.e. lung cancer (LC) and COPD, are very close to each other (dissimilarity level of only 0.206 by cluster analysis). Elevated levels of almost all the fatty acids, glucose and acids were found in lung cancer patients in comparison to the controls. Glycolysis is generally increased in cancer, yet in this study a high level of glucose was found in lung cancer samples as compared to controls. However, the high level of glucose in lung cancer does not indicate a decrease in glycolysis, because lactic acid is also up-regulated in lung cancer. From the pathway analysis, it was concluded that although glycolysis results in the breakdown of glucose, several processes, such as gluconeogenesis, carbohydrate metabolism and lipogenesis, may be responsible for the increased levels of glucose in lung cancer by providing energy in the form of glucose. Up-regulation of an acidic environment (decreased pH) and alterations in the metabolism of several lipids favor lung cancer growth. A promising finding is the newly built model based on the thirty two significant metabolites, which accurately classifies lung cancer and controls on external validation. Unfortunately, only 37% of the metabolites were characterized and correlated with their pathways. Identification of the unknown metabolites with high resolution techniques can expand the human metabolome and ultimately help in the identification of lung cancer biomarkers.