Introduction

Lung cancer is one of the leading causes of cancer-related death in the world1. Non-small cell lung cancer and small cell lung cancer are two major pathological types of lung cancer. Unfortunately, many patients are diagnosed with advanced lung cancer due to the asymptomatic nature of the early stages and a lack of effective screening modalities, resulting in a very low 5-year survival rate. Despite the development of multimodal treatment strategies in past decades, including surgical resection, chemotherapy, and radiation therapy, the outcomes of lung cancer patients remain unsatisfactory2. Therefore, novel biomarkers for diagnosis, prognosis, and drug response are urgently needed.

Gene expression profiles have been shown to provide diagnostic or prognostic information in a variety of cancers3,4,5,6. Yang et al.7 demonstrated that MARCKS contributed to constitutive CAF activation in ovarian cancer, and that MARCKS overexpression defined a poor prognosis in ovarian cancer patients. Sun et al.8 investigated the prognostic potential of lncRNAs in diffuse large B-cell lymphoma (DLBCL) and identified a six-lncRNA signature as a potential composite biomarker for risk stratification of DLBCL patients at diagnosis. However, efforts to translate gene expression-based analytical methods into clinical application have met several obstacles, including a lack of independent validation, the omission of clinical variables, and overall tumor heterogeneity9. To overcome these hurdles, our investigation utilized a large number of patients from multiple studies with diverse patient populations.

In the present study, we identified differentially expressed genes (DEGs) that were common among several expression profiles. We selected the target genes from among the 100 shared DEGs on the basis of their biology. According to the literature, NIMA-related kinase 2 (NEK2), discs large (Drosophila) homolog-associated protein 5 (DLGAP5) and epithelial cell transforming 2 (ECT2) are three mitosis-associated genes. CCNB1, CCNB2, CDKN2A, BUB1, BUB1B and TTK, which were also identified in this study, are involved in the cell cycle. Deregulated expression of mitosis-related factors, which drive chromosomal segregation during cell division, is frequently observed in cancer. The results of high-throughput screening were confirmed by qRT-PCR and further validated in the TCGA datasets. The expression levels of NEK2, DLGAP5 and ECT2 were significantly higher in lung cancer patients than in normal subjects. In addition, we explored the diagnostic and prognostic value of the three genes in lung cancer. ROC analyses showed that NEK2, DLGAP5 and ECT2 levels could robustly distinguish lung cancer patients from normal subjects, with high AUC, specificity and sensitivity values. Elevated expression of each of the three genes was significantly associated with reduced survival and increased risk of recurrence. Taken together, our findings revealed that NEK2, DLGAP5 and ECT2 might be used as promising biomarkers for the early detection of lung cancer and for predicting the prognosis of lung cancer patients.

Results

Identification of DEGs between tumor tissues and normal lung tissues

In our study, three expression profiles (GSE19188, GSE18842, GSE40791) were used to identify DEGs between tumors and normal lung tissues. Genes with corrected P-values <0.05 and absolute fold changes >4 were considered DEGs. The results showed that 131 genes were up-regulated in GSE19188, 316 genes were up-regulated in GSE18842, and 309 genes were up-regulated in GSE40791 (Figure S1A–C). We then performed an overlap analysis of the DEGs and found that a total of 100 genes were significantly up-regulated in all three lung cancer datasets (Figure S1D, Table S2). Increased expression of NEK2, DLGAP5 and ECT2 in lung cancer was identified in all three GEO datasets. An unpaired t-test was applied to compare the two groups (tumor vs normal), and P-values of less than 0.05 were considered statistically significant (Fig. 1A–C). Because these three genes play an important role in mitosis, we focused on NEK2, DLGAP5 and ECT2 in this study.
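The filtering and overlap steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the gene names and values are hypothetical toy data, the function name `upregulated` is ours, and the fold change is assumed to be a linear tumor/normal ratio so that up-regulation corresponds to a fold change above 4.

```python
# Toy per-dataset results (hypothetical): gene -> (corrected P-value, tumor/normal fold change).
toy_results = {
    "dataset_1": {"GENE_A": (0.001, 6.2), "GENE_B": (0.002, 5.1),
                  "GENE_C": (0.2, 8.0), "GENE_D": (0.01, 4.5)},
    "dataset_2": {"GENE_A": (0.0001, 7.0), "GENE_B": (0.001, 3.5),
                  "GENE_C": (0.01, 9.0), "GENE_D": (0.003, 4.8)},
    "dataset_3": {"GENE_A": (0.001, 5.5), "GENE_B": (0.001, 6.0),
                  "GENE_C": (0.01, 5.0), "GENE_D": (0.02, 4.1)},
}

def upregulated(results, p_cut=0.05, fc_cut=4.0):
    """Return genes with corrected P < p_cut and fold change > fc_cut."""
    return {gene for gene, (p_adj, fc) in results.items()
            if p_adj < p_cut and fc > fc_cut}

# Up-regulated genes per dataset, then the genes shared by all datasets.
per_dataset = {name: upregulated(res) for name, res in toy_results.items()}
common = set.intersection(*per_dataset.values())
print(sorted(common))
```

In this toy example GENE_B fails the fold-change cutoff in one dataset and GENE_C fails the P-value cutoff in another, so only the genes passing both thresholds in every dataset remain in the overlap.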

Figure 1

Identification of the differentially expressed genes. (A) Identification of mRNA expression of NEK2 in three datasets, respectively. (B) Identification of mRNA expression of DLGAP5 in three datasets, respectively. (C) Identification of mRNA expression of ECT2 in three datasets, respectively. ***corresponds to P < 0.001; **P < 0.01 and *P < 0.05.

Independent validation

To confirm these results, we examined the selected DEGs in an independent set of 56 paired tumor and normal lung tissues. The clinical characteristics of this cohort are summarized in Table 1. NEK2, DLGAP5 and ECT2 expression levels were significantly elevated in tumor tissues compared with normal lung tissues (Fig. 2A–C). As this cohort was limited to a small number of patients, we expanded the sample size for further validation by using TCGA datasets. A total of 349 lung cancer and 58 normal tissue samples were selected. The expression levels of NEK2, DLGAP5 and ECT2 were similar to those in our training cohort, with significant differences in expression between tumor and normal tissues (Fig. 3A,C,E), suggesting that differential expression of these three genes is a common feature of lung cancer. Moreover, NEK2, DLGAP5 and ECT2 expression levels differed clearly between TNM stages, with significantly higher levels in stage II-IV patients than in stage I patients (Fig. 3B,D,F).

Table 1 Clinicopathological characteristics of patients for clinical validation cohorts.
Figure 2

Clinical validation of the selected genes in paired tumor and normal tissues using qRT-PCR. (A) NEK2. (B) DLGAP5. (C) ECT2. ***corresponds to P < 0.001; **P < 0.01 and *P < 0.05.

Figure 3

Validation of the selected genes using 349 lung cancer and 58 normal tissues from TCGA datasets. (A) Validation of mRNA expression of NEK2 in TCGA datasets. (B) Gene expression of NEK2 in lung cancer patients according to clinical stage. (C) Validation of mRNA expression of DLGAP5 in TCGA datasets. (D) Gene expression of DLGAP5 in lung cancer patients according to clinical stage. (E) Validation of mRNA expression of ECT2 in TCGA datasets. (F) Gene expression of ECT2 in lung cancer patients according to clinical stage. ***corresponds to P < 0.001; **P < 0.01 and *P < 0.05.

Correlation between the three biomarkers and clinicopathologic variables

The associations between DEG expression and clinicopathological characteristics are presented in Table 2. The TCGA dataset was used for the correlation analyses. NEK2 expression was significantly associated with age (P = 0.027), gender (P < 0.001), clinical stage (P = 0.033), pathologic T stage (P < 0.001) and therapy outcome (P = 0.004). Elevated DLGAP5 expression was significantly correlated with all six clinicopathologic variables. No significant association was observed between ECT2 expression and patient age or clinical stage; however, high ECT2 expression in lung cancer was significantly associated with gender (P = 0.002), new tumor event (P = 0.026), pathologic T stage (P = 0.002), and therapy outcome (P = 0.012). These results suggest that expression changes in NEK2, DLGAP5 and ECT2 may play a vital role in lung cancer progression.

Table 2 Correlation between NEK2/DLGAP5/ECT2 expression and clinical characteristics in 349 lung cancer patients.

Diagnostic value of NEK2, DLGAP5 and ECT2 in lung cancer

Subsequently, ROC analysis was performed to assess the diagnostic value of NEK2, DLGAP5 and ECT2 as biomarkers for detecting lung cancer. NEK2 expression significantly distinguished the tumor and normal groups in all four lung cancer datasets, with the following values: AUC(GSE19188) = 0.927 (sensitivity: 0.923, specificity: 0.890), AUC(GSE18842) = 1 (sensitivity: 1, specificity: 1), AUC(GSE40791) = 0.967 (sensitivity: 0.910, specificity: 0.926) and AUC(TCGA) = 0.977 (sensitivity: 0.983, specificity: 0.873) (Fig. 4A, Table 3). Similarly, ROC analyses showed that DLGAP5 and ECT2 levels could also robustly distinguish lung cancer patients from normal subjects, with high AUC, specificity and sensitivity values (Fig. 4B–C, Table 3). Furthermore, to exclude the influence of primary clinical factors (age, gender, clinical stage, smoking history) on target gene performance, we constructed prediction models including (Model 1) or excluding (Model 2) the target gene. Model 1 includes the clinical factors and the target gene, whereas Model 2 includes only the clinical factors. The results of the comparison between these models are shown in Table S3 and Fig. 4D–F. Model 2 performed worse than Model 1, which suggests that the target genes are important for maintaining the model's performance. Collectively, our results suggest that NEK2, DLGAP5 and ECT2 could be suitable biomarkers for lung cancer diagnosis.
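To make the reported quantities concrete, the sketch below computes an AUC and the sensitivity and specificity at one cutoff for a single marker. It is an illustration only: the expression values and the cutoff are hypothetical, the function names are ours, and the AUC is computed through its rank interpretation (the probability that a randomly chosen tumor value exceeds a randomly chosen normal value), which may differ from the software routine used in the study.

```python
def auc(tumor, normal):
    """AUC as the fraction of tumor/normal pairs ranked correctly (ties count 0.5)."""
    wins = sum((t > n) + 0.5 * (t == n) for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))

def sens_spec(tumor, normal, threshold):
    """Call a sample positive when its expression exceeds the threshold."""
    sensitivity = sum(t > threshold for t in tumor) / len(tumor)
    specificity = sum(n <= threshold for n in normal) / len(normal)
    return sensitivity, specificity

# Hypothetical expression values for one marker.
toy_tumor = [7.1, 8.3, 6.4, 9.0]
toy_normal = [5.2, 6.0, 6.8, 4.9]

toy_auc = auc(toy_tumor, toy_normal)                   # 15 of 16 pairs ordered correctly
toy_sens, toy_spec = sens_spec(toy_tumor, toy_normal, threshold=6.2)
print(toy_auc, toy_sens, toy_spec)
```

An AUC of 1, as reported for GSE18842, corresponds to the case in which every tumor value exceeds every normal value, so that some cutoff yields a sensitivity and specificity of 1 simultaneously.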

Figure 4

Diagnostic value of the three candidate genes in lung cancer by ROC curve analysis. (A) NEK2. (B) DLGAP5. (C) ECT2. The four datasets are marked in the figures: the red line is GSE19188, the blue line is GSE18842, the green line is GSE40791, and the black line is the TCGA dataset. (D–F) ROC curves comparing the model with the target gene (Model 1) and the model without the target gene (Model 2). Model 1 (red line) includes age, gender, smoking status, clinical stage and the target gene. Model 2 (green line) includes the same clinical factors without the target gene. (D) NEK2. AUC(Model 1) = 0.971 (P-value <0.001), AUC(Model 2) = 0.556 (P-value = 0.231). (E) DLGAP5. AUC(Model 1) = 0.977 (P-value <0.001), AUC(Model 2) = 0.556 (P-value = 0.231). (F) ECT2. AUC(Model 1) = 0.968 (P-value <0.001), AUC(Model 2) = 0.556 (P-value = 0.231).

Table 3 ROC curve analyses using NEK2/DLGAP5/ECT2 for distinguishing patients with lung cancer from normal control subjects.

Prognostic value of NEK2, DLGAP5 and ECT2 in lung cancer

To assess the prognostic value of NEK2, DLGAP5 and ECT2 as biomarkers for lung cancer, we investigated the association between the expression level of each target and survival by Kaplan-Meier analysis with the log-rank test in 349 lung cancer patients. The Cox proportional hazards regression model was also used to evaluate the predictive value of NEK2, DLGAP5 and ECT2 mRNA levels in lung cancer patients. Two types of survival outcomes were considered. Overall survival (OS) was defined as the time between the date of surgery and the date of death or last follow-up, and relapse-free survival (RFS) was defined as the period from surgery to recurrence or last follow-up.

In this study, the TCGA dataset was used for prognostic analyses. We divided expression levels into two categories using the median: high expression levels were those above the median, and low expression levels were those below the median. On the whole, patients with low NEK2 levels had significantly longer OS (P = 0.009; Fig. 5A) and RFS (P = 0.006; Fig. 5B) than those with high NEK2 levels. The median OS was 72.5 months in the low NEK2 expression group and 39 months in the high NEK2 expression group. The median RFS was 73.9 months in the low NEK2 expression group and 25.7 months in the high NEK2 expression group. Similarly, DLGAP5 expression was significantly related to the OS (P = 0.001; Fig. 5C) and RFS (P = 0.003; Fig. 5D) of lung cancer patients. The median OS in the low and high DLGAP5 expression groups was 59.7 months and 35.8 months, respectively. The median RFS in the low and high DLGAP5 expression groups was 68.2 months and 25.7 months, respectively. These figures show that higher DLGAP5 expression correlated with a worse prognosis and earlier recurrence. Elevated expression of ECT2 was also significantly associated with reduced survival (P = 0.007; Fig. 5E) and increased risk of recurrence (P = 0.005; Fig. 5F). The median OS in the low and high ECT2 expression groups was 59.7 months and 41.2 months, respectively. The median RFS in the low and high ECT2 expression groups was 68.2 months and 25.7 months, respectively. Taken together, high expression of each of these three genes was significantly associated with reduced survival and increased risk of recurrence. Univariate and multivariate analyses were carried out to evaluate the target genes and other factors using a Cox proportional hazards regression model. The results showed that the expression of each target gene was significantly correlated with the prognosis of lung cancer patients (Table 4).
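The median split and the median survival times quoted above can be illustrated with a small Kaplan-Meier sketch. The cohort below is hypothetical toy data and the function names are ours; the code shows the product-limit estimator in its simplest form and is not the SPSS or GraphPad procedure used in the study.

```python
from statistics import median

def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up

# Hypothetical cohort: expression level, follow-up in months, event flag (1 = death, 0 = censored).
expression = [2.1, 5.3, 3.8, 7.9, 6.4, 1.5]
months = [70, 20, 65, 12, 30, 80]
died = [1, 1, 0, 1, 1, 0]

# High = above the median, low = below the median, as in the text.
cutoff = median(expression)
high = [(t, e) for x, t, e in zip(expression, months, died) if x > cutoff]
low = [(t, e) for x, t, e in zip(expression, months, died) if x < cutoff]

median_high = median_survival(kaplan_meier(*zip(*high)))
median_low = median_survival(kaplan_meier(*zip(*low)))
print(median_high, median_low)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the low-expression toy group reaches its median later than its earliest follow-up time.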

Figure 5

Kaplan-Meier analysis of OS and RFS probabilities based on the expression levels of three candidate genes. (A,C,E) Survival curves of lung cancer patients according to the status of NEK2/DLGAP5/ECT2 expression levels. Patients with high NEK2/DLGAP5/ECT2 expression showed significantly poorer OS than those with low NEK2/DLGAP5/ECT2 expression (P = 0.009, P = 0.001, P = 0.007, respectively). (B,D,F) RFS of lung cancer patients according to the status of NEK2/DLGAP5/ECT2 expression levels. Patients with high NEK2/DLGAP5/ECT2 expression showed significantly poorer RFS than those with low NEK2/DLGAP5/ECT2 expression (P = 0.006, P = 0.003, P = 0.005, respectively).

Table 4 Univariate and multivariate Cox regression analyses for overall survival and recurrence-free survival.

Further subgroup analyses, stratified by clinicopathological features, were performed to explore the effects of NEK2 expression on OS and RFS. In patient groups characterized as female, age <65, stage T3 + T4, or with new tumor events, there was no difference in OS between NEK2-low and NEK2-high patients. In contrast, in groups characterized as age ≥65, male, or stage T1 + T2, patients with low NEK2 levels had significantly better OS than those with high NEK2 levels (P = 0.019, Figure S2A; P = 0.011, Figure S2B; P = 0.036, Figure S2C, respectively). Similarly, Kaplan-Meier analysis revealed that high NEK2 levels were significantly associated with poor RFS in groups characterized as age ≥65 (P = 0.012, Figure S2D), male (P = 0.034, Figure S2E), and stage T1 + T2 (P = 0.004, Figure S2F). In groups characterized as age <65 (or ≥65), male, or stage T3 + T4, patients with low DLGAP5 levels had significantly better OS than those with high DLGAP5 levels (P = 0.035, P = 0.002, Figure S3A; P = 0.020, Figure S3B; P = 0.021, Figure S3C, respectively). Our results also showed that high DLGAP5 levels were significantly associated with poor RFS in groups characterized as age ≥65 (P = 0.009, Figure S3D), female (P = 0.006, Figure S3E), and stage T1 + T2 (P = 0.038, Figure S3F). Kaplan-Meier analysis revealed that low ECT2 levels were significantly associated with better OS in groups characterized as age <65 (P = 0.005, Figure S4A), male (P = 0.004, Figure S4B), and stage T3 + T4 (P = 0.023, Figure S4C). Similarly, low ECT2 levels were significantly associated with better RFS in groups characterized as age <65 (P = 0.008, Figure S4D), male (P = 0.033, Figure S4E), and stage T1 + T2 (P = 0.041, Figure S4F).

Discussion

Lung cancer remains the most common cause of cancer-related death worldwide1. The high mortality among patients with lung cancer is mainly due to the absence of an effective screening strategy to identify lung cancer in early stages10. Current screening strategies for lung cancer include conventional radiography, sputum cytology, and more recently, low-dose computed tomography (LDCT). LDCT screening can significantly improve early diagnosis and reduce lung cancer mortality. However, the false-positive rate of LDCT screening is high, and this can lead to harm due to unnecessary workups of benign nodules11, 12. For many decades, cytotoxic chemotherapy was the most effective treatment to improve overall survival and quality of life in these patients, despite its many drawbacks13. At the same time, researchers made substantial efforts towards the development of molecular targeted agents14. Systematic clinical studies and basic research on lung cancer have improved survival; however, the long-term outcomes of lung cancer patients remain poor. Thus, it is necessary to identify new biomarkers to improve the diagnosis and prognosis of lung cancer.

NEK2 is a serine/threonine kinase that is involved in the regulation of centrosome duplication and spindle assembly during mitosis15, 16. Dysregulation of these processes causes chromosome instability (CIN) and aneuploidy, which are hallmark changes in many tumors17, 18. NEK2 exists in three alternative splice isoforms: NEK2A, NEK2B and NEK2C19. NEK2 overexpression has been observed in several human cancers. Increased expression of NEK2 has been reported to be involved in tumor progression and is associated with poor prognosis in pancreatic ductal adenocarcinoma20, prostate cancer21 and colon cancer22. However, the association between the expression level of NEK2 and the early diagnosis of lung cancer remains to be rigorously and systematically evaluated. ECT2 is a BRCT-containing protein whose function has been best studied in cytokinesis. He et al.23 showed that ECT2 localizes to chromatin and DNA damage foci-like structures and that it facilitates PIKK-mediated phosphorylation of p53 on Ser15, the execution of apoptosis, and the activation of S and G2/M checkpoints. Luo et al.24 showed that elevated expression of ECT2 predicts an unfavorable prognosis in patients with colorectal cancer. Another potential predictor of lung cancer diagnosis and prognosis is DLGAP5. DLGAP5 is a mitotic spindle protein that promotes the formation of tubulin polymers, resulting in tubulin sheets around the ends of the microtubules25. DLGAP5 contains a guanylate-kinase-associated protein (GKAP) domain that is conserved among various species. This domain is also found in many eukaryotic signaling proteins, suggesting that DLGAP5 may have important biological functions as a signaling molecule26. DLGAP5 is involved in cancer formation and progression, suggesting that the gene and its product may be potential therapeutic targets27.

NEK2, DLGAP5 and ECT2 are mitosis-associated genes that play an important role in tumorigenesis, and all three have been reported to be involved in lung cancer development. Clustering of a genome-scale co-expression network revealed lung adenocarcinoma modules, a few of which contain genes such as DLGAP5 and BIRC5 that play a crucial role in cell cycle progression28. Das et al.29 uncovered a novel role for Nek2 in promoting tumorigenesis by regulating an axis of metastasis and cell survival. Ect2 regulates rRNA synthesis through a PKCi-Ect2-Rac1-NPM signaling axis that is required for lung tumorigenesis30. It is therefore of great clinical significance to explore the value of these three genes in early diagnosis and prognosis. Several previous studies have examined the association between gene overexpression and poor prognosis in lung cancer. Zhong et al.31 discovered that patients with overexpressed NEK2, Mcm7 and Ki67 had poorer overall survival than those with low expression, for all stages. Landi et al.32 showed that mitotic genes known to be involved in cancer development (NEK2 and TTK) are induced by smoking and affect survival. Schneider et al.33 found that the expression of the mitosis-associated genes AURKA, DLGAP5, TPX2, KIF11 and CKAP5 is associated with the prognosis of NSCLC patients. ECT2 overexpression may be a useful index for application of adjuvant therapy to lung cancer patients who are likely to have poor clinical outcome34, 35. However, genes identified with prognostic implications in one cohort can be difficult to verify in other cohorts. High reliability and reproducibility of microarray technology in identifying target genes are therefore essential for its application in discovering clinical biomarkers.

Microarray technology has substantially enhanced the search for biomarkers for cancer diagnosis and prognosis. In this study, we identified and validated the expression of NEK2, DLGAP5 and ECT2 in multiple lung cancer datasets, and the results showed that the expression levels of these three genes were significantly higher in lung cancer patients than in normal subjects. Importantly, the expression levels of the three candidate genes were significantly associated with clinicopathologic variables. Furthermore, we revealed the diagnostic and prognostic value of the candidate genes. These cancer biomarkers may be useful for early detection, disease monitoring and risk assessment. However, this study has some limitations. We examined the expression of the target genes only in tissue samples. Because the ultimate goal of a biomarker is specific, early and non-invasive diagnosis and post-therapy monitoring of cancer, body fluids (plasma, urine and sputum) are considered an appropriate biological material. In the future, we will also measure the expression of these biomarkers in body fluid samples.

Taken together, these findings indicate that NEK2, DLGAP5 and ECT2 might be used as promising biomarkers for the diagnosis and prognosis of lung cancer. These genes may also serve as potential therapeutic targets in lung cancer. More work is needed to elucidate the function of these three candidate genes and their roles in tumorigenesis.

Materials and Methods

Patients and tissue samples

Fifty-six patients from Xiangya Hospital (Changsha, China) were included in this study. All the patients provided written informed consent. Experiments and procedures were performed in accordance with the Helsinki Declaration of 1975 and were approved by the Ethics Committee of Xiangya School of Medicine, Central South University. Tumor and matched distant (>5 cm) normal lung tissue samples were collected from NSCLC patients who underwent resection for primary lung cancer. All fresh tissues were frozen in liquid nitrogen immediately after resection and stored at −80 °C. Their basic clinical characteristics are summarized in Table 1.

Lung cancer gene expression datasets

Three lung cancer datasets (GSE19188, GSE18842, GSE40791) generated on the Affymetrix platform, together with the corresponding clinical information of the lung cancer patients, were retrieved from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). GSE19188 includes 91 tumors and 65 adjacent normal lung tissues, GSE18842 includes 46 tumors and 45 controls, and GSE40791 contains 94 tumors and 100 non-tumor tissues.

Validation datasets were acquired from the Cancer Genome Atlas (TCGA) data portal (http://tcga-data.nci.nih.gov). This data set contains 349 adenocarcinomas and 58 non-tumor tissues with both mRNA expression data and clinical feature information available for performing the receiver operating characteristic (ROC) analysis, survival analysis and correlation analysis. The aim of this study was to identify promising biomarkers for the early detection of lung cancer and to evaluate the prognosis of lung cancer patients. The latest version of the TCGA LUAD dataset includes 571 samples (513 tumors and 58 normal tissues). Two recurrent tumor samples, 28 samples lacking OS data, 133 samples lacking RFS data, and 1 sample lacking clinical stage data were removed, leaving 349 adenocarcinoma samples (primary tumor) and 58 non-tumor samples. Detailed clinical information on the patients used in this study is shown in Table 2.
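The sample counts above can be checked with a short calculation. The sketch assumes that every exclusion applies to tumor samples and that the exclusion categories do not overlap, which is consistent with the totals reported but is not stated explicitly.

```python
# Counts taken from the text; the dictionary keys are our own labels.
tumors, normals = 513, 58
removed = {
    "recurrent tumor": 2,
    "lacking OS data": 28,
    "lacking RFS data": 133,
    "lacking clinical stage": 1,
}

total_samples = tumors + normals                 # all TCGA LUAD samples
retained_tumors = tumors - sum(removed.values()) # primary tumors kept for analysis
print(total_samples, retained_tumors)
```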

mRNA expression profiling using microarrays

Raw microarray data files (.CEL files) of the three datasets were processed using the Robust Multichip Average (RMA) algorithm in the R package Affy36. The Linear Models for Microarray Data (LIMMA) package in R was then used to calculate the probability of probes being differentially expressed between cases and controls37. P-values were corrected using the Benjamini-Hochberg (BH) false discovery rate (FDR) procedure in R. Corrected P-values <0.05 and absolute fold changes >4 were used to identify significant DEGs. All data analyses were performed using R (http://www.r-project.org/, version 2.15.0) and Bioconductor38. The DEGs were visualized as heat maps, volcano plots and Venn diagrams using the gplots, lattice, and venn diagram packages in R, respectively.
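For readers unfamiliar with the BH correction, the following sketch shows how adjusted P-values are obtained from raw P-values. The analysis in this study was carried out in R; the Python function below is our own minimal re-implementation of the standard step-up procedure, applied to hypothetical P-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Visit P-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this P-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print([round(p, 4) for p in adj])
```

Each raw P-value is multiplied by the number of tests divided by its rank, and the running minimum guarantees that a probe with a smaller raw P-value never receives a larger adjusted value than a probe ranked after it.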

Quantitative reverse transcription-polymerase chain reaction (qRT-PCR)

Total RNA was extracted from samples with Trizol reagent (Takara, Dalian, China) and then reverse transcribed to cDNA using the PrimeScript™ RT-PCR Kit (Takara, Dalian, China) following the manufacturer’s instructions. Real-time PCR was performed using SYBR® Premix DimerEraser™ (Perfect Real Time) (Takara, Dalian, China) on a Roche LightCycler 480 II Real-Time PCR system (Roche Diagnostics Ltd., Rotkreuz, Switzerland). Primers used for real-time PCR are shown in Supplementary Table 1. The threshold cycle value (Ct) of each product was determined and normalized against that of the internal control GAPDH. The differences in mRNA expression levels were compared by t-test using SPSS 18.0 (SPSS Inc, Chicago, Illinois, USA). P-values of less than 0.05 were considered statistically significant.
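Normalization of Ct values against GAPDH can be illustrated as follows. The text does not specify how normalized Ct values were converted to relative expression, so the 2^(-ΔΔCt) calculation below is an assumption on our part; it also assumes roughly 100% amplification efficiency, and the Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target gene normalized against the reference gene (here GAPDH)."""
    return ct_target - ct_reference

def fold_change(tumor, normal):
    """Relative expression, tumor vs normal, by the 2^(-ddCt) method (assumed)."""
    ddct = delta_ct(*tumor) - delta_ct(*normal)
    return 2 ** (-ddct)

# Hypothetical (target Ct, GAPDH Ct) pairs for one paired sample.
tumor_ct = (24.0, 18.0)
normal_ct = (27.0, 18.0)

relative_expression = fold_change(tumor_ct, normal_ct)
print(relative_expression)
```

A lower normalized Ct in the tumor sample means that the target reached the detection threshold in fewer cycles, which corresponds to higher expression.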

Statistical analysis

SPSS version 18.0 (Chicago, IL) and GraphPad Prism 5.0 (San Diego, CA) were used for statistical analysis. Student’s t-test was applied for comparisons of two groups. ROC curves were used to assess the diagnostic value of each marker39. The area under the curve (AUC) was computed for each ROC curve, and 95% confidence intervals (CI) were estimated by bootstrapping with 1,000 iterations. Survival analysis was carried out using Kaplan–Meier analysis and the log-rank test. The Cox proportional hazards regression model was applied to perform univariate and multivariate analyses. P-values of less than 0.05 were considered statistically significant.
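The bootstrap estimate of the AUC confidence interval can be sketched as below. The text specifies only bootstrapping with 1,000 iterations; the percentile interval, the resampling within each group, the fixed random seed and the toy expression values are assumptions made for illustration, and the function names are ours.

```python
import random

def auc(pos, neg):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(pos, neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical expression values for tumor and normal samples.
toy_tumor = [7.1, 8.3, 6.4, 9.0, 7.7, 6.9, 8.8, 5.8]
toy_normal = [5.2, 6.0, 6.8, 4.9, 5.5, 6.1, 7.0, 5.0]

point_estimate = auc(toy_tumor, toy_normal)
ci_lower, ci_upper = bootstrap_auc_ci(toy_tumor, toy_normal)
print(point_estimate, ci_lower, ci_upper)
```

Resampling within each group keeps the numbers of tumor and normal samples fixed in every replicate, so each bootstrap AUC is always defined.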