Introduction

Lung cancer has the highest morbidity and mortality rates of any cancer worldwide [1]. It is classified histologically as small cell lung cancer (SCLC) or non-small cell lung cancer (NSCLC). NSCLC accounts for 80–85% of all lung cancer cases and includes adenocarcinoma, squamous cell carcinoma, and, more rarely, large cell lung cancer [2]. Lung adenocarcinoma generally occurs in female and nonsmoking patients, and these cases typically involve driver alterations such as EGFR and KRAS mutations and the EML4-ALK translocation. Squamous cell carcinoma generally occurs in male and smoking patients; however, the driver genes involved remain unclear [3,4,5,6]. Efficacious targeted therapy has been achieved in lung adenocarcinoma via the specific targeting of driver genes, such as EGFR and ALK. However, similar success remains to be achieved in lung squamous cell carcinoma [7].

Long noncoding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode protein. A total of 4–9% of mammalian gene sequences produce lncRNA transcripts that participate in various important regulatory processes and are closely associated with the onset and progression of various diseases. More than 150 gene sites increase the susceptibility to cancer [3], and most of these sites are located outside protein-coding sequences. Sites such as enhancers or promoters can significantly influence gene transcription and protein expression [8,9,10]. Therefore, it is important to identify noncoding genes relevant to cancer and investigate their interactions with protein-encoding genes to increase our understanding of cancer pathogenesis.

The most intensively studied lncRNA is HOX transcript antisense intergenic RNA (HOTAIR), which encompasses 2158 nucleotides on chromosome 12q13.13. HOTAIR is transcribed from the antisense strand of the HOXC gene cluster, and it is a modular scaffold for two histone modification complexes [11]. Its 5′ structural domain binds polycomb repressive complex 2 (PRC2), and its 3′ structural domain may be a scaffold for a histone demethylase complex containing lysine-specific histone demethylase 1 (LSD1). HOTAIR mediates epigenetic silencing of target genes by recruiting these two complexes to specific gene sites [12], and it is a cancer-related gene. HOTAIR is associated with genetic susceptibility to various tumors, including esophageal cancer [13], gastric cancer [14], neuroglioma [15], and breast cancer [16] in different populations. Numerous studies have verified that HOTAIR expression is upregulated in lung cancer and that HOTAIR participates in invasion and migration [17]. HOTAIR expression is closely associated with TNM stage, lymph node metastasis, and poor survival in lung cancer, which indicates its potential as a novel biological marker for the diagnosis and monitoring of cancer onset and progression [18]. Notably, a single nucleotide polymorphism (SNP) located in intron 2 of the HOTAIR gene [cytosine to thymine (C > T) (rs920778)] was identified as a novel intronic HOTAIR enhancer that exhibits a genotype-specific effect on HOTAIR expression. Molecular epidemiological studies have investigated the association between the HOTAIR rs920778 polymorphism and the risk of cancer, including esophageal squamous cell carcinoma (ESCC), breast cancer (BC), and gastric cancer (GC). However, the influence of HOTAIR SNPs on lung cancer has not yet been investigated.

To assess the influence of HOTAIR SNPs on genetic susceptibility to lung cancer, four SNP sites were selected: rs920778 C > T, rs12826786 C > T, rs4759314 A > G, and rs1899663 G > T. Compared with the CC genotype at the rs920778 site in the HOTAIR enhancer, the TT genotype increases the risk of esophageal squamous cell carcinoma [13]. The T allele at the rs12826786 site of the HOTAIR promoter increases the risk of gastric cardia adenocarcinoma. At the rs4759314 site, the change from the AA genotype to the GG genotype is significantly associated with susceptibility to gastric cancer [19]. The TT genotype at the rs1899663 site increases the risk of breast cancer compared with the GG homozygotes in the Chinese population [20]. Therefore, these four sites are closely associated with the onset, progression, and poor prognosis of various cancers [21,22,23,24]. The present study examined whether these four sites were associated with genetic susceptibility to lung cancer and used a database analysis to examine the relationship between HOTAIR expression levels and different factors in lung cancer patients.

Materials and methods

Patients and healthy controls

Study participants consisted of 262 patients who were diagnosed with primary lung cancer and underwent surgical resection between 2006 and 2010 at the Department of Lung Cancer Surgery, Tianjin Medical University General Hospital. The present study also included 451 healthy control individuals without a history of cancer who were enrolled from the Physical Examination Center of Tianjin Medical University General Hospital between 2006 and 2010. All enrolled subjects were from different areas in China. Written informed consent was obtained from all participants, and the Institutional Ethics Committee of Tianjin Medical University General Hospital approved the study. Inclusion criteria were a diagnosis of primary lung cancer (TNM staging, AJCC seventh edition) and the availability of outcome and follow-up data. The following information was collected from the patients’ medical records: age, gender, clinical stage, pathological diagnosis, differentiation, lymph node status, metastasis, smoking status, and overall survival time. Survival was calculated from the day of resection until 1 April 2011.

Samples from the Cancer Genome Atlas (TCGA)

The TCGA database (https://cancergenome.nih.gov/) was searched, and 512 adenocarcinoma and 500 squamous cell carcinoma patients were selected. The clinical data were collected and compiled from the Biospecimen Core Resource (BCR) data. All gene expression data were derived from RNA-seq data and were used after FPKM normalization. The TCGA data used were current as of 22 October 2017.
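FPKM (fragments per kilobase of transcript per million mapped fragments) rescales a raw fragment count by gene length and sequencing depth. The TCGA values used here were already normalized, so the following is only a minimal sketch of the standard formula; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragment_count: fragments assigned to the gene
    gene_length_bp: gene (transcript) length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 10**9 = 10**3 (bases per kilobase) * 10**6 (fragments per million)
    return fragment_count * 10**9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 100 fragments, 1000 bp long, 1 million mapped fragments.
    print(fpkm(100, 1000, 1_000_000))
```

Because the value is divided by both gene length and library size, expression levels become comparable across genes and across samples sequenced to different depths.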

DNA extraction

Blood samples were collected in EDTA-coated tubes to prevent coagulation. Genomic DNA was isolated from peripheral blood lymphocytes of all participants using a DNA extraction kit (Qiagen).

Genotyping

Gene polymorphism analysis was performed using the TaqMan™ genotyping technique and a predesigned TaqMan™ SNP genotyping probe (Applied Biosystems, Foster City, CA, USA) or PCR primer, probe, and TaqMan™ master mix (Applied Biosystems). All genotyping was completed in 384-well plates (7900 HT Fast Real-Time PCR System) with a reaction system that included TaqMan™ genotyping master mix (2 units, 2.5 μL), TaqMan™ SNP genotyping assay reagent (40 units, 0.125 μL), and template DNA (2.5 μL). A no-template control (NTC) and a positive control were included for each test. Plates were preread, and real-time PCR was performed under the following conditions: 50 °C for 2 min, 95 °C for 10 min, followed by 50 cycles at 95 °C for 15 s and 60 °C for 1 min. Each plate was reread, and the results were analyzed using SDS 2.2 software.

Statistical analysis

Data were analyzed using SPSS 16.0 software for Windows. Intergroup means were compared using a t-test, and P < 0.05 indicated statistical significance. The independence of risk factors was analyzed using logistic regression, and risk is expressed as an odds ratio (OR) with a 95% confidence interval (95% CI), with P < 0.05 indicating statistical significance. Hardy–Weinberg equilibrium (HWE) was assessed by comparing the observed genotype counts with those expected from the allele frequencies using a χ2 test. Linkage disequilibrium (LD) between the polymorphic loci was analyzed using SHEsis online software (http://analysis.bio-x.cn/myAnalysis.php) to determine their nonrandom association in our population. The LD pattern between SNPs was measured using the correlation coefficient, r2, where r2 ≥ 0.5 was considered moderate to strong.
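The two core calculations in this section can be illustrated with a short sketch: an unadjusted OR with a Wald 95% CI from a 2 × 2 table, and the χ2 goodness-of-fit test of genotype counts against Hardy–Weinberg expectations. This is not the SPSS procedure used in the study; the function names and all counts are hypothetical, and the ORs reported in this paper come from logistic regression with covariate adjustment, which this sketch does not perform.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 gives 95%)."""
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR) for a 2 x 2 table
    se = math.sqrt(1 / case_exposed + 1 / case_unexposed
                   + 1 / ctrl_exposed + 1 / ctrl_unexposed)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 degree of freedom) of genotype counts against HWE."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)   # reference allele frequency
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    print(odds_ratio_ci(20, 80, 10, 90))
    print(hwe_chi2(25, 50, 25))
```

A locus whose HWE test gives P < 0.05 in the sample deviates from equilibrium and is usually excluded from downstream haplotype analysis.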

Results

Research cohort

The present study cohort included 188 male and 74 female primary lung cancer patients aged 32–84 years old (median 62 years old) (Table 1). A total of 118 of these 262 patients had squamous cell carcinoma (SQCC), 99 patients had adenocarcinoma (ADC), 22 patients had small cell lung cancer (SCLC), 14 patients had large cell lung cancer (LCLC), seven patients had adenosquamous carcinoma, and two patients had other pathological types. A total of 169 patients had a smoking history, and 85 patients did not smoke. Table 1 presents detailed information of the lung cancer patient and healthy control groups.

Table 1 General information of all patients with NSCLC

Association between HOTAIR SNPs and lung cancer risk

The genotype distribution frequencies and corresponding ORs of the four SNP sites in the HOTAIR gene in the healthy control and lung cancer groups are shown in Table 2. The ORs were adjusted for gender, age, smoking history, and pathological type. Among the four SNP sites, rs920778 and rs1899663 were significantly associated with lung cancer. At the rs920778 (C > T) site, the genotype distribution differed between the healthy control and lung cancer groups, and the frequency of the C/T (C/T + TT) genotype in the lung cancer group was higher than that in the healthy control group (OR = 1.467, 95% CI: 1.069–2.014, P < 0.05). In addition, at the rs1899663 (G > T) site, the distribution frequency of the TT genotype was significantly different from that of the GG genotype (OR = 3.220, 95% CI: 1.272–8.050, P < 0.05). The genotype distributions at the rs4759314 (A > G) and rs12826786 (C > T) sites did not differ significantly between the two groups.

Table 2 Association analysis between HOTAIR gene polymorphisms and the risk of lung cancer

The distribution frequency of the C > T genotype at the rs920778 site of HOTAIR is higher in male and smoking patients with squamous cell carcinoma

A stratified analysis was performed for each genotype at each HOTAIR site according to smoking history, gender, and pathological type. Subjects were first divided into smoking and nonsmoking groups. At the rs920778 site, the OR in the smoking group varied with genotype. The C/T (OR = 1.822, 95% CI: 1.151–2.884, P < 0.05) and C/T + TT (OR = 1.864, 95% CI: 1.193–2.912, P < 0.05) genotypes were risk factors for primary lung cancer. In the nonsmoking group, the association at the rs920778 site was not statistically significant.

In addition, subjects were divided into male and female groups. At the rs920778 site, the OR value in the male group varied with genotype. The C/T (1.865, 95% CI: 1.266–2.748, P < 0.05) and C/T + TT (1.760, 95% CI: 1.207–2.566, P < 0.05) genotypes increased genetic susceptibility to primary lung cancer. However, in females, the rs920778 site was not statistically significant. Moreover, at the rs1899663 site, the OR value of the TT genotype in males, but not in females, was statistically significant (3.304, 95% CI: 1.027–8.907, P < 0.05).

According to the pathological type of lung cancer, the subjects were divided into groups: SCLC, adenocarcinoma, and squamous cell carcinoma. At the rs920778 site, the OR of the C/T genotype in the SCLC group was 0.334 (95% CI: 0.120–0.927, P < 0.05). However, it could not be concluded that the C/T genotype played a protective role in primary lung cancer because only 22 patients with SCLC were recruited in the present study. In the squamous cell carcinoma group, the ORs of the C/T and C/T + TT genotype were 2.042 (95% CI: 1.327–3.143, P < 0.05) and 2.007 (95% CI: 1.317–3.060, P < 0.05), respectively, which indicates statistically significant risk factors for lung cancer. In the adenocarcinoma group, the results were not statistically significant (Tables 3 and 4).

Table 3 Stratified analyses of rs920778 for association with lung cancer risk
Table 4 Stratified analyses of rs1899663 for association with lung cancer risk

Haplotype association analysis of SNPs in HOTAIR

LD and haplotype analyses were used to examine the association between HOTAIR gene polymorphisms and lung cancer risk. Genotype data at the rs12826786 site did not conform to the Hardy–Weinberg equilibrium (HWE). Therefore, the three remaining gene sites were analyzed (rs920778, rs1899663, and rs4759314). The rs920778 and rs1899663 sites were in high LD (D’ = 0.86, r2 = 0.52, Fig. 1). Eight possible haplotypes were investigated in the case-control group. In both the healthy control and lung cancer groups, C(rs920778)G(rs1899663)A(rs4759314) was the most common haplotype (68.2% and 61.4%, respectively), and it was associated with a reduced risk of lung cancer (OR: 0.727, 95% CI: 0.576–0.918, P = 0.007). The risk of lung cancer was significantly increased in the population with the T(rs920778)T(rs1899663)A(rs4759314) haplotype (OR: 2.223, 95% CI: 1.517–3.258, P < 0.001).
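The two LD measures reported above, D’ and r2, are both derived from the deviation D between the observed haplotype frequency and the product of the allele frequencies. The sketch below assumes that the haplotype frequency for one pair of biallelic loci is already known; SHEsis additionally estimates haplotype frequencies from unphased genotypes, which is not reproduced here. The function name and the input frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies, not estimates from this study.
    print(ld_stats(0.18, 0.25, 0.20))
```

D’ is scaled by the largest deviation possible for the given allele frequencies, whereas r2 also depends on how similar those frequencies are, so a pair of loci can show a high D’ together with a more moderate r2.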

Fig. 1

Linkage disequilibrium among the three gene sites, rs920778, rs1899663, and rs4759314, analyzed using SHEsis software. a LD plot of the SNPs in HOTAIR; the significant SNPs are indicated by red boxes. b Numbers in squares indicate 100-fold r2 values for each pair of SNPs

Analyses of HOTAIR expression, clinical characteristics, and prognostic conditions in lung cancer patients using the TCGA database

As shown above, the C/T (C/T + TT) genotype at the rs920778 site increased the susceptibility to lung cancer in smokers. This SNP site is located within the enhancer of the HOTAIR gene, and allelic variation at this site has been reported to influence HOTAIR expression: the expression level is higher in carriers of the T allele than in individuals with the CC genotype [13]. To further investigate the relationship between HOTAIR expression and the clinical characteristics of lung cancer, the differences in HOTAIR expression levels between lung cancer and adjacent lung tissues were analyzed using the TCGA database. The expression levels of HOTAIR in squamous cell carcinoma and adenocarcinoma tissues were significantly higher than those in the paired adjacent tissues (P < 0.001, Fig. 2a) and were highly correlated with the pathological type of lung cancer (Table 5, Fig. 2b): HOTAIR expression levels were significantly higher in squamous cell carcinoma patients than in adenocarcinoma patients. HOTAIR expression levels were also closely associated with smoking history (Table 5, Fig. 2d) and were significantly increased in smoking lung cancer patients. The association between HOTAIR expression level and the clinicopathological characteristics and survival of lung cancer patients was then evaluated. The 1012 lung cancer patients were divided into high- and low-expression groups according to the median HOTAIR expression level. Survival was significantly better in patients with low HOTAIR expression than in patients with high expression, both among all cases (P = 0.048, Fig. 3a) and among smoking cases (P = 0.037, Fig. 3b).
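The survival comparison described above rests on two steps: a median split of expression values and a Kaplan-Meier estimate for each group. The following is a minimal sketch of both under stated assumptions. Values strictly above the median are labeled high, which the study does not specify; the function names and the toy data are hypothetical; and the significance test behind the reported P values is not reproduced.

```python
from statistics import median


def split_by_median(values):
    """Label each value 'high' if it is strictly above the median, else 'low'."""
    m = median(values)
    return ["high" if v > m else "low" for v in values]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data, not values from the TCGA cohort.
    expression = [0.1, 0.4, 2.5, 3.0]
    print(split_by_median(expression))
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at times when a death is observed.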

Fig. 2

Analysis of HOTAIR expression in different clinical characteristics. a Comparison between paired lung cancer and normal tissue. b Comparison between adenocarcinoma and squamous cell carcinoma patients. c Comparison between male and female patients. d Comparison between smoking and nonsmoking patients. AD adenocarcinoma, SCC squamous cell carcinoma

Table 5 Clinicopathological characteristics of patients with NSCLC in the TCGA cohorts
Fig. 3

Kaplan-Meier survival curves for lung cancer patients according to HOTAIR expression. a All cases. b Smoking cases

Discussion

The morbidity rate of lung cancer in China is increasing; therefore, it is of great importance to elucidate the genetic factors associated with susceptibility to lung cancer. LncRNAs have recently attracted research attention. Most functions of lncRNAs are not clear, but several intracellular roles, including a function in tumor progression, have been identified. HOTAIR is a functional lncRNA that is derived from the HOXC gene cluster, and it is closely linked to the progression of lung cancer [17, 18, 25]. Moreover, genetic variations in HOTAIR may regulate individual susceptibility to cancer and could influence HOTAIR expression and function [26,27,28,29]. Therefore, the present molecular epidemiology observational study examined whether the functional sites of HOTAIR were associated with genetic susceptibility to lung cancer. To the best of our knowledge, the present study is the first epidemiological study to examine the relationship between HOTAIR and susceptibility to lung cancer.

Many studies have examined the influence of gene SNPs on susceptibility to lung cancer. For example, SNPs in miR-219-1 (rs213210, rs421446, and rs107822) significantly influence the susceptibility and prognosis of NSCLC [30]. In addition, compared with the AA genotype of ataxia telangiectasia-mutated (ATM), the TT genotype significantly increases the risk of lung cancer [31]. The present study examined whether SNPs in HOTAIR increased the susceptibility to lung cancer. Four sites were selected for statistical analyses: rs920778 (C > T), rs12826786 (C > T), rs4759314 (A > G), and rs1899663 (G > T). As expected, in the present hospital-based case–control study, the C/T and C/T + TT genotypes at the rs920778 site were significant risk factors for primary lung cancer compared with the CC genotype. The TT genotype at the rs1899663 site increased the susceptibility to lung cancer compared with the GG genotype. However, the actual increase may have been underestimated because of the small sample size. As shown by the stratified analysis of further factors (smoking, gender, and pathological type), the C/T and C/T + TT genotypes at the rs920778 site significantly increased the susceptibility to lung cancer in males but not in females. The C/T and C/T + TT genotypes at the rs920778 site also increased the risk of lung cancer in smokers. These findings suggest that smoking-related factors may influence HOTAIR expression in combination with genetic variation within the gene itself. Cigarette smoke extract induces the inflammatory cytokine IL-6, which activates the transcription factor STAT3; STAT3 directly interacts with HOTAIR and increases its expression level [32]. In the present study, the relationship between HOTAIR expression levels and smoking history in squamous cell carcinoma and adenocarcinoma patients was also analyzed in the TCGA database. In lung cancer patients, HOTAIR expression levels were strongly associated with smoking history (P < 0.001), and they were significantly higher in smoking patients than in nonsmoking patients. These results support the conclusion that smoking is associated with HOTAIR expression. However, the susceptibility factors for primary lung cancer were inferred from analyses of squamous cell carcinoma and adenocarcinoma data of NSCLC patients only. Therefore, all pathological types of primary lung cancer should be analyzed in future studies.

It has been shown that, compared with the CC genotype, the TT genotype at the rs920778 site significantly increases the risk of esophageal squamous cell carcinoma, and that the HOTAIR expression level is increased in carriers of the T allele at the rs920778 site within the HOTAIR enhancer [13]. Another study reported that the expression level of HOTAIR was increased significantly in lung cancer patients, which was associated with the malignant phenotype and prognosis of lung cancer [33]. In the present study, the C/T and C/T + TT genotypes at the rs920778 site increased the risk of squamous cell carcinoma, which indicates a possible association of HOTAIR SNPs with the pathological type of lung cancer [34]. Further analysis of lung cancer patients identified in the TCGA database revealed that the expression level of HOTAIR in squamous cell carcinoma patients was significantly higher than that in adenocarcinoma patients (P = 0.018), and both were higher than that in adjacent normal tissue. These observations support our hypothesis that the rs920778 polymorphism within HOTAIR increases susceptibility to squamous cell carcinoma by increasing the expression level of HOTAIR. However, one limitation of our study is that HOTAIR expression levels were not evaluated in our lung cancer or control groups. Therefore, we cannot directly verify whether the SNPs in HOTAIR influenced susceptibility to lung cancer by increasing HOTAIR expression levels. This analysis is planned for future studies.

In summary, the rs920778 and rs1899663 polymorphisms within HOTAIR significantly increase susceptibility to lung cancer. These two sites are in LD. In the population with the T(rs920778)T(rs1899663)A(rs4759314) haplotype, the risk of lung cancer is significantly increased. In addition, among NSCLC patients, the expression level of HOTAIR in squamous cell carcinoma is significantly higher than that in adenocarcinoma, and smoking is associated with HOTAIR expression levels. HOTAIR may be a novel predictor of the biological behavior of tumors and a novel target for the treatment of lung cancer.