Introduction

The incidence and mortality rates of lung cancer rank first worldwide, with 2,093,876 (11.6% of total cancer patients) new cases and 1,761,007 (18.4% of total cancer deaths) deaths occurred globally in 20181. Considering the sex-distribution difference, lung cancer is the leading cause of cancer incidence (1,368,524 new cases) and death (1,184,947 deaths) among males, accounting for approximately 14.4% new cases and 22.0% deaths respectively, and the third prevalent cancer (725,352 new cases, 8.4%) with second fatality rate (576,060 deaths, 13.8%) among females1. The morbidity and mortality of lung cancer is very high in China, even worse, continue rising in recent years2. Previous epidemiological studies have reported a dramatical increase of crude incidence of lung cancer among Chinese from 2000 to 2014, which reached 57.13 per 100,000 in 2014 (74.31 per 100,000 in males and 39.08 per 100,000 in females)2,3. In the same year, the estimated mortality rate of lung cancer deaths is 45.80 per 100,000 (61.10 per 100,000 in males and 29.71 per 100,000 in females)3. Multiple etiological analyses have found that external risk factors such as smoking, air pollution and occupational exposure contribute to the lung tumorigenesis4,5,6. Moreover, researchers have emphasized that genetic factors play critical roles in the development of lung cancer as well. Nowadays, accumulating evidence has elucidated that genetic polymorphisms are risk markers that can mediate individual susceptibility to lung cancer7,8,9,10,11. These findings also provide multiple promising indicators that pave the road for individual cancer risk assessment. Therefore, exploring these novel factors is a vital step which not only reveal the underlying genetic mechanisms of lung cancer, but facilitate cancer risk evaluation among populations with diverse genetic backgrounds.

Collagen type VI has been identified as a extracellular matrix protein with functional role and ubiquitous distribution in numerous connective tissues, where it forms microfibrils organized into extend microfilamentous network12. It has been well established for a long time that collagen VI is comprised of three genetically distinct α-chains, namely collagen VI α1, α2 and α3, which is encoded by COL6A1, COL6A2, and COL6A3 respectively13. Subsequently, three new collagen VI chains were identified, and designated α4, α5 and α6 chains that are encoded by distinct genes, COL6A4, COL6A5, and COL6A614. The aberrant expression of collagen VI genes has been reported in several malignant tumors, suggesting the potential effects of collagen VI in cancer development15,16,17. According to previous publications, the interaction between collagen VI and other components will lead to the remodeling of extracellular matrix, and stimulate the signaling pathway implicated in cell proliferation, migration, differentiation and apoptosis18,19,20,21. Collagen VI exerts important functions in maintaining the integrity of lung tissue22. And COL6A5 (Collagen type VI α5), which is also known as COL29A1, was highly expressed in lung tissue as well as skin, whereas less is reported on lung cancer23. Accordingly, we hypothesized that the single nucleotide polymorphisms (SNPs) in COL6A5 might be associated with lung cancer risk, and performed a genetic association study to investigate the relationships between the COL6A5 variations and lung cancer predisposition among Chinese Han population.

Materials and Methods

Ethics statement and consent to subjects

This study was performed with the approval of the ethics committee of The First Affiliated Hospital of School of Medicine of Xi’an Jiaotong University and all the procedures conformed to the 1964 Helsinki declaration and its later amendments. All the subjects were informed verbally and in writting of the protocols in this work. Signed informed consent was obtained prior to participation.

Study participants

The present study was carried out based on two groups of sample: 510 patients with diagnosed lung cancer and 495 healthy individuals. All the participants were recruited from Shaanxi Provincial Cancer Hospital. Cases were hospitalized from 2015 to 2018 and histopathological diagnosed as primary lung carcinoma by at least two experienced pathologists. Electronic bronchoscopy, percutaneous lung puncture, sputum exfoliation and pleural effusion cytology examination were supplementary methods for diagnosis. Patients with other cancers, cancer history or chronic diseases of respiratory system (for example: chronic obstructive pulmonary disease and asthma), and had radiation or chemical therapy were excluded from this work. The clinical information and basic characteristics of the patients were recorded by physicians. During the same time, healthy subjects were randomly enrolled when they were attending the routine physical examination at the same hospital. The exclusion criteria of the control group were as follows: 1) previous malignancy history; 2) chronic respiratory diseases and autoimmune disorders; 3) family history of pulmonary tumor or genetic diseases. The eligible controls and cases were hereditarily and unrelated Chinese Han individuals from northwest China. The numbers of male and female were matched in 1:1 fashion between cases and controls. During the recruitment, we randomly screened the samples according to the inclusion and exclusion criteria and tried our best to avoid the population stratification.

SNP selection and genotyping

Searching the 1000 Genomes website (http://www.internationalgenome.org/), we downloaded the data of COL6A5 variations in CHB (Chinese Han Beijing) and then inputted them into haploview 4.2 software. The SNPs whose minor allele frequency (MAF) > 0.05, Hardy-Weinberg equilibrium (HWE) > 0.01 and r2 > 0.8 were selected for linkage disequilibrium block construction. Subsequently, we obtained several tag-SNPs which could represent the polymorphisms of the strong linkage regions. Ultimately, six COL6A5 polymorphisms, rs77123808, rs2034664, rs10212241, rs13062453, rs1497305, and rs2403340 were selected for genotyping. The schematic representation of COL6A5 gene was performed by Gene Structure Display Server version 2.0 (http://gsds.cbi.pku.edu.cn/), and specific location of each SNP is marked in Fig. 1. The variants were further searched in dbSNP database (http://www.bioinfo.org.cn/relative/dbSNP%20Home%20Page.htm) to acquire detailed genetic information. A volume of 3–5 ml peripheral blood was donated from the participants and stored in ethylene diamine tetraacetic acid (EDTA) anticoagulant tubes at −80 °C. GoldMag-Mini Whole Blood Genomic DNA Purification Kit (GoldMag Co. Ltd, Xi’an City, China) was used for genomic DNA extraction from each sample according to the standard protocol. The purified and concentrated DNA was evaluated by NanoDrop spectrophotometer (Thermo Scientific, Waltham, Massachusetts, USA) at wavelength of 260 nm and 280 nm. The qualified DNA was used as the template for the primary polymerase chain reaction (PCR), and followed by a unique base extension. Using the Agena MassARRY method (Agena Bioscience, San Diego, CA, USA), the genotypes of the six variants were assayed with products of the last PCR step. The genotype data was read out by the Agena Bioscience TYPER software, version 4.0.

Figure 1
figure 1

COL6A5 gene schematic representation. The specific location of each SNP is marked by asterisk. Minimum unit of the measure is 1 kb. The flanking sequences of SNPs were provided by dbSNP database (http://www.bioinfo.org.cn/relative/dbSNP%20Home%20Page.htm).

Statistical analysis

The distribution differences of the age and gender among cases and controls were assessed by independent samples t-test and Pearson’s chi-squared test, respectively. Genotype frequency of the variations in the two study groups were compared with Pearson’s chi-squared test. HWE p value was also calculated in controls using Pearson’s chi-squared test. Furthermore, logistic regression analysis with adjustment was conducted to estimate the associations of the candidate SNPs with lung cancer susceptibility in genetic models (allele model, genotype model, dominant model, recessive model, and additive model) via available odds ratio (OR) values and 95% confidence intervals (CIs). The minor allele of each variation was hypothesized as the risk factor in our research. In statistical analysis, p = 0.05 was considered significant threshold. Stratified analyses by age, gender, pathological type, and TNM stage were carried out as well. SPSS version 20.0 (IBM SPSS, Inc., Chicago, IL, USA) and PLINK package, version 2.1.7 were used for basic characteristic statistics and risk evaluation.

Expression and prognosis analyses in databases

In order to further investigate the role of COL6A5 gene in lung tumorigenesis, we performed a large sample-based expression detection in Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/) and overall survival analysis in Kaplan-Meier Plotter (Lung cancer, http://kmplot.com/analysis/index.php?p=service&cancer=lung). The expression levels of COL6A5 were determined in lung cancer (including lung adenocarcinoma and lung squamous cell carcinoma) tissues and normal samples from healthy individuals. The prognostic differences between patients with high and low expression of COL6A5 were assessed by hazard ratio (HR) and log-rank p value.

Results

Expression and prognosis patterns of COL6A5 gene

With the GEPIA database, we detected the expression levels of COL6A5 in lung tumors and normal tissues. Overall, COL6A5 was down-regulated in both lung adenocarcinoma and lung squamous cell carcinoma (Fig. 2). Subsequently, prognosis analysis indicated that patients with low expression of COL6A5 had worse overall survival (Fig. 3, HR = 0.67, 95% CI = 0.56–0.79, p < 0.001).

Figure 2
figure 2

Comparison of the COL6A5 gene expression in lung adenocarcinoma (LUAD, N = 483), squamous cell carcinoma (LUSC, N = 486) tumors and healthy tissues (LUAD normal tissues: N = 347; LUSC normal tissues: N = 338) in GEPIA database. The expression of COL6A5 gene was down-regulated in both lung cancer types. *p < 0.05.

Figure 3
figure 3

Low expression of COL6A5 gene is associated with poor overall survival in lung cancer patients. The Kaplan-Meier curves presented the overall survival of lung cancer patients with high (red) and low (black) COL6A5 expression levels and generated by Kaplan–Meier method.

Study subjects

The basic characteristics of the 510 lung cancer patients and 495 healthy individuals were compared and showed in Table 1. The case group consisted of 355 males and 155 females, with a mean age of 60.78 ± 9.96. Moreover, pathological type, TNM classification and lymph node metastasis status of the most lung cancer patients were also collected and involved in further stratified analysis. As for the controls, 346 males and 149 females were enrolled, which were matched to the cases in an approximately 1:1 fashion (p = 0.920). The mean age of the healthy subjects was 61.94 ± 7.72 years in this study.

Table 1 Demographic information of the lung cancer cases and healthy controls in this study.

Basic information of the candidate SNPs

Specific information and minor allele frequency of the six candidate SNPs are listed in Table 2, including rs77123808 (C < A), rs2034664 (T < A), rs10212241 (T < C), rs13062453 (A < G), rs1497305 (A < G), and rs2403340 (A < G). The call rate for each SNP was over 99% in the current study. All the selected variants are located in COL6A5, chromosome 3, with MAF > 0.05 in both cases and controls. From Table 2, we found that the frequency of minor allele “T” of rs10212241 (0.438 in cases and 0.438 in controls) and “A” of rs2403340 (0.175 in cases and 0.174 in controls) were similar among lung cancer patients and healthy subjects. Genotype distributions of the six COL6A5 variants in lung cancer cases and healthy controls are showed in Supplementary Table S1. Pearson’s chi-squared test p value of rs1497305 indicated that there was difference in genotype distribution between the two groups (p = 0.020). In control group, the HWE p value of each variant was calculated (rs77123808: 1.00. rs2034664: 0.04, rs10212241: 1.00, rs13062453: 0.15, rs1497305: 0.09, rs2403340: 0.04). Because of the lower MAF in our population, rs2034664 and rs2403340 exhibited marginal HWE p values and were excluded from further analysis.

Table 2 Descriptive information of the selected SNPs in COL6A5.

Association analysis of COL6A5 variants with lung cancer risk

We first investigated the relationships between rs77123808, rs10212241, rs13062453, rs1497305 and lung cancer risk in the Chinese Han population. As presented in Table 3, the heterozygous “GA” genotype of rs13062453 was associated with an enhanced risk of lung cancer when compared with the homozygous “GG” genotype (adjusted OR = 1.33, 95% CI = 1.02–1.74, p = 0.036, power = 61.77%). Genetic model analysis revealed statistical significance of this variants in dominant model as well (adjusted OR = 1.30, 95% CI = 1.01–1.68, p = 0.039, power = 54.43%; Supplementary Table S2). Likewise, remarkable correlation also existed between the “GA” genotype of rs1497305 and increased lung cancer susceptibility among Chinese Han individuals according to an adjusted OR of 1.37 (95% CI = 1.05–1.78, power = 69.11%) and significant p value of 0.019 (Table 3). However, rs77123808 and rs10212241did not show any associations with lung cancer in our study cohort (p > 0.05).

Table 3 Evaluation of the correlation between COL6A5 variants and lung cancer susceptibility among Chinese Han population.

Association analysis in stratified subgroups

Furthermore, stratified analysis was performed in order to explore the effects of these SNPs on lung cancer in different subgroups. First, the correlations between the genetic variations and lung cancer risk were evaluated among people with different age and gender, and the significant COL6A5 variants were summarized in Table 4 (allele and genotype analysis) and Supplementary Table S3 (Genetic model analysis). Among the individuals aged younger than 61 years, the rs1497305 “GA” carriers harbored an increased risk of lung cancer when compared with “GG” carriers (adjusted OR = 1.59, 95% CI = 1.08–2.34, p = 0.019). In females, rs13062453 and rs1497305 polymorphisms elevated the risk of lung cancer according to the genotype and dominant model (rs13062453: “GA” genotype adjusted OR = 1.75, 95% CI = 1.07–2.86, p = 0.027, dominant model adjusted OR = 1.68, 95% CI = 1.05–2.86, p = 0.030; rs1497305: “AA” genotype adjusted OR = 1.79, 95% CI = 1.10–2.90, p = 0.019, dominant model adjusted OR = 1.59, 95% CI = 1.01–2.52, p = 0.046), whereas rs77123808 reduced the predisposition to lung tumorigenesis (“AC” adjusted OR = 0.51, 95% CI = 0.31–0.83, p = 0.006, dominant model adjusted OR = 0.58, 95% CI = 0.37–0.92, p = 0.021). The significant associations could not be detected for other candidate SNPs and in group older than 61 years or males (p > 0.05). Second, we examinated the associations of COL6A5 SNPs with two common pathological types of lung cancer, namely lung adenocarcinoma and squamous cell carcinoma. As showed in Table 5, the “GA” genotype of rs13062453 and rs1497305 wasfound to be correlated to the increased risk of lung adenocarcinoma (rs13062453: adjusted OR = 1.46, 95% CI = 1.01–2.11, p = 0.045; rs1497305: adjusted OR = 1.76, 95% CI = 1.23–2.51, p = 0.002). Rs1497305 also showed obvious evidence in correlation to lung adenocarcinoma susceptibility in dominant model (adjusted OR = 1.51, 95% CI = 1.07–2.13, p = 0.018) and recessive model (adjusted OR = 0.45, 95% CI = 0.22–0.96, p = 0.038) (Supplementary Table S4). However, analysis on squamous cell carcinoma did not uncover any significant SNPs. Lastly, the contributions of COL6A5 variants on TNM staging and lymph node metastasis were demonstrated. Notably, pronounced results of the polymorphism rs1497305 still survived in all statistical analysis for TNM staging even after adjustment, which suggested the importance of this SNP in lung cancer progression (p < 0.05, Table 6 and Supplementary Table S5). Unexpectedly, there were no significant findings between all the selected SNPs and lymph node metastasis (p > 0.05, Supplementary Tables S6 and S7).

Table 4 Significant COL6A5 variants associated with lung cancer susceptibility in stratified analysis for age or gender.
Table 5 Significant COL6A5 variants in stratified analysis for pathological type.
Table 6 Significant COL6A5 variants associated with TNM staging of lung cancer among Chinese Han population.

Discussion

The current study evaluated the correlations between polymorphisms in COL6A5 gene and lung cancer susceptibility among Chinese Han individuals. We found that COL6A5 rs13062453, rs1497305, and rs77123808 were associated with the risk of lung cancer in Chinese Han population. Additionally, COL6A5 rs1497305 was a potential predictor for TNM staging. Basing on the expression patterns presented by databases, we found a down-regulation of COL6A5 in lung tumors when compared with the normal tissues. The low expression level was linked to the worse overall survival of lung cancer patients. These results preliminary elucidated the potential genetic determinants in COL6A5 of lung cancer, and first uncovered the involvement of COL6A5 gene in lung carcinogenesis.

Collage family consists of 28 types of extracellular matrix proteins with α chains, which can facilitate cell growth and maintain the mechanical elasticity of connective tissue24. Collagen type VI plays a prominent role in both tumorigenesis and metastasis25. Collagen type VI α1, α2 and α3 are three major extracellular matrix protein of collagen type VI subgroup, which can promote tumor growth25. Collagen type VI α4, α5 and α6 are three members that are identified lately, with a N-terminal region containing seven vWF-A modules, a collagen triple helical region, and a C-terminal region made of two or three vWF-A modules as well as specific sequence14,26. However, it is largely unknown how collagen type VI α4, α5 and α6 are implicated in tumor development. COL6A6 has been reported to be down-regulated in breast cancer27,28,29. Significant down-regulation of COL6A5 has been proved in esophageal squamous cell carcinoma30 and fibrous epulis31. Wang and the colleagues has found that collagen type VI may interacts with ROBO2 in esophageal squamous cell carcinoma development, and ROBO2 serves as a tumor suppressor during malignant process32. Collagen type VI serves as an indispensable protein in extracellular matrix, and mainly deposited in lung fibrosis33. Pulmonary fibrosis, combined with emphysema, has been fully demonstrated to be strongly associated with the development of lung cancer34,35. However, the characteristics of COL6A5 gene in lung cancer remains incompletely understood. Bioinformatics analyses showed a decrease of COL6A5 level in lung tumor, and the low expression resulted in poor prognosis. We thus speculated that COL6A5 functions as a tumor suppressor gene in oncogenesis of lung. More convincing experimental evidence is needed to confirm these findings.

Single nucleotide substitutions in genes have been validated to be the risk factors in diverse diseases, which are proposed as predictors for individual risk assessment. Susceptible SNPs of COL6A5 have been identified and demonstrated in familial neuropathic chronic itch in previous research36. Eight variants located in COL6A5 gene have been demonstrated to be related to the susceptibility to atopic dermatitis among Europeans37. A longitudinal exome-wide association study performed among Japanese has showed that COL6A5 SNP rs11917356 was significantly correlated with hypertension risk through affect systolic blood pressure38. Furthermore, it has been reported that variations in COL6A5 might play causal roles for asthma together with environmental exposures39,40. But a recent study has not found any significant relationships between COL6A5 variants and asthma or chronic obstructive pulmonary disease in German41. In our study, COL6A5 polymorphisms rs13062453, rs1497305, and rs77123808 were genetic polymorphic markers that confer susceptibility to lung cancer among Chinese Han individuals. As all the significant SNPs were resided in the intron region of COL6A5 gene, which possesses regulatory functions in pre-mRNA processing42, we considered that alternation at these polymorphic sites might modulate the expression efficiency of COL6A5 mRNA, contributing to the abnormal pattern of COL6A5 and thereby influencing the individual susceptibility to lung cancer. Additionally, COL6A5 rs1497305 was found to be associated with TNM staging, which harbors the potential to become new target for lung cancer clinical stage evaluation in future.

There are several limitations in our research that should be acknowledged. First, all the cases and controls are recruited from one hospital, and the possible selection bias may exist. The random subject selection could help us reduce the selection bias to some degree. Second, the expression detection and prognosis analysis of COL6A5 gene were carried out by databases, which should be verified with well-designed experimental study. Third, the underlying mechanism of the promising SNPs and COL6A5 gene in lung cancer development is still unclear, thus suggesting the nest step for the further research.

In summary, this is the first research that discussed the relationships between COL6A5 polymorphisms and lung cancer risk. We uncovered significant variants that were associated with lung cancer susceptibility in Chinese Han population. These findings also provided supporting evidence for the involvement of COL6A5 gene in the occurrence of lung cancer.