Collagen type VI α5 gene variations may predict the risk of lung cancer development in Chinese Han population

Collagen type VI α5 (COL6A5) is abundantly expressed in lung tissue, but its role in lung cancer is still unknown. We performed a genetic association study to examine the relationships between single nucleotide polymorphisms (SNPs) in COL6A5 and lung cancer predisposition in the Chinese Han population. Six tag-SNPs were selected and genotyped in 510 lung cancer patients and 495 healthy controls with the MassARRAY platform. The associations between the SNPs and lung cancer risk were estimated by logistic regression with adjustment for confounding factors. Two publicly available databases were used for gene expression and prognosis analysis. COL6A5 rs13062453, rs1497305, and rs77123808 were significantly associated with the risk of lung cancer in the whole population or in stratified subgroups (p < 0.05). Among them, COL6A5 rs13062453 and rs1497305 were also linked to susceptibility to lung adenocarcinoma. Additionally, rs1497305 was strongly related to TNM staging under five genetic models (p < 0.05). Results from the databases suggested an important role of COL6A5 in lung cancer development. In conclusion, COL6A5 polymorphisms rs13062453, rs1497305 and rs77123808 were associated with lung cancer risk in the Chinese Han population. These findings provide the first insight into the role of COL6A5 in lung cancer.

respectively 13 . Subsequently, three new collagen VI chains were identified and designated the α4, α5 and α6 chains, which are encoded by distinct genes, COL6A4, COL6A5, and COL6A6 14 . Aberrant expression of collagen VI genes has been reported in several malignant tumors, suggesting potential effects of collagen VI in cancer development [15][16][17] . According to previous publications, the interaction between collagen VI and other components leads to remodeling of the extracellular matrix and stimulates signaling pathways implicated in cell proliferation, migration, differentiation and apoptosis [18][19][20][21] . Collagen VI exerts important functions in maintaining the integrity of lung tissue 22 . COL6A5 (collagen type VI α5), also known as COL29A1, is highly expressed in lung tissue as well as in skin, but little has been reported on its role in lung cancer 23 . Accordingly, we hypothesized that single nucleotide polymorphisms (SNPs) in COL6A5 might be associated with lung cancer risk, and performed a genetic association study to investigate the relationships between COL6A5 variations and lung cancer predisposition in the Chinese Han population.

Materials and Methods
Ethics statement and consent to subjects. This study was performed with the approval of the ethics committee of The First Affiliated Hospital of School of Medicine of Xi'an Jiaotong University, and all procedures conformed to the 1964 Helsinki declaration and its later amendments. All subjects were informed verbally and in writing of the protocols used in this work. Signed informed consent was obtained prior to participation.
Study participants. The present study was based on two groups of samples: 510 patients with diagnosed lung cancer and 495 healthy individuals. All participants were recruited from Shaanxi Provincial Cancer Hospital. Cases were hospitalized from 2015 to 2018 and histopathologically diagnosed with primary lung carcinoma by at least two experienced pathologists. Electronic bronchoscopy, percutaneous lung puncture, sputum exfoliation and pleural effusion cytology examination were supplementary methods for diagnosis. Patients with other cancers, a history of cancer, or chronic diseases of the respiratory system (for example, chronic obstructive pulmonary disease and asthma), and those who had received radiation or chemical therapy, were excluded from this work. The clinical information and basic characteristics of the patients were recorded by physicians. During the same period, healthy subjects were randomly enrolled when they attended routine physical examinations at the same hospital. The exclusion criteria for the control group were as follows: 1) previous malignancy history; 2) chronic respiratory diseases and autoimmune disorders; 3) family history of pulmonary tumor or genetic diseases. The eligible controls and cases were genetically unrelated Chinese Han individuals from northwest China. The numbers of males and females were matched in a 1:1 fashion between cases and controls. During recruitment, we randomly screened the samples according to the inclusion and exclusion criteria and made every effort to avoid population stratification.
SNP selection and genotyping. From the 1000 Genomes website (http://www.internationalgenome.org/), we downloaded the data on COL6A5 variations in CHB (Chinese Han Beijing) and imported them into Haploview 4.2 software. SNPs with minor allele frequency (MAF) > 0.05, Hardy-Weinberg equilibrium (HWE) p value > 0.01 and r² > 0.8 were selected for linkage disequilibrium block construction. Subsequently, we obtained several tag-SNPs that could represent the polymorphisms of the strong linkage regions. Ultimately, six COL6A5 polymorphisms, rs77123808, rs2034664, rs10212241, rs13062453, rs1497305, and rs2403340, were selected for genotyping. The schematic representation of the COL6A5 gene was drawn with Gene Structure Display Server version 2.0 (http://gsds.cbi.pku.edu.cn/), and the specific location of each SNP is marked in Fig. 1. The variants were further searched in the dbSNP database (http://www.bioinfo.org.cn/relative/dbSNP%20Home%20Page.htm) to acquire detailed genetic information. A volume of 3-5 ml of peripheral blood was collected from each participant and stored in ethylene diamine tetraacetic acid (EDTA) anticoagulant tubes at −80 °C. The GoldMag-Mini Whole Blood Genomic DNA Purification Kit (GoldMag Co. Ltd, Xi'an City, China) was used for genomic DNA extraction from each sample according to the standard protocol. The purified and concentrated DNA was evaluated with a NanoDrop spectrophotometer (Thermo Scientific, Waltham, Massachusetts, USA) at wavelengths of 260 nm and 280 nm. The qualified DNA was used as the template for the primary polymerase chain reaction (PCR), followed by a single base extension. Using the Agena MassARRAY method (Agena Bioscience, San Diego, CA, USA), the genotypes of the six variants were assayed with the products of the last PCR step. The genotype data were read out with the Agena Bioscience TYPER software, version 4.0.
Statistical analysis.
Differences in the distribution of age and gender between cases and controls were assessed by the independent samples t-test and Pearson's chi-squared test, respectively. Genotype frequencies of the variations in the two study groups were compared with Pearson's chi-squared test. The HWE p value was also calculated in controls using Pearson's chi-squared test. Furthermore, logistic regression analysis with adjustment was conducted to estimate the associations of the candidate SNPs with lung cancer susceptibility under genetic models (allele model, genotype model, dominant model, recessive model, and additive model), expressed as odds ratio (OR) values and 95% confidence intervals (CIs). The minor allele of each variation was hypothesized to be the risk factor in our research. In the statistical analysis, p < 0.05 was considered statistically significant. Stratified analyses by age, gender, pathological type, and TNM stage were carried out as well. SPSS version 20.0 (IBM SPSS, Inc., Chicago, IL, USA) and the PLINK package, version 2.1.7, were used for basic characteristic statistics and risk evaluation.
Expression and prognosis analyses in databases. In order to further investigate the role of the COL6A5 gene in lung tumorigenesis, we performed a large sample-based expression analysis in the Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/) and an overall survival analysis in Kaplan-Meier Plotter (Lung cancer, http://kmplot.com/analysis/index.php?p=service&cancer=lung). The expression levels of COL6A5 were determined in lung cancer (including lung adenocarcinoma and lung squamous cell carcinoma) tissues and in normal samples from healthy individuals. The prognostic differences between patients with high and low expression of COL6A5 were assessed by hazard ratio (HR) and log-rank p value.
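The Hardy-Weinberg check mentioned under Statistical analysis can be illustrated with a minimal Python sketch. It assumes a biallelic SNP with both alleles present and applies a Pearson chi-squared test with one degree of freedom to genotype counts; the function name and the example counts are hypothetical and are not taken from the study data or from the software the authors used.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-squared test (1 df) for Hardy-Weinberg equilibrium.

    n_aa and n_bb are the observed counts of the two homozygotes,
    n_ab is the observed count of heterozygotes.
    Assumes both alleles are present (no zero expected counts).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only
chi2, p_value = hwe_chi_square(25, 50, 25)
print(chi2, p_value)
```

A small p value from this test indicates that the observed genotype counts depart from Hardy-Weinberg proportions.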

Results
Expression and prognosis patterns of COL6A5 gene. With the GEPIA database, we detected the expression levels of COL6A5 in lung tumors and normal tissues. Overall, COL6A5 was down-regulated in both lung adenocarcinoma and lung squamous cell carcinoma (Fig. 2). Subsequently, prognosis analysis indicated that patients with low expression of COL6A5 had worse overall survival (Fig. 3, HR = 0.67, 95% CI = 0.56-0.79, p < 0.001).

Study subjects.
The basic characteristics of the 510 lung cancer patients and 495 healthy individuals were compared and are shown in Table 1. The case group consisted of 355 males and 155 females, with a mean age of 60.78 ± 9.96 years. Moreover, the pathological type, TNM classification and lymph node metastasis status of most lung cancer patients were also collected and included in further stratified analysis. As for the controls, 346 males and 149 females were enrolled, matched to the cases in an approximately 1:1 fashion (p = 0.920). The mean age of the healthy subjects was 61.94 ± 7.72 years in this study.
Basic information of the candidate SNPs. Specific information and minor allele frequencies of the six candidate SNPs are listed in Table 2, including rs77123808 (C < A), rs2034664 (T < A), rs10212241 (T < C), rs13062453 (A < G), rs1497305 (A < G), and rs2403340 (A < G). The call rate for each SNP was over 99% in the current study. All the selected variants are located in COL6A5, on chromosome 3, with MAF > 0.05 in both cases and controls. From Table 2, we found that the frequencies of the minor allele "T" of rs10212241 (0.438 in cases and 0.438 in controls) and "A" of rs2403340 (0.175 in cases and 0.174 in controls) were similar among lung cancer patients and healthy subjects. Genotype distributions of the six COL6A5 variants in lung cancer cases and healthy controls are shown in Supplementary Table S1. Pearson's chi-squared test p value for rs1497305 indicated a difference in genotype distribution between the two groups (p = 0.020). In the control group, the HWE p value of each variant was calculated (rs77123808: 1.00, rs2034664: 0.04, rs10212241: 1.00, rs13062453: 0.15, rs1497305: 0.09, rs2403340: 0.04). Because of the lower MAF in our population, rs2034664 and rs2403340 exhibited marginal HWE p values and were excluded from further analysis.
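The exclusion step described above can be written as a simple filter. The sketch below uses the HWE p values reported in the text for the control group; the cut-off of 0.05 and the function name are assumptions made for illustration, since the text only describes the excluded p values as marginal.

```python
# HWE p values in controls, as reported in the text above
hwe_p_controls = {
    "rs77123808": 1.00,
    "rs2034664": 0.04,
    "rs10212241": 1.00,
    "rs13062453": 0.15,
    "rs1497305": 0.09,
    "rs2403340": 0.04,
}

def filter_by_hwe(p_values, threshold=0.05):
    """Keep SNPs whose HWE p value is at or above the threshold.

    The 0.05 threshold is an assumed cut-off for this illustration.
    """
    return [snp for snp, p in p_values.items() if p >= threshold]

retained = filter_by_hwe(hwe_p_controls)
print(retained)
```

With this assumed threshold, the four SNPs that remain are the ones carried forward into the association analysis.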
Association analysis of COL6A5 variants with lung cancer risk. We first investigated the relationships between rs77123808, rs10212241, rs13062453, rs1497305 and lung cancer risk in the Chinese Han population. As presented in Table 3, the heterozygous "GA" genotype of rs13062453 was associated with an increased risk of lung cancer when compared with the homozygous "GG" genotype (adjusted OR = 1.33, 95% CI = 1.02-1.74, p = 0.036, power = 61.77%). Genetic model analysis revealed statistical significance of this variant in the dominant model (Supplementary Table S2). Likewise, the "GA" genotype of rs1497305 was significantly associated with increased lung cancer susceptibility among Chinese Han individuals, with an adjusted OR of 1.37 (95% CI = 1.05-1.78, power = 69.11%) and a significant p value of 0.019 (Table 3). However, rs77123808 and rs10212241 did not show any association with lung cancer in our study cohort (p > 0.05).
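To make the reported ORs and 95% CIs easier to interpret, the following sketch computes an unadjusted odds ratio with a Wald confidence interval from a 2×2 table. It is a simplified illustration, not the adjusted logistic regression used in the study, and the function name and counts are hypothetical rather than taken from Table 3.

```python
import math

def odds_ratio_ci(exp_cases, ref_cases, exp_controls, ref_controls, z=1.96):
    """Unadjusted odds ratio with a Wald 95% confidence interval.

    exp_* are counts carrying the genotype of interest (e.g. "GA"),
    ref_* are counts carrying the reference genotype (e.g. "GG").
    All four counts must be non-zero.
    """
    odds_ratio = (exp_cases * ref_controls) / (ref_cases * exp_controls)
    # Standard error of log(OR) (Woolf's method)
    se = math.sqrt(1 / exp_cases + 1 / ref_cases
                   + 1 / exp_controls + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(round(or_value, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 corresponds to a significant association at the 0.05 level; adjustment for covariates, as in the study, would require a regression model instead of this direct calculation.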
Association analysis in stratified subgroups. Furthermore, stratified analysis was performed in order to explore the effects of these SNPs on lung cancer in different subgroups. First, the correlations between the genetic variations and lung cancer risk were evaluated among people of different age and gender, and the significant COL6A5 variants are summarized in Table 4 (allele and genotype analysis) and Supplementary Table S3 (genetic model analysis). Among individuals younger than 61 years, rs1497305 "GA" carriers had an increased risk of lung cancer when compared with "GG" carriers (adjusted OR = 1.59, 95% CI = 1.08-2.34, p = 0.019). In females, the rs13062453 and rs1497305 polymorphisms elevated the risk of lung cancer according to the genotype and dominant models (rs13062453: "GA" genotype adjusted OR = 1.75, 95% CI = 1.07-2.86, p = 0.027, dominant model adjusted OR = 1.68, 95% CI = 1.05-2.86, p = 0.030; rs1497305: "AA" genotype adjusted OR = 1.79, 95% CI = 1.10-2.90, p = 0.019, dominant model adjusted OR = 1.59, 95% CI = 1.01-2.52, p = 0.046), whereas rs77123808 reduced the predisposition to lung tumorigenesis ("AC" adjusted OR = 0.51, 95% CI = 0.31-0.83, p = 0.006, dominant model adjusted OR = 0.58, 95% CI = 0.37-0.92, p = 0.021). No significant associations were detected for the other candidate SNPs, or in the group older than 61 years or in males (p > 0.05). Second, we examined the associations of COL6A5 SNPs with two common pathological types of lung cancer, namely lung adenocarcinoma and squamous cell carcinoma. As shown in Table 5, the "GA" genotypes of rs13062453 and rs1497305 were found to be correlated with an increased risk of lung adenocarcinoma (Supplementary Table S4). However, the analysis of squamous cell carcinoma did not uncover any significant SNPs. Lastly, the contributions of COL6A5 variants to TNM staging and lymph node metastasis were examined.
Notably, the polymorphism rs1497305 remained significant in all statistical analyses for TNM staging even after adjustment, which suggested the importance of this SNP in lung cancer progression (p < 0.05, Table 6 and Supplementary Table S5). Unexpectedly, there were no significant associations between any of the selected SNPs and lymph node metastasis (p > 0.05, Supplementary Tables S6 and S7).
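The dominant, recessive and additive models referred to throughout these results differ only in how each genotype is converted into a numeric predictor for the regression. The sketch below shows the conventional coding, taking the minor allele as the risk allele as stated in the Methods; the function name is hypothetical and the coding reflects common practice rather than a description of the authors' software settings.

```python
def code_genotype(genotype, minor_allele, model):
    """Convert a two-letter genotype into a numeric predictor.

    dominant:  1 if at least one minor allele is present, else 0
    recessive: 1 if two minor alleles are present, else 0
    additive:  number of minor alleles (0, 1 or 2)
    """
    count = genotype.count(minor_allele)
    if model == "dominant":
        return int(count >= 1)
    if model == "recessive":
        return int(count == 2)
    if model == "additive":
        return count
    raise ValueError(f"unknown model: {model}")

# Example for a SNP whose minor allele is "A" (e.g. rs1497305, A < G)
for genotype in ("GG", "GA", "AA"):
    print(genotype,
          code_genotype(genotype, "A", "dominant"),
          code_genotype(genotype, "A", "recessive"),
          code_genotype(genotype, "A", "additive"))
```

Under this coding, a heterozygote counts as exposed in the dominant model, as unexposed in the recessive model, and as carrying one risk allele in the additive model.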

Discussion
The current study evaluated the correlations between polymorphisms in the COL6A5 gene and lung cancer susceptibility among Chinese Han individuals. We found that COL6A5 rs13062453, rs1497305, and rs77123808 were associated with the risk of lung cancer in the Chinese Han population. Additionally, COL6A5 rs1497305 was a potential predictor of TNM staging. Based on the expression patterns presented by the databases, we found a down-regulation of COL6A5 in lung tumors when compared with normal tissues. The low expression level was linked to worse overall survival of lung cancer patients. These results preliminarily elucidated the potential genetic determinants of lung cancer in COL6A5, and are the first to suggest the involvement of the COL6A5 gene in lung carcinogenesis.
The collagen family consists of 28 types of extracellular matrix proteins with α chains, which can facilitate cell growth and maintain the mechanical elasticity of connective tissue 24 . Collagen type VI plays a prominent role in both tumorigenesis and metastasis 25 . Collagen type VI α1, α2 and α3 are the three major extracellular matrix proteins of the collagen type VI subgroup, which can promote tumor growth 25 . Collagen type VI α4, α5 and α6 are three more recently identified members, with an N-terminal region containing seven vWF-A modules, a collagen triple helical region, and a C-terminal region made of two or three vWF-A modules as well as specific sequences 14,26 . However, it is largely unknown how collagen type VI α4, α5 and α6 are implicated in tumor development. COL6A6 has been reported to be down-regulated in breast cancer [27][28][29] . Significant down-regulation of COL6A5 has been shown in esophageal squamous cell carcinoma 30 and fibrous epulis 31 . Wang and colleagues found that collagen type VI may interact with ROBO2 in esophageal squamous cell carcinoma development, and that ROBO2 serves as a tumor suppressor during the malignant process 32 . Collagen type VI serves as an indispensable protein in the extracellular matrix, and is mainly deposited in lung fibrosis 33 . Pulmonary fibrosis, combined with emphysema, has been demonstrated to be strongly associated with the development of lung cancer 34,35 . However, the characteristics of the COL6A5 gene in lung cancer remain incompletely understood. Bioinformatics analyses showed a decrease of COL6A5 level in lung tumors, and the low expression was associated with poor prognosis. We thus speculated that COL6A5 functions as a tumor suppressor gene in lung oncogenesis. More convincing experimental evidence is needed to confirm these findings.
Single nucleotide substitutions in genes have been validated as risk factors in diverse diseases, and have been proposed as predictors for individual risk assessment. Susceptibility SNPs of COL6A5 have been identified in familial neuropathic chronic itch in previous research 36 . Eight variants located in the COL6A5 gene have been shown to be related to susceptibility to atopic dermatitis among Europeans 37 . A longitudinal exome-wide association study performed among Japanese individuals showed that COL6A5 SNP rs11917356 was significantly correlated with hypertension risk through its effect on systolic blood pressure 38 . Furthermore, it has been reported that variations in COL6A5 might play causal roles in asthma together with environmental exposures 39,40 . However, a recent study did not find any significant relationships between COL6A5 variants and asthma or chronic obstructive pulmonary disease in a German population 41 . In our study, COL6A5 polymorphisms rs13062453, rs1497305, and rs77123808 were genetic polymorphic markers that confer susceptibility to lung cancer among Chinese Han individuals. Because all the significant SNPs reside in intronic regions of the COL6A5 gene, which possess regulatory functions in pre-mRNA processing 42 , we considered that alterations at these polymorphic sites might modulate the expression efficiency of COL6A5 mRNA, contributing to the abnormal expression pattern of COL6A5 and thereby influencing individual susceptibility to lung cancer. Additionally, COL6A5 rs1497305 was found to be associated with TNM staging, and therefore has the potential to become a new target for lung cancer clinical stage evaluation in the future.
There are several limitations in our research that should be acknowledged. First, all the cases and controls were recruited from one hospital, so selection bias may exist. The random subject selection could help reduce the selection bias to some degree. Second, the expression and prognosis analyses of the COL6A5 gene were carried out with databases, and should be verified by well-designed experimental studies. Third, the underlying mechanism of the promising SNPs and the COL6A5 gene in lung cancer development is still unclear, which suggests the next step for further research.
In summary, this is the first study to examine the relationships between COL6A5 polymorphisms and lung cancer risk. We identified significant variants that were associated with lung cancer susceptibility in the Chinese Han population. These findings also provide supporting evidence for the involvement of the COL6A5 gene in the occurrence of lung cancer.

Data availability
The data generated or analyzed during this study are available from the corresponding author on reasonable request.