Precision oncology of lung cancer: genetic and genomic differences in Chinese population

Knowledge of the lung cancer genome has experienced rapid growth over the past decade. Genome-wide association studies and sequencing studies have identified dozens of genetic variants and somatic mutations implicated in the development of lung cancer in both Chinese and Caucasian populations. With the accumulating evidence, heterogeneities in lung cancer susceptibility were observed in different ethnicities. In this review, the progress on germline-based genetic variants and somatic-based genomic mutations associated with lung cancer and the differences between Chinese and Caucasian populations were systematically summarized. In the analysis of the genetic predisposition to lung cancer, 6 susceptibility loci were shared by Chinese and Caucasian populations (3q28, 5p15, 6p21, 9p21.3, 12q13.13 and 15q25), 14 loci were specific to the Chinese population (1p36.32, 5q31.1, 5q32, 6p21.1, 6q22.2, 6p21.32, 7p15.3, 10p14, 10q25.2, 12q23.1, 13q22, 17q24.3, 20q13.2, and 22q12), and 12 loci were specific to the Caucasian population (1p31.1, 2q32.1, 6q27, 8p21.1, 8p12, 10q24.3, 11q23.3, 12p13.33, 13q13.1, 15q21.1, 20q13.33 and 22q12.1). In the analysis of genomic and somatic alterations, different mutation rates were observed for EGFR (Chinese: 39–59% vs. TCGA: 14%), KRAS (Chinese: 7–11% vs. TCGA: 31%), TP53 (Chinese: 44% vs. TCGA: 53%), CDKN2A (Chinese: 22% vs. TCGA: 15%), NFE2L2 (Chinese: 28% vs. TCGA: 17%), STK11 (Chinese: 4–7% vs. TCGA: 16%), KEAP1 (Chinese: 3–5% vs. TCGA: 18%), and NF1 (Chinese: <2% vs. TCGA: 12%). In addition, frequently amplified regions encompassing genes involved in cytoskeletal organization or focal adhesion were identified only in Chinese patients. These results provide a comprehensive description of the genetic and genomic differences in lung cancer susceptibility between Chinese and Caucasian populations and may contribute to the development of precision medicine for lung cancer treatment and prevention.


INTRODUCTION
Lung cancer has been the most frequently diagnosed cancer and the leading cause of cancer death worldwide for several decades. 1 In China, it was estimated that there were 0.73 million new cases and 0.61 million deaths from lung cancer in 2015, accounting for more than one-third of the world total. 2 Although tobacco smoking has been recognized as the most important risk factor associated with lung cancer, familial aggregations have been observed in lung cancer patients even among nonsmokers, suggesting an important role of heredity in the development of lung cancer. 3,4 The heritability of lung cancer was recently estimated to be 15.2% in a Chinese population. 5 Although heredity has long been implicated in the development of lung cancer, the germline-based genetic variants and somatic-based genomic mutations associated with lung cancer have just begun to be revealed in recent decades. 6 With the accumulation of these findings, genetic and genomic heterogeneity associated with different histopathological types of lung cancer, as well as among different ethnic groups, was observed. The most representative finding is the L858R mutation in the EGFR, which was more frequently detected in Asian patients with adenocarcinoma histology. 7 Such heterogeneities are widespread in lung cancer, but a systematic summary of the results has been lacking until now. Here, the main findings of genetic and genomic studies of lung cancer and the differences associated with lung cancer in Chinese and Caucasian patients were reviewed.
The success of the International HapMap Project provided a foundation for genomic epidemiological studies. 8 After its completion, researchers began to identify susceptibility genes and genetic variations associated with lung cancer based on association studies. Variants in candidate susceptibility genes, such as genes involved in DNA damage and repair pathways, the cell cycle pathway, and the inflammatory pathway, as well as genes involved in the detoxification or metabolic activation of carcinogens and tobacco smoke, have been widely studied in Chinese populations. [9][10][11][12] However, many of these associations are controversial and show low verification rates in different studies.

Findings from lung cancer GWAS in Chinese populations
With the development and application of high-throughput genotyping technologies, genome-wide association studies (GWAS) have proven to be an efficient way to explore the genetic predisposition of complex traits. 13 In 2011, Hu, Z. et al. conducted the first GWAS of lung cancer in a Chinese population, a largescale multistage case-control study. 14 A total of 906,703 singlenucleotide polymorphisms were genotyped in 2,383 lung cancer cases and 3,160 controls followed by a fast-track validation using 6,313 cases and 6,409 controls 14 and a second round validation in an enlarged sample of 7,436 cases and 7,483 controls. 15 Together, these two studies identified seven loci at the genome-wide significance level (P < 5.0 × 10 −8 ); they included rs4488809 at 3q28 (TP63), rs2736100 and rs465498 at 5p15 (TERT-CLPTM1L), rs2895680 at 5q32 (STK32A-DPYSL3), rs1663689 at 10p14 (GATA3), rs753955 at 13q22 (MIPEP-TNFRSF19), rs4809957 at 20q13.2 (CYP24A1), and rs17728461 and rs36600 at 22q12 (MTMR3-HORMAD2). In addition, rs9439519 at 1p36.32 (AJAP1-NPHP4) and rs247008 at 5q31.1 (IL3-CSF2) were consistently associated with the risk of lung cancer at different stages. Of the identified variants, rs2736100, rs2895680, rs4809957, rs247008, and rs9439519 showed evidence of interaction with smoking dose.
Adenocarcinoma (AC) and squamous cell carcinoma (SqCC) are two major histological subtypes of lung cancer. 16 Obvious heterogeneity has been observed among different histological subtypes of lung cancer; for example, 3q28 (TP63) and 5p15 (TERT-CLPTM1L) were more prominent among AC, 17,18 and 12q23.1 (SLC17A8-NR1H4) was significantly associated only with SqCC. These findings represent a considerable advance in the understanding of lung cancer susceptibility in the Chinese population. 19 Exploring the missing heritability of lung cancer in the Chinese population Although lung cancer GWAS in Chinese populations have identified several robust susceptibility loci, only a fraction of the heritability of lung cancer susceptibility can be explained by these variants. 5 To explore the "missing heritability", a series of explorations was performed that included studies of the pleiotropy of susceptibility genes in multiple types of cancer, 20 genome-wide meta-analysis in never-smoking women, 21,22 and the association of low-frequency variants with lung cancer occurrence. 23 Evidence for the pleiotropy of genes/loci had been obtained in previous GWAS 24,25 and is biologically plausible. Jin G et al. mapped genetic variants that have consistent effects on risk of multiple cancers using genome-wide scan data on lung cancer, noncardia gastric cancer, and esophageal squamous cell carcinoma obtained from 5,368 cases and 4,006 controls followed by an evaluation in an additional 9,001 cases and 11,436 controls. 20 Consistent associations of rs2494938 at 6p21.1 (LRFN2) and rs2285947 at 7p15.3 (SP4) were observed with these three cancers. The minor alleles of rs2494938 and rs2285947 were significantly associated with an increased risk of lung cancer in this Chinese population.
In China and throughout Asia, most women are nonsmokers. 26 However, an obvious increase of lung cancer incidence (especially that of AC) was observed in never-smoker women recently. In a GWAS conducted by the Female Lung Cancer Consortium in Asia (FLCCA), 66,09 never-smoking female lung cancer cases and 7,457 controls drawn from 14 studies were analyzed (4,839 cases and 5,050 controls were from China). 21 In addition to confirming the associations reported for loci at 3q28, 5p15.33, and 17q24.3, the study also identified three new susceptibility loci at 6p21.32 (rs2395185, HLA Class II region), 6q22.2 (rs9387478, ROS1-DCBLD1), and 10q25.2 (rs7086803, VTI1A). A subsequent meta-analysis with an extended sample size of the FLCCA reported another three novel loci, including 6p21.1 (rs7741164, DQ14194), 9p21.3 (rs72658409, CDKN2B), and 12q13.13 (rs11610143, ACVR1B). 22 GWAS usually focus on common genetic variants (variants with minor allele frequency (MAF) ≥ 0.05) while neglecting lowfrequency or rare variants (MAF < 0.05). 27 The Illumina Human Exome Beadchip, which mainly focuses on low-frequency and rare missense variants in exon regions, was recently developed. 28 Using this chip, Jin, G. et al. scanned 1,348 lung cancer cases and 1,998 control subjects and subsequently evaluated promising associations in an additional 4,699 affected subjects and 4,915 control subjects. 23 Three low-frequency missense variants, BAT2 Val284Met]), were observed associated with lung cancer risk. Rs9469031 in BAT2 and rs6141383 in BPIFB1 were also associated with the age of onset of lung cancer (P = 0.001 and 0.006, respectively). These findings extend the genetic associations of GWAS and help explain the additional heritability of lung cancer in the Chinese population.
Similarities and differences in lung cancer susceptibility between Chinese and Caucasian populations In 2008, three independent studies simultaneously reported chromosome 15q25.1 as a susceptibility region for lung cancer in Caucasian populations; in this region, rs8034191 and rs1051730 were the major tagging variants. [29][30][31] The region harbors six protein-coding genes, including three genes, CHRNA3, CHRNB4, and CHRNA5, that encode nicotinic acetylcholine receptors. A subsequent study found that rs1051730 near CHRNA3 was strongly associated with smoking quantity and nicotine addiction as well as with lung cancer risk. 30 With an enlarged sample size and pooling analysis, rs402710 and rs2736100 at 5p15.33 (CLPTM1L-TERT) were shown to be independently associated with lung cancer risk (r 2 = 0.026). [32][33][34] TERT encodes telomerase reverse transcriptase, which plays an important role in maintaining the length and stability of human telomeres. CLPTM1L was overexpressed in non-small cell lung cancer and protected tumor cells from genotoxic stress-induced apoptosis. 35 RNA interferencemediated blockade of CLPTM1L could inhibit K-Ras-induced lung tumorigenesis. 36 Using a higher-density chip, Wang, et al. validated the associations at 5p15.33 and 15q25.1 and identified a novel locus at 6p21.33 (rs3117582, BAT3-MSH5). 37 In 2010, a meta-analysis combining 16 GWAS confirmed the susceptibility loci at 15q25.1, 5p15.33, and 6p21.33. Stratification analysis identified three novel loci for SqCC, including 2q32.1 (rs11683501, NUP35), 9p21.3 (rs1333040, CDKN2A-CDKN2B), and 12p13.33 (rs10849605, RAD52). 38 In 2014, another GWAS metaanalysis tested the associations of less frequent variants using imputation based on the 1000 Genomes Project. In this study, two rare variants of BRCA2 (rs11571833, p.Lys3326X) and CHEK2 (rs17879961, p.Ile157Thr) were identified with large-effect genome-wide associations for SqCC. 39 In 2017, the OncoArray Consortium genotyped 14,803 additional lung cancer cases and 12,262 additional controls using the OncoArray in Caucasian populations. 40 After integration with the existing GWAS data, 29,266 cases and 56,450 controls were used in an association analysis. This study highlighted the genetic heterogeneity across histological subtypes of lung cancer and reported novel loci for lung cancer at 1p31.1 (rs71658797, FUBP1), 6q27 (rs6920364, RNASET2), 8p21.1 (rs11780471, CHRNA2), and 15q21.1 (rs66759488, SEMA6D). Stratification by histology identified five novel loci for Comparing the findings of GWAS for different ethnic groups, genetic susceptibility to lung cancer showed both similarities and differences in Chinese and Caucasian populations (Table 1). 41 The identified regions 3q28, 5p15, 6p21, and 9p21.3 are shared by the two ethnic populations. However, although 6p21 was a shared susceptibility region, rs3117582 was nonpolymorphic in the Chinese population, whereas the low-frequency missense variants rs9469031 and rs200847762 were associated with lung cancer risk in Chinese. 23 In a two-stage study of never smokers exposed to second-hand smoke, an intronic SNP, rs12809597, in the ACVR1B gene (12q13.13) was also found to be associated with the risk of lung cancer in Caucasians, suggesting that 12q13.13 is a shared susceptibility locus. 42 In the fine-mapping analysis of 15q25, which was associated with smoking quantity and lung cancer in Differences in susceptibility to adenocarcinoma and squamous cell carcinoma Although most lung cancers occur due to smoking, the corresponding attributable fraction varies greatly by histological type; some types of lung cancer, such as SqCC, are thought to be almost exclusively due to smoking, whereas other types, such as AC, are thought to be less dependent on smoking. With decreased smoking prevalence and changes in tobacco composition, the distribution of histological types of lung cancer has undergone tremendous changes since the 1950s, and AC has overtaken SqCC as the most common type (40-50% vs. 20-30%) in Western countries. 44 In China, over 70% of men were regular smokers at one point in their lives, whereas only 3% of women report having smoked regularly. 45 The tobacco-attributed risk for the Chinese smokers was much less extreme than the risk for their counterparts in the West; for example, only a 2-4-fold increased risk of lung cancer was observed in the Chinese population versus the 20-30-fold increased risk observed among smokers in the West. 46 The proportion of AC also increased from 25.93% (1995-1997) to 56.36% (2013-2015), exceeding the proportion of SqCC (the proportion of which decreased from 49.1 to 26.34%) and becoming the main histological type in the Chinese. Notwithstanding the existence of several shared susceptibility loci between AC and SqCC (such as 8p21.1, 15q25, and 19q13) in the Caucasian population, there were significant differences in genetic susceptibility to AC and SqCC. Genetic loci associated with telomere length (such as 5p15, 10q24, and 20q13.33), somatic alterations (such as 8p12 and 9p21.3), or inflammation (15q21) were more evident in AC. In contrast, key players in the DNA damage response (such as 12p13.33-RAD52, 13q13.1-BRCA2, and 22q12.1-CHEK2) were more likely to be associated with susceptibility to SqCC, presumably because the products of these genes are needed to repair ongoing DNA damage that occurs on exposure to tobacco smoke. 40 In the Chinese population, most of the identified susceptibility loci showed no histopathological differences, probably due to the relatively small sample size. However, evidence for negative interactions with smoking dose (such as 5q32) were more significant in AC, while positive interactions with smoking dose (such as 1p36.32) were more evident in SqCC. 13 Risk prediction based on the identified variants in a Chinese population The application value of the identified variants for risk assessment of lung cancer was evaluated in recent study. GWAS dataset for lung cancer in a Chinese Han population were divided into a training set (Nanjing and Shanghai: 1473 cases, 1962 controls) and a testing set (Beijing and Wuhan: 858 cases, 1115 controls). A total of 14 variants with independent effects were used to build a genetic risk score (GRS). In the training set, the area under the curve (AUC) increased from 0.65 (0.63-0.66) to 0.69 (0.67-0.71) when the GRS were included in the risk prediction model; in the testing set, a similar improvement from 0.61 to 0.65 was observed. 47 This result indicated that integrating traditional epidemiological risk factors through GRS was helpful for building the risk prediction model, a finding that may prove to be of great value in the identification of populations at high risk of lung cancer.
The overall burden of somatic alterations in the genomes of Chinese lung cancer patients Another hotspot of genomic studies on lung cancer has focused on somatic alternations (e.g., mutations, CNVs, and chromosome rearrangements). With the emergence of massively parallel sequencing technology, the genomic landscapes of cancer in European and American populations have been gradually deciphered by large cooperative projects such as The Cancer Genome Atlas 48 (TCGA, launched in 2005) and The International Cancer Genome Consortium 49 (ICGC, launched in 2009). In these studies, lung cancer is one of the few cancers with a high mutational burden because lung cancer patients commonly have a history of exposure to cigarette carcinogens. 50 Recent wholeexome sequencing of specimens of malignant tissue from American lung cancer patients has revealed a median somatic mutation rate of 8.7 mutations/Mb and 9.7 mutations/Mb for AC and SqCC, respectively. 51 These mutations were characterized by high proportions of cytosine-adenine (C to A) nucleotide transversions (the substitution of a purine for a pyrimidine or vice versa), which are seen predominantly in lung AC of smokers or ex-smokers rather than in lung AC of nonsmokers. [51][52][53] In East Asians, the mutation rate of lung SqCC was similar to the rate observed in Americans (a mean mutation rate of 8.71 mutations/Mb). 54 However, two Chinese studies on lung AC reported distinct mutation burdens. Wu, et al. reported a mutation rate of 9.7 mutations/Mb in 101 primary lung AC patients, 55 while Li et al. observed a much lower mutation rate (1.4 mutations/Mb) in 271 lung AC patients. 56 A study of whole-genome sequencing of NSCLC was conducted recently in the Nanjing Lung Cancer Cohort, providing a more accurate estimation of the mutation rate for the whole genome landscape. 57 In this study, AC patients had a much lower mutation rate than SqCC patients (2.2 mutations/Mb for AC, 13.6 mutations/Mb for SqCC). 57 Aside from possible technical reasons for the differences (differences in study design, sequencing methods, calling protocols, etc.), it is worth noting the  56 The lung cancer rates in Chinese women are higher than those in women in some European countries despite an extremely low prevalence of smoking. 58 Second-hand smoke seems unlikely to be responsible for the female AC patients in the Chinese population from the point of view of genomics because these patients carry a low burden of smoking-induced mutations. 59 Thus, the etiology of lung cancer in female patients without smoking behavior or elevated mutation rates remains largely unknown and warrants further investigation in future studies.
In addition to mutations, the large number of focal and broad areas that display somatic copy number alterations and genomic rearrangements illustrates the instability and complexity of the lung cancer genome. [51][52][53][54][55]60 Intratumor heterogeneity can be inferred from chromosome instability and has been proven to be a prognostic predictor of lung cancer. 60 Unfortunately, no similar study has been conducted in Chinese patients.
The great genomic heterogeneity between AD and SCC cells and genomics-driven precise lung cancer therapy for Chinese patients in the future Considering the genome landscape of cancer cells mentioned above, the obvious heterogeneity between the AD and SCC genomes needed to be emphasized. AC could be described as a "gain-of-function" cancer, as most AC patients carry "gain-offunction" mutations or structural alterations in oncogenes involving the receptor tyrosine kinase pathway, such as L858R mutations in EGFR and ALK-EML4 fusion genes. In most cases, these alterations have occurred recurrently, and the highly activated genes are ideal targets for molecular therapy. SCC patients, in contrast, are commonly affected by a great number of "loss-of-function" mutations in well-known tumor suppressor genes, such as frameshift or nonsense mutations in TP53, PTEN, and CKDN2A. The deficiency of tumor suppressor genes can also be attributed to the higher mutation rate in SCC. Smoking behavior may be one of the reasons for tumor suppressor loss in SCC patients. The heterogeneity of cancer genomics provides new insights into methods of treatment.
In the past decade, substantial progress has been made in the precise treatment of lung cancer, and this progress has benefited from the deep investigation of cancer genomics. 62 Specific targeted agents against mutated EGFR and rearranged ALK have been successfully applied and have gradually become the standard first-line therapy for lung cancer. 63 Due to the high mutation rate of EGFR, Chinese lung AC patients could benefit most from treatment with EGFR tyrosine kinase inhibitors (EGFR-TKI). However, the position of the mutation can dramatically influence the treatment outcome. 64 For example, the benefit for tumors with exon 19 deletions was 50% greater than that for tumors with the L858R mutation, although L858R is the most prevalent mutation in Chinese patients. 64 One possible reason for this could be the acquisition of resistance to first/second generation EGFR-TKI (e.g., Gefitinib, Afatinib) by tumor cells that carry the gain-of-function T790M mutation. 65 Very recently, the third-generation EGFR-TKI Osimertinib, which targets T790M, was found to be effective against resistant NSCLC, 65 bringing a new perspective to TKI treatment of lung cancer. However, most of these studies were conducted in Western populations; thus, a careful evaluation in Chinese patients is warranted. Another reason for EGFR-TKI resistance may be intratumor heterogeneity, since distinct subclones of tumor cells may carry different mutations. Thus, a comprehensive landscape of tumor heterogeneity in Chinese patients is necessary for further evaluation. More importantly, the genomic catalog of lung cancer is far from complete. The complexity of lung cancer poses a great challenge to investigators who are attempting to discover alterations that occur at low frequency (<10%) and to compare these alterations in Chinese and Western populations. The identification of cytoskeleton remodeling gene amplification 55 is a good start and provides potential Chinese-specific targets for future therapy.
Another important process that has contributed to advances in treatment is the use of monoclonal antibodies (mAbs) and adoptive cellular therapy to treat cancer by modulating the immune response. 66 Blockade of programmed cell death protein 1 (PD-1) has now been approved by the FDA for treatment of patients with NSCLC. 66 One emerging biomarker of response to anti-PD-1 therapy is the tumor mutational burden. 67 Thus, an effect of anti-PD-1 therapy in Chinese smoker patients is expected. Chinese female and never-smoker patients have a specific pattern of inflammatory lymphocyte infiltration that is mainly characterized by elevated numbers of B cells, 57 indicating that advanced therapy that affects the inflammatory microenvironment in these patients should be emphasized and that it warrants future investigation.  Fig. 1 Mutation patterns of potential driver genes in the NJLCC and TCGA datasets. The data used to create this figure were taken from the study of Wang, C. et al. 57 and the TCGA project 51,52 Interaction between germline and somatic changes in lung cancer Emerging evidence indicates that the somatic evolution of a tumor may be significantly affected by inherited polymorphisms that are carried in the germline. By analyzing genomic data for 5954 samples from TCGA, Hannah et al. found 412 genetic interactions between germline polymorphisms and major somatic events across 22 cancer types. 68 In recent analysis, lung cancer driver genes are observed more likely to be located within cancer susceptibility regions. The susceptibility variant rs36600 was associated with somatic mutations within ARID1A, the susceptibility variants rs2395185 and rs3817963 were associated with somatic alterations in the cell cycle pathway, and rs3817963 was associated with somatic alterations in the MAPK signaling pathway. 69 These findings highlight the important role of germline-somatic interactions in tumorigenesis in lung cancer and help uncover the potential molecular mechanisms that underlie the GWAS findings. However, the majority of the samples used in the analysis were European; the results for the Chinese population have yet to be determined due to the lack of appropriate data.
OUTLOOK Important progress had been made during the past decade in understanding host susceptibility and genomic alterations associated with lung cancer in the Chinese population. However, the way to recognize lung cancer is just starting at a new point. Missing heritability is still widespread in both Chinese and Caucasians. 70 This indicates that further effort is needed to identify additional susceptibility loci for lung cancer. Compared to studies conducted in Caucasians, the sample sizes of GWAS in Chinese populations are relatively small. 41 Additional 10 thousand pairs of lung cancer cases and matched controls are being genotyped in Chinese population in an effort to further characterize the genetic susceptibility to lung cancer. In addition, with the decreased cost of sequencing, it will soon be feasible to sequence whole-genome variants for thousands of samples simultaneously. The results of genome-wide gene-environment interaction studies will also help identify the missing heritability. With the initiation of large cohort studies in China, additional findings are foreseen in the near future.
The next important step is identification of the causal genetic variants and genes in the GWAS-reported loci and determination of the exact mechanisms of action of these variants through functional annotation or laboratory functional experiments. The results of the ENCODE Project and Roadmap Epigenomics provided a variety of annotation information (such as histone modifications and DNase hypersensitive sites) that is valuable for marking functional genomic segments in risk loci. Expression quantitative trait loci in a variety of tissues, such as the genotypetissue expression, 71 have proven to be a valuable resource for the functional study of GWAS loci. The TCGA project provided much more dimensional data, including gene expression, genotype, epigenetics, and somatic alterations as well as the proteome, that constitute a precious resource for research on the biological mechanism of lung cancer. 52,53 With these public resources, several candidate susceptibility genes, including RNASET2 in 6q27, NRG1 in 8p12, AMICA1 in 11q23.3, and SECISBP2L in 15q21.1, have been identified through functional annotation. However, most of the samples used in these resources were obtained from Caucasians, and the mechanisms of action of the specific susceptibility loci of lung cancer in the Chinese population are still unclear. With the accumulation of multiomics data on lung cancer in the Chinese population, more meaningful findings are foreseen in the near future. Therefore, findings from genetic associations and somatic alterations are not only important in elucidating the etiology of lung cancer but are also critical to the clinical targeted therapy of lung cancer. This genetic knowledge will provide an important foundation for the development and improvement of future clinical strategies for lung cancer precision oncology.

DATA AVAILABILITY
The data used to create Fig. 1 were taken from the study of Wang C et al. 57 and the TCGA project. 51,52