Introduction

Cancer is typically characterized as a genomic disorder. It is thought that somatic mutations accumulate with age, some of which drive cancer development, and that germline mutations could explain predisposition to cancer development. Familial lung cancers usually express an autosomal dominant form, and as such, cancer susceptibility is passed down through the generations and the disease has a relatively young onset.1 Smoking is an environmental predisposing factor; exposure to tobacco smoke causes genetic alterations and is strongly associated with lung cancer.2, 3 So, genetic factors and environmental factors are closely linked with cancer. Familial cancer accounts for 15–20% of total cancers,1 examples of which include Li–Fraumeni syndrome (OMIM: 151623), hereditary retinoblastoma (OMIM: 180200), familial breast cancer (OMIM: 604370) and Lynch Syndrome (OMIM: 120435). Familial lung cancers, however, are less common.

Previous studies have identified clues as to the genetic factors in lung cancer. One of the most common causative genes is epidermal growth factor receptor (EGFR), which is a therapeutically targetable driver mutation in non-small cell lung cancer.4 Recently, driver mutations in Kirsten rat sarcoma viral oncogene homolog (KRAS), human epidermal growth factor receptor 2 (HER2) and the echinoderm microtubule-associated protein-like 4–anaplastic lymphoma receptor tyrosine kinase (EML4-ALK) fusion gene have been discovered.5, 6, 7 In addition to these crucial mutations, genome-wide association studies have revealed inherited susceptibility variants on chromosome 15q24-25.1,8, 9 6q23-25,10, 11, 12 and 12q24.10 In a familial lung cancer study, Liu et al.13 identified that a combination of single-nucleotide polymorphisms in chromosomal regions 5p15.33, 6p21.33, 6q23-25/RGS17 and 15q24-25.1 conferred susceptibility to familial lung cancer. Wang et al.14 suggested that heterozygous mutations in surfactant protein A2 were associated with lung cancer and pulmonary fibrosis in two pedigrees. However, many of the causative genes for familial lung cancer are yet to be identified.

Here, we attempted to identify a genetic factor in lung cancer by investigating a three-generation family with lung cancer susceptibility by whole-exome sequencing (WES). We identified 41 alterations in 40 genes linked to lung cancer development in the family. After somatic mutation screening in 192 sporadic lung cancers, we noted that ‘deleterious’ somatic mutations in CENPE or MAST1 were also present in multiple lung cancer samples. After considering the nonaffected family members and other branches of the family, we believe that MAST1 is most likely to be a familial lung cancer-related gene.

Materials and methods

Family

One family containing 17 members (Family N), 16 of whom were cognate and one who was a spouse, registered for this study. The 16 patients comprised nine men and seven women, with an average age of onset of 58 years. Pathological diagnoses of the 12 available cases, including double lung cancer cases, were 10 of adenocarcinoma and two of bronchiolo-alveolar carcinoma. Three of the 16 had multicentric lung cancer (III-13, IV-5, IV-12), and one (III-3) also had renal cell cancer. Three members also had other types of cancer (III-5: adrenal cancer, III-14: colon cancer, III-25: ovarian cancer). Individuals III-5 and III-25 died of adrenal cancer at 59 years, and ovarian cancer at 42 years, respectively. Individuals IV-4 and IV-5 had interstitial pneumonia as a respiratory complication. The age of unaffected control individuals (IV-13,14 and 15) was 58, 56 and 54 years, respectively. There were no consanguineous marriages and no history of exposure to asbestos. The family tree is shown in Figure 1 and clinical information is summarized in Table 1.

Figure 1

Familial lung cancer pedigree. Seventeen members, 16 of whom were cognate and one who was a spouse, were diagnosed with lung cancer. Three out of the 16 had multicentric lung cancer (III-13, IV-5, IV-12), and one (III-3) also had renal cell cancer. Three members also had other types of cancer (III-5, adrenal cancer; III-14, colon cancer; and III-25, ovarian cancer). Individuals IV-4 and IV-5 had interstitial pneumonia as a respiratory complication.

Table 1 Summary of patient characteristics

Genomic analysis

Peripheral blood was collected from affected individuals III-4, III-6 and IV-12, and from three unaffected control individuals, IV-13, IV-14 and IV-15. Samples from sporadic lung cancer patients were obtained from specimens resected in the Division of Surgical Oncology, Nagasaki University Hospital between 2004 and 2013; control samples were selected from healthy inhabitants of Nagasaki, Japan. DNA was extracted using a QIAamp DNA Mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. A Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) was used to assess the concentration and purity of DNA. This study was conducted with the approval of the Genetic and Medical Ethics Commission at Nagasaki University and written consents were obtained from all participants.

Library preparation and whole-exome sequencing (WES)

We performed exon enrichment by hybridization capture using samples from individuals III-4, III-6, IV-12 and IV-14 (peripheral blood) with a SureSelect Human All Exon v4+UTR kit (Agilent, Santa Clara, CA, USA) following the manufacturer’s protocol. Sequence data were obtained using a SOLiD5500 (Invitrogen) by 75-bp forward and 50-bp reverse paired-end sequencing. Emulsion PCR for the SOLiD5500 was carried out following the manufacturer’s protocol but using KAPA HiFi Taq Polymerase (KAPA Biosystems, Wilmington, MA, USA). Read sequence data were aligned to the hg19 human reference genome using NovoalignCS (Novocraft Technologies Sdn Bhd, Petaling Jaya, Malaysia). NovoalignCS recalibrated the base-quality scores during the alignment. PCR and optical duplications were marked using Picard MarkDuplicates (http://picard.sourceforge.net/) and omitted from subsequent analyses. Reads near to insertions/deletions (INDELs) were locally realigned using the Genome Analysis Toolkit (GATK) IndelRealigner.15 Single-nucleotide variants (SNVs) and INDELs were detected with GATK’s UnifiedGenotyper according to the GATK Best Practice recommendations.16, 17 Detected SNVs and INDELs were annotated using ANNOVAR software (http://www.openbioinformatics.org/annovar/).18 We selected SNVs and INDELs as candidate variants if they satisfied the following criteria: (1) marked as PASS after GATK VariantFiltration using GATK’s recommended conditions; (2) alternative allele frequency <0.5% in these databases: (a) 69 Genomes Data from Complete Genomics (Mountain View, CA, USA); (b) National Heart, Lung and Blood Institute Grand Opportunity Exome Sequencing Project 6500 (https://esp.gs.washington.edu/drupal/); and (c) 1000 Genomes (http://www.1000genomes.org/); (3) no variation in our in-house data; and (4) not included within the table of segmental duplicated regions downloaded from the University of California Santa Cruz Genome Browser (2011-09-26 update). 
‘Potentially deleterious mutations’ were defined as: (1) nonsynonymous change; or (2) change within 2-bp upstream of a splice acceptor site; or (3) change within 5-bp downstream of a splice donor site. Variants were annotated using the following databases: RefSeq; Ensembl gene; and GENCODE basic v12 and dbSNP135 downloaded from the University of California Santa Cruz Genome Browser at the beginning of this study. To report this study, we used the updated databases GENCODE_basic v19 and dbSNP138 to annotate variants. Sequence variants detected by WES were validated by capillary sequencing on a Genetic Analyzer 3130xl (Applied Biosystems, Foster City, CA, USA).
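The candidate and ‘potentially deleterious’ criteria described above can be sketched as two predicates. This is only an illustrative sketch, not the pipeline used in the study: the `Variant` record and all of its field names are hypothetical, and a variant missing from a frequency database is assumed to count as rare in that database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_ALT_AF = 0.005  # 0.5% alternative allele frequency threshold from the text


@dataclass
class Variant:
    """Hypothetical record for one annotated call; field names are illustrative."""
    filter_status: str                      # GATK VariantFiltration result, e.g. "PASS"
    effect: str                             # e.g. "nonsynonymous", "synonymous", "intronic"
    db_alt_af: Dict[str, float] = field(default_factory=dict)  # database -> alt allele frequency
    in_house_seen: bool = False             # observed in the in-house data
    in_segdup: bool = False                 # lies in a segmental duplication
    bp_upstream_of_acceptor: Optional[int] = None  # distance (bp) upstream of a splice acceptor
    bp_downstream_of_donor: Optional[int] = None   # distance (bp) downstream of a splice donor


def is_candidate(v: Variant) -> bool:
    """Criteria (1)-(4): PASS, rare in every database, absent in-house, outside segdups."""
    # Assumption: a database with no entry for the variant is simply not listed in db_alt_af.
    rare_everywhere = all(af < MAX_ALT_AF for af in v.db_alt_af.values())
    return (v.filter_status == "PASS" and rare_everywhere
            and not v.in_house_seen and not v.in_segdup)


def is_potentially_deleterious(v: Variant) -> bool:
    """Nonsynonymous, or <=2 bp upstream of an acceptor, or <=5 bp downstream of a donor."""
    if v.effect == "nonsynonymous":
        return True
    if v.bp_upstream_of_acceptor is not None and 1 <= v.bp_upstream_of_acceptor <= 2:
        return True
    if v.bp_downstream_of_donor is not None and 1 <= v.bp_downstream_of_donor <= 5:
        return True
    return False


# Toy records (not real calls from the study)
rare_missense = Variant("PASS", "nonsynonymous", {"1000G": 0.001})
common_missense = Variant("PASS", "nonsynonymous", {"1000G": 0.02})
splice_site = Variant("PASS", "intronic", bp_downstream_of_donor=3)

for v in (rare_missense, common_missense, splice_site):
    print(is_candidate(v), is_potentially_deleterious(v))
```

Keeping the two predicates separate mirrors the text, where rarity and quality filters are applied independently of the functional definition.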

Target enrichment sequencing

DNA bait was generated for target resequencing using SureDesign (Agilent), then target enrichment of DNA fragments from the 69 genes was performed using a SureSelect XT custom kit (Agilent) following the manufacturer’s protocol. Using a HiSeq 2500 (Illumina, San Diego, CA, USA), we generated 2 × 100-bp paired-end reads, which were aligned to hg19 using Novoalign (Novocraft Technologies Sdn Bhd). The data were processed in the same way as were the WES data, with a slight modification. In addition to the ‘deleterious’ criteria for WES data, a judgment of ‘deleterious’ was only given if the variants had an alternative allele frequency of <0.5% in the Human Genetic Variation Database (www.genome.med.kyoto-u.ac.jp/SnpDB/).

Capillary sequencing

To validate the next-generation sequencing results, we designed 140 pairs of primers for the somatic mutation candidate sites using PrimerZ19 (http://genepipe.ngc.sinica.edu.tw/primerz/beginDesign.do) or Primer3Plus20 (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) (Supplementary Table 1). For PCR, 10 ng of genomic DNA was amplified in a 10-μl volume using the following conditions in a T1 thermocycler (Biometra, Göttingen, Germany): 94 °C for 2 min, followed by 35 cycles of 94 °C for 10 s, 60 or 65 °C for 20 s and 68 or 72 °C for 30 s, followed by a final cycle of 68 or 72 °C for 5 min. The reactions were performed with ExTaq HS (Takara Bio, Shiga, Japan) or KOD FX (Toyobo, Osaka, Japan). Samples were sequenced using a BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems) and separated on a Genetic Analyzer 3130xl (Applied Biosystems). Sequence electropherograms were aligned using ATGC software (Genetyx Corporation, Tokyo, Japan).

Results

Whole-exome sequencing with peripheral blood DNA and tumor DNA from the family

The family tree and clinical characteristics of the family (Family N) are shown in Figure 1 and Table 1, respectively.

The results of WES, targeting ~50 Mbp using a SureSelect Human All Exon v4+UTRs kit, are shown in Table 2. The raw data filtering process is summarized in Tables 3A and 3B. Seventy-one alterations were found in blood DNA from all three affected individuals but not in that from the unaffected individual IV-14. The candidate genes, variations and loci are summarized in Table 4. The 71 alterations comprised 69 SNVs and two short deletions in 69 genes; we annotated them as ‘potentially deleterious’ and considered them to be candidate mutations/genes for familial lung cancer development. All 71 alterations were confirmed by direct sequencing on a capillary sequencer and were heterozygous. After re-annotating the 71 variants with the updated GENCODE_basic v19 instead of v12, we selected 41 variants in 40 genes (40 SNVs and one INDEL) as our candidate variants (Table 4); the other 30 variants were excluded because they are not on the list of defined genes in GENCODE_basic v19. Two putative deleterious variants were found in NOTCH1. One of these genes, MET, has been previously associated with several kinds of cancer21, 22; the other 39 genes were newly identified as candidate cancer susceptibility genes.
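The selection of alterations present in all affected individuals but absent from the unaffected individual amounts to a set intersection followed by a set difference. The following is a minimal sketch under that reading; the function name and the variant labels are hypothetical placeholders, not calls from Family N.

```python
from typing import Dict, Set


def shared_only_in_affected(affected: Dict[str, Set[str]],
                            unaffected: Dict[str, Set[str]]) -> Set[str]:
    """Variants carried by every affected sample and by no unaffected sample."""
    if not affected:
        return set()
    # Present in all affected individuals
    shared = set.intersection(*affected.values())
    # Seen in at least one unaffected individual
    seen_in_unaffected = set().union(*unaffected.values())
    return shared - seen_in_unaffected


# Toy call sets keyed by individual; labels such as "varA" are placeholders
affected_calls = {
    "III-4": {"varA", "varB", "varC"},
    "III-6": {"varA", "varB", "varD"},
    "IV-12": {"varA", "varB", "varC"},
}
unaffected_calls = {"IV-14": {"varB", "varD"}}

candidates = shared_only_in_affected(affected_calls, unaffected_calls)
print(sorted(candidates))
```

The same function can be reused when further unaffected relatives are genotyped, by adding their call sets to the second dictionary.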

Table 2 WES data using SureSelect V4 UTRs
Table 3a Process of SNV filtering from raw data
Table 3b Process of INDEL filtering from raw data
Table 4 Variants found only in patients III-4, III-6 and IV-12

Direct sequencing in unaffected individuals

We next checked for the presence of these variants by direct sequencing of DNA from the peripheral blood of unaffected individuals IV-13, IV-14 and IV-15. We found that 22 out of 41 variants were not present in individuals IV-13 or IV-15 and were therefore completely linked to lung cancer development in Family N (Table 4). Assuming complete penetrance among the six individuals in this study, many of these 22 alterations were located on 17p13, 19p13 and 19q13, so these loci could be defined as regions linked to lung cancer in Family N. Considering the sequence and map information from the exome analyses of the family together, one of the genes at these loci, including CLUH, TRPV3 and P2RX5 on 17p13; MAST1 and CD97 on 19p13; and PPP5C and EMC10 on 19q13, is most likely the causative gene for this family. However, the other genes cannot be excluded as candidates on the basis of exome sequence and variant map information alone.

Target enrichment sequencing of 192 sporadic lung cancers and 192 control samples

We considered that one of the alterations in the 40 genes would act as a driver mutation in the development of lung cancer. Because we expected that somatic mutations would accumulate in one of the 40 genes, we performed exon target enrichment sequencing in 192 sporadic lung cancer patients and 192 healthy individuals. The breakdown of the pathological diagnosis in the 192 lung cancers was as follows: 117 (60.9%) adenocarcinoma; 48 (25.0%) squamous cell carcinoma; 10 (5.2%) large cell carcinoma; 8 (4.2%) small cell carcinoma; and 9 (4.7%) other carcinoma.
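As a quick arithmetic check, the histology percentages can be recomputed from the counts given above. This is a minimal sketch; the variable names are illustrative and rounding to one decimal place is an assumption about how the figures were reported.

```python
# Counts of each pathological diagnosis among the sporadic lung cancers (from the text)
histology_counts = {
    "adenocarcinoma": 117,
    "squamous cell carcinoma": 48,
    "large cell carcinoma": 10,
    "small cell carcinoma": 8,
    "other carcinoma": 9,
}

total_cases = sum(histology_counts.values())

# Percentage of the cohort per diagnosis, rounded to one decimal place (assumed convention)
histology_percent = {
    name: round(100 * count / total_cases, 1)
    for name, count in histology_counts.items()
}

for name, pct in histology_percent.items():
    print(f"{name}: {histology_counts[name]}/{total_cases} = {pct}%")
```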

We used a custom SureSelect target enrichment system to extract mutations found only in the sporadic cancer patients. We use the word ‘inherited’ variants to mean those that came from the zygote; thus, ‘inherited variants’ means germline variants or nonsomatic mutations. We identified 69 alterations in the 40 candidate genes in 192 sporadic lung cancers and considered them to be ‘deleterious mutations’, as detailed in the Materials and Methods.

Twenty-eight changes out of the 69 were confirmed to be somatic mutations by comparing them with sequenced DNA from corresponding normal tissue. All somatic mutations were heterozygous and were not observed recurrently (Table 5). Among the 28 somatic mutations, one was a nonsense mutation (CACNB2; c.C1380A:p.Y460X) and 27 were nonsynonymous (Table 5). Mutations found in nontumorous regions (considered to be inherited variants) are listed in Table 5. Genes in which two or more somatic mutations were found were: CENPE, with five somatic mutations; LCT, ATG2A and MAST1, with three each; and PCDH10, MET, CACNB2 and SYMPK, with two each. During validation by capillary sequencing, we noted that the wild-type allele for GRN in sample 127 and that for ATG2A in sample 151 were detected in a mosaic state due to loss of heterozygosity, because the peak height of the wild-type allele was very low.
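The distinction between somatic and inherited changes rests on comparing each tumor with its matched normal tissue. A minimal sketch of that comparison is given below; the function name and variant labels are hypothetical, and real calls would also need depth and allele-fraction checks that are not modeled here.

```python
from typing import Dict, Set


def split_by_matched_normal(tumor_calls: Set[str],
                            normal_calls: Set[str]) -> Dict[str, Set[str]]:
    """Partition tumor variants by whether the matched normal tissue also carries them."""
    return {
        # Present in tumor only: treated as somatic
        "somatic": tumor_calls - normal_calls,
        # Present in tumor and matched normal: treated as inherited (germline)
        "inherited": tumor_calls & normal_calls,
    }


# Toy example for a single patient; labels are placeholders, not study data
tumor = {"varA", "varB", "varC"}
normal = {"varB", "varC", "varD"}

result = split_by_matched_normal(tumor, normal)
print(sorted(result["somatic"]), sorted(result["inherited"]))
```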

Table 5 Variants found in DNA from 192 sporadic lung cancer samples

Discussion

In this study, 71 variants were annotated as ‘deleterious’ by the first screening. After filtering against GENCODE_basic v19, we ultimately selected 41 ‘inherited’ variants as candidates causing familial lung cancer in Family N. All 41 variants were heterozygous changes present in all three patients; thus, it is conceivable that loss or gain of function due to any one of the alterations induces lung cancer. Some of the variants have an ‘rs number’ in dbSNP138 and/or are present in the Japanese population according to the Human Genetic Variation Database. It is less likely that these variants cause lung cancer in this pedigree.

If our 41 variants include a causative mutation for lung cancer, we would expect somatic or inherited mutations to be identified in one of these candidate genes in sporadic lung cancer. To this end, we performed exon target enrichment sequencing. For inherited mutations, we could not conclude which gene is responsible for lung cancer, because we identified many germline alterations but no particular gene showed many mutations. In contrast, for somatic mutations, we identified 28 in 40 candidate genes. Two or more somatic mutations were detected in eight genes: LCT, CENPE, PCDH10, MET, CACNB2, ATG2A, MAST1 and SYMPK; these genes may be generally related to cancer development. In particular, CACNB2 is a very good candidate because an A-to-G variant found in affected family members (chr10: 18690944) was not present in any variant database, and because this gene harbored two somatic mutations including a stop-gain mutation. Similarly, five somatic mutations were found in CENPE. However, one CENPE variant found in a member of Family N, chr4:104059558 C>T, is also present, albeit rarely, in the Japanese population (alternative allele frequency=0.00271 in Human Genetic Variation Database) (Table 4). LCT, ATG2A and MAST1 each harbored one somatic mutation that was not present in databases of normal variation. In addition, none of our variants was present in the Sanger COSMIC lung cancer database (http://www.sanger.ac.uk/genetics/CGP/cosmic/). The somatic mutational frequency of CACNB2, CENPE, LCT, ATG2A and MAST1 in sporadic lung cancer patients was 1.0% (2/192 samples), 2.6% (5/192), 1.6% (3/192), 1.6% (3/192) and 1.6% (3/192), respectively. In a previous study, mutations in the well-known driver genes, HER2, BRAF, PIK3CA, AKT1, MAP2K1 and MET, accounted for <5% of mutations,23 so it is not surprising that the somatic mutation rate is low among just these five genes.
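The per-gene somatic mutational frequencies quoted above follow directly from the number of mutated samples divided by the 192 sporadic cases. A small sketch of that calculation is shown here; the names are illustrative, and it assumes, as the text implies, that each somatic mutation in a gene occurred in a different sample.

```python
N_SPORADIC_CASES = 192  # sporadic lung cancer samples screened

# Somatic mutations per gene, as reported in the text
somatic_counts = {"CACNB2": 2, "CENPE": 5, "LCT": 3, "ATG2A": 3, "MAST1": 3}


def mutation_frequency_percent(n_mutated: int, n_samples: int) -> float:
    """Percentage of samples carrying a somatic mutation, to one decimal place."""
    return round(100 * n_mutated / n_samples, 1)


frequencies = {
    gene: mutation_frequency_percent(count, N_SPORADIC_CASES)
    for gene, count in somatic_counts.items()
}

for gene, pct in frequencies.items():
    print(f"{gene}: {somatic_counts[gene]}/{N_SPORADIC_CASES} = {pct}%")
```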

Considering the genotype of unaffected family members, candidate mutations were located on chromosomes 17p13, 19p13 and 19q13. Although these regions have not been previously implicated in lung cancer by genome-wide association studies,3, 24, 25, 26 they could be linked to lung cancer in Family N. Within these three regions, MAST1 is the most obvious candidate gene. Chromosome 12q24 has been previously linked to lung cancer by genome-wide association studies;10 however, we did not identify any variants in this region in Family N.

Regarding inherited variants found in sporadic lung cancers and healthy controls, five candidate genes (CACNB2, CENPE, LCT, ATG2A and MAST1) merit consideration, because the counts of rare variants may indicate specificity for lung cancer. We counted the inherited variants (that were not present in databases of normal variation) in sporadic lung cancer cases and healthy controls (Table 5 and Supplementary Table 2). In cases and controls, respectively, we found 2 and 4 variants in CENPE; 0 and 2 in CACNB2; 1 and 0 in LCT; 1 and 4 in ATG2A; and 1 and 0 in MAST1. Variants in MAST1 and LCT are probably very rare in healthy control populations, so somatic mutations in lung cancer patients and variants in Family N might be significant for lung cancer development.

To sum up, it is most likely that the MAST1 c.G3224T:p.R1075L mutation is causative for the familial lung cancer in this study, with CENPE, CACNB2 and LCT as second-place candidates. All have reported functions concerning tumor development. MAST1, microtubule-associated serine-threonine kinase 1, and in particular its PDZ domain, stabilizes and modulates phosphorylation of the C-terminal phospholipid-binding C2 domain of PTEN.27 PTEN, a tumor suppressor gene, is connected with cancer development by regulating cell growth and apoptosis.27 In addition, fusion genes involving MAST1 (ZNF700–MAST1, NFIX–MAST1 and TADA2A–MAST1) were identified in breast cancer cell lines and tumor samples by transcriptome sequencing, and overexpression of these MAST1 fusion genes had a proliferative effect both in vitro and in vivo.28 It is possible that the MAST1 mutation in Family N influences signal transduction involving PTEN regulation or increases MAST1 activity, leading to cancer development. To assess the expression level of MAST1, we performed immunohistochemistry on formalin-fixed paraffin-embedded samples of normal lung from an affected and an unaffected person. No significant differences in expression were observed.

CENPE, centromere-associated protein-E, is a member of the kinesin family that is a key receptor at the mitotic checkpoint. Inhibition of CENPE has a tumor-suppressive effect, such as tumor cell apoptosis or regression.29 The mutation site in Family N is within a long flexible alpha-helical coiled-coil region (residues D336–A2471), with the mutation at a site that links two domains, an ATP-binding/microtubule-interacting region and a kinetochore-binding domain.30 The mutation in Family N, c.G6253A:p.G2085R, may influence the domain and linker structure of CENPE.

LCT, lactase, has also been linked with cancer. A polymorphism in LCT influences calcium metabolism in colorectal cancer and is correlated with progression and/or incidence of colorectal cancer.31 There have not yet been any reports linking CACNB2, calcium channel voltage-dependent beta-2 subunit, with cancer.

We conclude that MAST1 is possibly a causative gene for familial lung cancer. Further genomic studies of sporadic cases and/or familial cases, and functional assays for mutations in MAST1, CENPE and LCT, are necessary to confirm our findings and reveal a novel gene related to lung cancer development.