Abstract
Although lung cancer risk among smokers is dependent on smoking dose, it remains unknown whether this increased risk reflects an increased rate of somatic mutation accumulation in normal lung cells. Here, we applied single-cell whole-genome sequencing to proximal bronchial basal cells from 33 participants aged between 11 and 86 years with smoking histories varying from never-smoking to 116 pack-years. We found an increase in the frequency of single-nucleotide variants and small insertions and deletions with chronological age in never-smokers, with mutation frequencies significantly elevated among smokers. When plotted against smoking pack-years, mutations followed the linear increase in cancer risk until about 23 pack-years, after which no further increase in mutation frequency was observed, pointing toward individual selection for mutation avoidance. Known lung cancer-defined mutation signatures tracked with both age and smoking. No significant enrichment for somatic mutations in lung cancer driver genes was observed.
Data availability
WGS data are available at dbGaP (accession number: phs002758.v1.p1) and can be accessed at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002758.v1.p1. Somatic mutation calls, including single-base substitutions and indels from all 134 samples, have been deposited in SomaMutDB at http://vijglab.einsteinmed.org/static/vcf/lung_Huang.et.al.Naturegenetics.tar.gz.
Code availability
Sequencing reads were filtered to remove adapter sequences and low-quality reads with Trim Galore (version 0.4.1) and mapped to the human reference genome (GRCh37, including decoy contigs) using BWA (mem; version 0.7.10), with PCR duplicates removed by Samtools (version 0.1.19). Realignment of reads and recalibration of base quality scores were performed with GATK (version 3.5.0). Somatic mutations were called using SCcaller (version 1.2; https://github.com/biosinodx/SCcaller). MASS (version 7.3-53) and lme4 (version 1.1-26) were employed for statistical analysis in R (version 4.0.3; GUI 1.73 Catalina build 7892). Custom code for statistical analysis and permutation analysis is available through GitHub (https://github.com/Zhenqiu85/Lung_Smoke_analysis).
Acknowledgements
This study was supported by National Institutes of Health grants U01 ES029519-01 (J.V. and S.D.S.), U01HL145560 (S.D.S. and J.V.), AG017242 (J.V.), AG056278 (J.V.) and AG047200 (J.V. and V.G.), and by the Glenn Foundation for Medical Research. We thank A. Desai and D. Patel (Pulmonary Medicine) for bronchoscopy sample procurement, S. Khader for cytopathology and X. Hao for assisting with data analysis.
Author information
Contributions
J.V., A.Y.M. and S.D.S. conceived this study and designed the experiments. S.D.S., M.S., T.S., Y.P., C.S. and A.S. provided clinical, procedural and specimen-specific study expertise and logistics. Z.H. performed the experiments. Z.H., J.V., A.Y.M., S.S. and K.Y. analyzed the data. Z.H. and J.V. wrote the manuscript.
Ethics declarations
Competing interests
A.Y.M., X.D. and J.V. are cofounders of SingulOmics. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Peter Campbell, Benjamin Izar, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Mutation frequency and correction deviation error.
SNV frequency of never-smokers versus age. Each data point indicates the mutation frequency per nucleus from each individual, with color intensity indicating relative standard error value (see Methods). The four cells of highest mutational burden were plotted separately with each data point representing median value with standard deviation errors.
Extended Data Fig. 2 Distribution of shared mutations in subject 1320.
a, Stacked bar plot showing the proportional contribution of SNVs shared among the 3–8 nuclei sequenced per subject. b, UpSet plot showing the distribution of shared SNVs in six nuclei from subject 1320 (lower part). The bar chart (upper part) represents the number of SNVs shared by each nucleus combination.
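The counting behind an UpSet-style plot such as panel b can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `count_by_combination`, the nucleus labels and the variant identifiers are all hypothetical, and the sketch assumes each SNV is assigned to the exact set of nuclei in which it was called.

```python
from collections import Counter


def count_by_combination(calls):
    """Count SNVs by the exact combination of nuclei that carry them.

    `calls` maps a nucleus label to the set of SNV identifiers called in it.
    Returns a Counter keyed by a sorted tuple of nucleus labels.
    """
    carriers = {}
    for nucleus, variants in calls.items():
        for variant in variants:
            carriers.setdefault(variant, set()).add(nucleus)
    return Counter(tuple(sorted(nuclei)) for nuclei in carriers.values())


# Toy input (hypothetical labels and variants, not study data).
toy_calls = {
    "nucleus_1": {"chr1:100:C>A", "chr2:200:G>T", "chr3:300:T>C"},
    "nucleus_2": {"chr1:100:C>A", "chr2:200:G>T"},
    "nucleus_3": {"chr1:100:C>A", "chr4:400:A>G"},
}
combination_counts = count_by_combination(toy_calls)
```

Each key of `combination_counts` corresponds to one column of the plot (a nucleus combination), and its value to the height of the bar above that column.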
Extended Data Fig. 3 An a priori semi-parametric B-spline model to test the non-linearity between mutation frequency and smoking pack-years.
Each data point indicates the SNV frequency of nuclei of individuals. The spline fit, evaluated at the average age and the average of the random effects, is shown in gray together with its 95% confidence interval, and the piece-wise linear model fit is shown as the blue line. The P value for the spline model is 0.0043 when compared to the linear model and 0.0034 when compared to the null model (see Methods).
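The idea of a piece-wise linear fit that rises and then plateaus can be sketched with a "hinge" predictor. This is a simplified illustration under stated assumptions, not the model used in the study: it uses ordinary least squares on one predictor and ignores age, random effects and the count distribution. The breakpoint of 23 pack-years is taken from the abstract, the helper names `hinge` and `fit_line` are hypothetical, and the input values are synthetic and noise-free.

```python
def hinge(pack_years, knot):
    """Piecewise-linear predictor: rises with pack-years, flat beyond `knot`."""
    return min(pack_years, knot)


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


KNOT = 23.0  # assumed breakpoint, from the "about 23 pack-years" in the abstract

# Synthetic, noise-free values generated from a known plateau curve
# (intercept 1000, slope 50 per pack-year up to the knot); not study data.
pack_years = [0.0, 5.0, 10.0, 23.0, 40.0, 80.0, 116.0]
frequency = [1000.0 + 50.0 * hinge(p, KNOT) for p in pack_years]

# Regressing on the hinge-transformed predictor recovers the generating line.
intercept, slope = fit_line([hinge(p, KNOT) for p in pack_years], frequency)
```

Because the transformed predictor stops increasing at the knot, the fitted curve is linear up to the breakpoint and constant afterwards, which is the shape the piece-wise model is meant to capture.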
Extended Data Fig. 4 INDEL frequency and smoking dose.
a, INDEL frequency versus smoking pack-years across all individuals (n = 33). Each dot indicates the median value and the minimum-to-maximum range of INDEL frequency for an individual. b, INDEL frequency in groups of individuals defined by smoking pack-years, with boxes indicating the median and interquartile range of the never (n = 14), light (n = 6), moderate (n = 6) and heavy (n = 7) smoking groups, respectively.
Extended Data Fig. 5 Effects of smoking cessation on mutation frequency.
Median SNV and INDEL frequencies among former smokers (n = 7) and current smokers (n = 12). a, Each data point indicates the median value and the minimum-to-maximum range of SNV frequency across the 3–8 nuclei per subject. b, Each data point indicates the median value and the minimum-to-maximum range of INDEL frequency across the 3–8 nuclei per subject. P values were obtained by likelihood ratio tests using a negative binomial mixed-effects model.
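A likelihood ratio test of the kind mentioned in this caption compares the maximized log-likelihoods of two nested models. The sketch below shows only that final step and is not the authors' code: it assumes the log-likelihoods have already been obtained from fitted models, relies on the usual asymptotic chi-square approximation, and uses hypothetical function names (`chi2_sf`, `lrt_pvalue`).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return tail


def lrt_pvalue(loglik_null, loglik_alt, df):
    """P value for nested models; df = extra parameters in the alternative."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    return chi2_sf(statistic, df)
```

For example, `lrt_pvalue(loglik_null, loglik_alt, df=1)` would test whether adding a single term (such as a smoking-status effect) improves the fit; the inputs here are placeholders for values taken from whichever fitted models are being compared.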
Extended Data Fig. 6 SNV frequency in the lung functional genome using scRNA-seq human lung data instead of GTEx.
Each data point represents the number of mutations per nucleus in the functional genome (x axis) and the whole genome (y axis) for all subjects, colored by smoking status.
Extended Data Fig. 7 Cancer driver mutations.
a, Distribution of driver gene mutations in single nuclei of subjects, with number of mutations and smoking status indicated by colors. b, Total number of single nuclei with unique mutations found in pan-cancer driver genes and number of unique mutations in pan-cancer driver genes across the sample set (n = 134), 22 of 85 driver genes shown (Supplementary Table 5).
Extended Data Fig. 8 Mutational signatures and smoking.
a, Mutation spectra of four novel signatures identified among never-smokers and smokers. The six substitution types are shown across the top. Within each substitution type, the trinucleotide context is shown as four sets of four bars, grouped by whether an A, C, G or T, respectively, is 5′ or 3′ to the mutated base. b-f, Absolute number of major signatures discovered from never-smokers (n = 14) and smokers (n = 19). Each dot indicates the median SNV frequency of each individual. Boxes indicate median values and interquartile ranges among each group. The quoted P values were obtained by likelihood ratio tests using linear mixed-effect models. g, Relative contribution of APOBEC signatures versus SNV frequency of nuclei of never-smokers. Each data point represents a nucleus.
Extended Data Fig. 9 The INDEL mutation signature analysis.
a, Mutation spectra of INDELs in single nuclei from never-smokers (n = 14) and smokers (n = 19). The contributions of different types of INDELs are shown, grouped by whether variants are deletions or insertions; the size of the event; whether they occur at repeat units; and the sequence content of the INDEL. b, Stacked bar plot showing the proportional contribution of mutational signatures to INDELs across all nuclei (n = 134) measured from never-smokers and smokers. Four signatures (N1, ID1, ID3, ID4) were extracted by HDP.
Extended Data Fig. 10 Germline genetic variants associated with solid cancers.
A heatmap showing six germline variants associated with solid cancers, with one subject per column and the presence or absence of each variant indicated by color. Variant IDs at the left of each row of the heatmap represent six different solid cancer-associated single-nucleotide polymorphisms found through ClinVar (Supplementary Table 7).
Supplementary information
Supplementary Information
Supplementary Note, Figures 1–4 and Tables 1–7.
Supplementary Table 1
Supplementary Tables 1–7, with a title included in each tab.
About this article
Cite this article
Huang, Z., Sun, S., Lee, M. et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nat Genet 54, 492–498 (2022). https://doi.org/10.1038/s41588-022-01035-w
DOI: https://doi.org/10.1038/s41588-022-01035-w