Tobacco smoking causes lung cancer (refs. 1–3), a process that is driven by more than 60 carcinogens in cigarette smoke that directly damage and mutate DNA (refs. 4, 5). The profound effects of tobacco on the genome of lung cancer cells are well-documented (refs. 6–10), but equivalent data for normal bronchial cells are lacking. Here we sequenced whole genomes of 632 colonies derived from single bronchial epithelial cells across 16 subjects. Tobacco smoking was the major influence on mutational burden, typically adding from 1,000 to 10,000 mutations per cell; massively increasing the variance both within and between subjects; and generating several distinct mutational signatures of substitutions and of insertions and deletions. A population of cells in individuals with a history of smoking had mutational burdens that were equivalent to those expected for people who had never smoked: these cells had less damage from tobacco-specific mutational processes, were fourfold more frequent in ex-smokers than current smokers and had considerably longer telomeres than their more-mutated counterparts. Driver mutations increased in frequency with age, affecting 4–14% of cells in middle-aged subjects who had never smoked. In current smokers, at least 25% of cells carried driver mutations and 0–6% of cells had two or even three drivers. Thus, tobacco smoking increases mutational burden, cell-to-cell heterogeneity and driver mutations, but quitting promotes replenishment of the bronchial epithelium from mitotically quiescent cells that have avoided tobacco mutagenesis.
Sequencing data have been deposited at the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under the accession number EGAD00001005193. Somatic-mutation calls, including single-base substitutions, indels and structural variants, from all 632 samples have been deposited on Mendeley Data with the identifier: https://doi.org/10.17632/b53h2kwpyy.2.
Detailed methods and custom R scripts for the analysis of mutational burden in bronchial epithelium are available in Supplementary Code. Other packages used in the analysis are as follows: R v.3.5.1; BWA-MEM v.0.7.17-r1188 (https://sourceforge.net/projects/bio-bwa/); CaVEMan v.1.11.2 (https://github.com/cancerit/CaVEMan); Pindel v.2.2.5 (https://github.com/cancerit/cgpPindel); Brass v.6.1.2 (https://github.com/cancerit/BRASS); ASCAT NGS v.4.1.2 (https://github.com/cancerit/ascatNgs); Xenome (https://github.com/data61/gossamer/blob/master/docs/xenome.md); deepSNV v.1.28.0 (https://bioconductor.org/packages/release/bioc/html/deepSNV.html); ANNOVAR (http://wannovar.wglab.org/); IGV (http://software.broadinstitute.org/software/igv/); JBrowse (https://jbrowse.org/); cgpVAF (https://github.com/cancerit/vafCorrect); RPhylip v.0.1.23 (http://www.phytools.org/Rphylip/); hdp v.0.1.5 (https://github.com/nicolaroberts/hdp); MutationalPatterns v.1.8.0 (https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html); dNdScv v.0.0.1 (https://github.com/im3sanger/dndscv); and Telomerecat v.3.1.2 (https://github.com/jhrf/telomerecat).
Alberg, A. J., Brock, M. V., Ford, J. G., Samet, J. M. & Spivack, S. D. Epidemiology of lung cancer. Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 143, e1S–e29S (2013).
Peto, R. et al. Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. Br. Med. J. 321, 323–329 (2000).
International Agency for Research on Cancer. Tobacco Smoke and Involuntary Smoking. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 83 (IARC and World Health Organization, 2004).
Hecht, S. S. Progress and challenges in selected areas of tobacco carcinogenesis. Chem. Res. Toxicol. 21, 160–171 (2008).
Pfeifer, G. P. et al. Tobacco smoke carcinogens, DNA damage and p53 mutations in smoking-associated cancers. Oncogene 21, 7435–7451 (2002).
Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).
Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012).
Alexandrov, L. B. et al. Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016).
George, J. et al. Comprehensive genomic profiles of small cell lung cancer. Nature 524, 47–53 (2015).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Tomasetti, C., Marchionni, L., Nowak, M. A., Parmigiani, G. & Vogelstein, B. Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc. Natl Acad. Sci. USA 112, 118–123 (2015).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).
Garfinkel, L. & Stellman, S. D. Smoking and lung cancer in women: findings in a prospective study. Cancer Res. 48, 6951–6955 (1988).
Armitage, P. Response to Richard Doll: the age distribution of cancer. J. Roy. Stat. Soc. A 134, 155–156 (1971).
Doll, R. & Peto, R. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J. Epidemiol. Community Health 32, 303–313 (1978).
Lee, J. J.-K. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857 (2019).
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
Letouzé, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).
Alexandrov, L. et al. The repertoire of mutational signatures in human cancer. Nature https://doi.org/10.1038/s41586-020-1943-3 (2020).
Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
Teixeira, V. H. et al. Stochastic homeostasis in human airway epithelium is achieved by neutral competition of basal cell progenitors. eLife 2, e00966 (2013).
Hegab, A. E. et al. Novel stem/progenitor cell population from murine tracheal submucosal gland ducts with multipotent regenerative potential. Stem Cells 29, 1283–1293 (2011).
Tata, A. et al. Myoepithelial cells of submucosal glands can function as reserve stem cells to regenerate airways after injury. Cell Stem Cell 22, 668–683 (2018).
Lynch, T. J. et al. Submucosal gland myoepithelial cells are reserve stem cells that can regenerate mouse tracheal epithelium. Cell Stem Cell 22, 653–667 (2018).
Gowers, K. H. C. et al. Optimized isolation and expansion of human airway epithelial basal cells from endobronchial biopsy samples. J. Tissue Eng. Regen. Med. 12, e313–e317 (2018).
Butler, C. R. et al. Rapid expansion of human epithelial stem cells suitable for airway tissue engineering. Am. J. Respir. Crit. Care Med. 194, 156–168 (2016).
Teixeira, V. H. et al. Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions. Nat. Med. 25, 517–525 (2019).
Conway, T. et al. Xenome—a tool for classifying reads from xenograft samples. Bioinformatics 28, 172–178 (2012).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).
Yang, H. & Wang, K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 10, 1556–1566 (2015).
Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics 56, 15.9.1–15.9.17 (2016).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Buels, R. et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 17, 66 (2016).
Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature https://doi.org/10.1038/s41586-019-1913-9 (2020).
Farmery, J. H. R. et al. Telomerecat: a ploidy-agnostic method for estimating telomere length from whole genome sequencing data. Sci. Rep. 8, 1300 (2018).
This work was supported by a Cancer Research UK Grand Challenge Award (C98/A24032) and the Wellcome Trust. P.J.C. and S.M.J. are Wellcome Trust Senior Clinical Fellows (WT088340MA); S.M.J. receives funding as a member of the UK Regenerative Medicine Platform (UKRMP2) Engineered Cell Environment Hub (MRC; MR/R015635/1) and the Longfonds BREATH lung regeneration consortium, and is further supported by The Rosetrees Trust, the Stoneygate Trust, the British Lung Foundation and the UCLH Charitable Foundation; K.Y. is supported by a Japan Society for the Promotion of Science (JSPS) Overseas Research Fellowship and The Mochida Memorial Foundation for Medical and Pharmaceutical Research; S.M.J. and R.E.H. are supported by the Roy Castle Lung Cancer Foundation; R.E.H. is a Wellcome Trust Sir Henry Wellcome Fellow (WT209199/Z/17/Z); and I.M. is funded by Cancer Research UK (C57387/A21777). The authors thank S. Broad and F. Watt for providing 3T3-J2 fibroblasts, and B. Carroll for help with sample collection.
The authors declare no competing interests.
Peer review information Nature thanks Gerd P. Pfeifer, Roman Thomas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Flow-sorting strategy of single basal bronchial epithelial cells.
a, Sorting of EpCAM⁺ epithelial cells from human airway biopsies. Human haematopoietic and endothelial cells were stained with antibodies against CD45 and CD31, respectively. Within the population of cells negative for those markers, EpCAM-expressing cells were gated. Single, live (DAPI-negative) cells were flow-sorted from this population into individual wells of 96-well plates. b, Quantitative PCR (qPCR) analysis of cultures of clonally derived airway epithelial cells. Airway basal cells express integrin subunit α 6 (ITGA6), keratin 5 (KRT5), cadherin 1 (CDH1) and TP63. Expression is shown in clonally derived cell cultures (n = 13 from 3 donors, coloured blue, green and orange) compared to control bulk human bronchial epithelial cell cultures (HBECs) that were expanded in the same culture conditions and lung fibroblast cell cultures (lung fibs) that served as a negative control. The centre values and error bars indicate mean and s.e.m., respectively. Conditions in which no expression was detected are shown as 0. c, Colony-forming efficiency of CD45⁻CD31⁻EpCAM⁺ cells after single-cell sorting from endobronchial biopsy samples (n = 16). For one ex-smoker, EpCAM was not used to select cells and only CD45⁻CD31⁻ cells were sorted; as expected, this was the patient with the lowest colony-forming efficiency.
Extended Data Fig. 2 Quality assurance of mutation calls.
a, Stacked bar chart showing the proportion of reads attributed to the human genome, mouse genome, both, neither, or with ambiguous mapping for the pure mouse fibroblast feeder line (left) or a pure human sample (right), assessed with the Xenome pipeline. b, Clean-up of mutation calls using the Xenome pipeline for one of the samples that was more heavily contaminated by the mouse feeder layer. The Venn diagram on the left shows the overlap in mutation calls before and after removing non-human reads by Xenome. c, Histograms of VAFs for two representative colonies in the sample set. The plot on the left shows a tight distribution around 50%, as expected for a colony derived from a single cell without contamination. The plot on the right shows a bimodal distribution with one peak at 50% (mutations present in the original basal cell) and a second peak at around 25% (probably representing mutations that were acquired in vitro during colony expansion). These second peaks at less than 50% are more evident in colonies from children, owing to the low number of mutations in the original basal cell. d, Histogram of VAFs for a colony seeded by more than one basal cell, leading to a peak at much less than 50%. e, Estimated sensitivity of mutation calling according to sequencing depth. Heterozygous germline polymorphisms were identified in each subject; for each colony sequenced, we calculated the fraction of these polymorphisms that was recalled by our algorithms. f, Comparison of mutational burden in normal bronchial epithelial cells that neighbour a carcinoma in situ (CIS) versus cells distant from the CIS in five patients. The box-and-whisker plots show the distribution of mutational burden per colony within each subject, with the boxes indicating median and interquartile range and the whiskers denoting the range. The overlaid points are the observed mutational burden of individual colonies.
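The sensitivity estimate described for panel e can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `calling_sensitivity`, the tuple representation of a variant and the toy data are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref, alt).
# This representation is an assumption made for illustration only.
Variant = Tuple[str, int, str, str]


def calling_sensitivity(germline_hets: Iterable[Variant],
                        colony_calls: Iterable[Variant]) -> float:
    """Fraction of a subject's heterozygous germline polymorphisms
    that were recalled in the variant calls of one colony."""
    truth: Set[Variant] = set(germline_hets)
    if not truth:
        raise ValueError("no heterozygous germline polymorphisms supplied")
    recalled = truth & set(colony_calls)
    return len(recalled) / len(truth)


# Toy data (hypothetical): four known heterozygous sites, three recalled,
# plus one call that is not a known germline site and is therefore ignored.
hets = [("1", 100, "A", "G"), ("1", 250, "C", "T"),
        ("2", 40, "G", "A"), ("3", 77, "T", "C")]
calls = [("1", 100, "A", "G"), ("2", 40, "G", "A"),
         ("3", 77, "T", "C"), ("5", 9, "C", "A")]
sensitivity = calling_sensitivity(hets, calls)
print(f"estimated sensitivity: {sensitivity:.2f}")
```

Heterozygous germline sites are a convenient truth set because, like clonal somatic mutations in a colony grown from a single cell, they are expected at a VAF of about 50%, so the recalled fraction approximates sensitivity for somatic calls at the same sequencing depth.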
Extended Data Fig. 3 Colonies with a near-normal mutational burden.
a, Density distribution of mutational burden in cells from ex-smokers (green) and current smokers (purple). The black vertical line shows the threshold for near-normal mutational burden derived for each patient. The x axis is on a logarithmic scale. Note the frequently bimodal distribution of mutational burden, especially in the ex-smokers, with the modes separated at the threshold for near-normal mutational burden. b, Flow cytometric analysis of clones for expression of KRT5, EpCAM, ITGA6, podoplanin (PDPN), NGFR and CD45 or CD31. Lung fibroblasts are included as a comparison. Fluorescence minus one (FMO) is shown. Plots for one clone with a near-normal mutational burden (low-mutant clone) and one with an increased burden (high-mutant clone) are shown, and are representative of five clones from one patient. c, Bright-field images of expanded clones at passage 3, showing cobblestone epithelial morphology. Images are representative of five clones from one patient. A clone with an increased mutational burden is shown at the top, and a clone from an ex-smoker with a near-normal mutational burden is shown at the bottom. For the left images, the magnification is ×10 and the scale bar is 200 μm; for the right images, the magnification is ×20 and the scale bar is 100 μm.
Extended Data Fig. 4 Indels, copy-number changes and structural variants in normal bronchial epithelial cells.
a, Relationship of burden of indels per cell with age. The points represent individual colonies (n = 632) and are coloured by smoking status. The black line represents the fitted effect of age on indel burden, which was estimated from LME models after correction for smoking status and within-patient correlation structure. The blue shaded area represents the 95% CI for the fitted line. b, Stacked bar plot showing the distribution of colonies with 0–7 copy-number changes and structural variants across the 16 subjects. c, Three examples of chromoplexy in normal bronchial cells. Structural variants are shown as coloured arcs that join two positions in the genome around the circumference. The instances of chromoplexy all consist of three translocations (purple). d, An example of chromothripsis in a cell from an 11-month-old child. The plot on the right shows the copy number of genomic windows in the relevant region of chromosome 1 (black points); the lines and arcs denote the positions of observed structural variants.
Extended Data Fig. 5 Comparison of mutational signatures that were extracted using two algorithms.
a, Trinucleotide contexts for the signatures extracted by the hierarchical Dirichlet process (HDP) (left) and MutationalPatterns non-negative matrix factorization (right). The six substitution types are shown across the top of each signature. Within each signature, the trinucleotide context is shown as four sets of four bars, grouped by whether an A, C, G or T respectively is 5′ to the mutated base, and within each group of four by whether A, C, G or T is 3′ to the mutated base (the order of bars is the same as that shown in Fig. 2b). Where signatures show high cosine similarity scores between algorithms, they are lined up horizontally. We note that Signature C in MutationalPatterns does not have a match in the signatures extracted by the HDP algorithm, but appears very similar to Signature A in MutationalPatterns (or SBS-5 from the HDP). This means that it probably represents over-splitting of the signatures. b, Heat map showing the cosine similarities of signatures extracted by MutationalPatterns with those extracted by the HDP. Only cosine-similarity scores that are greater than 0.75 are coloured. c, Scatter plots showing the fraction of mutations in each colony (n = 632) assigned to each signature by the HDP algorithm (x axis) versus the MutationalPatterns algorithm (y axis). The correlation values quoted are Pearson’s correlation coefficients (R²). d, Transcriptional strand bias of A>G mutations in an N[A]T context before and after TSSs. Note the absence of transcriptional strand bias in intergenic regions but evidence for both transcription-coupled damage and repair after the TSS, applying similarly in both never-smokers and ex- or current smokers.
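The bar ordering in panel a and the cosine-similarity comparison in panel b can be illustrated with a short sketch. It is not the HDP or MutationalPatterns code: the function names and the two toy signatures are hypothetical, and the order of the six substitution types is the conventional pyrimidine-based one, assumed here because the legend does not state it. Only the 5′-then-3′ grouping and the 0.75 cut-off are taken from the legend.

```python
import math
from itertools import product
from typing import List, Sequence

BASES = "ACGT"
# Conventional pyrimidine-referenced order (an assumption, see lead-in).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COLOUR_THRESHOLD = 0.75  # cut-off quoted in the legend for panel b


def trinucleotide_channels() -> List[str]:
    """Return the 96 channel labels. Within each substitution type the
    5' base varies slowest and the 3' base fastest, giving four sets
    of four bars."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, BASES)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("signatures must not be all zeros")
    return dot / (norm_a * norm_b)


channels = trinucleotide_channels()

# Toy signatures (hypothetical): one flat across all 96 channels, one
# concentrated entirely on the 16 C>A channels.
flat = [1 / 96] * 96
c_to_a_only = [1 / 16] * 16 + [0.0] * 80

similarity = cosine_similarity(flat, c_to_a_only)
is_coloured = similarity > COLOUR_THRESHOLD
print(channels[:4], round(similarity, 3), is_coloured)
```

Cosine similarity depends only on the shape of the two profiles, not on the number of mutations behind them, which is why it suits comparisons between signatures extracted by different algorithms.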
Extended Data Fig. 6 Phylogenetic trees of 13 subjects.
Phylogenetic trees showing clonal relationships among normal bronchial cells in the 13 subjects not shown in Fig. 3a. Branch lengths are proportional to the number of mutations (x axis) specific to that clone or subclone. Each branch is coloured by the proportion of mutations on that branch that are attributed to the various SBS signatures.
Extended Data Fig. 7 Indel signatures in the sample set.
a, Five indel signatures (ID-1, ID-2, ID-3, ID-5 and ID-8) were extracted by the HDP. The contributions of different types of indels to each signature are shown, grouped by whether variants are deletions or insertions; the size of the event; whether they occur at repeat units; and the sequence content of the indel. b, Stacked bar plot showing the proportional contribution of mutational signatures to indels across the 632 colonies derived from normal bronchial cells, extracted using the HDP. Within each patient, colonies are sorted from left to right by increasing indel burden (bar chart in dark grey above coloured signature-attribution stacks).
Extended Data Fig. 8 DBS signatures in the sample set.
a, Six DBS signatures were extracted by the HDP. The contributions of different types of double-base substitution to each signature are shown, grouped by the sequence that is mutated and by what it is mutated to. Five of the signatures have been observed in cancer genomes (ref. 24), and one (DBS Sig-C) is a novel signature that was extracted here. b, Stacked bar plot showing the proportional contribution of mutational signatures to double-base substitutions across the 632 normal bronchial cells, extracted using the HDP. Note that some of the colonies in children have no double-base substitutions. Within each patient, colonies are sorted from left to right by increasing burden of double-base substitutions (bar chart in dark grey above coloured signature-attribution stacks).
Extended Data Fig. 9 Driver mutations in normal bronchial epithelium.
a, Stick plots showing distribution of mutations in TP53, NOTCH1 and other genes that were significantly mutated in our sample set. Mutations are coloured by type. The gene structure is shown horizontally in the centre of each plot, with domains as coloured bars. Above the gene are mutations in this sample set, and below the gene are mutations found in squamous cell carcinomas from the TCGA sample set. b, Fraction of cells with driver mutations in TP53 (left), NOTCH1 (middle) or all other significant cancer genes (right), split by smoking status.
Extended Data Fig. 10 Relationship of telomere length with age.
Scatter plot of estimated telomere lengths (y axis) against the age of the subject (x axis). Individual points represent colonies (n = 398 colonies in which less than 10% of the DNA was derived from the mouse feeder layer). Cells with a near-normal mutational burden are coloured gold.
This file contains Supplementary Tables 1–5.
This zip file contains the Supplementary Code in HTML format and the relevant source data: Statistical_analyses_bronchial_epithelium.html, an HTML file containing embedded code, description and output; and Lung_organoids_telomeres_with_contamination_20190408.txt, a tab-delimited text file containing source data for the Supplementary Code.
Cite this article
Yoshida, K., Gowers, K.H.C., Lee-Six, H. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020). https://doi.org/10.1038/s41586-020-1961-1