Tobacco smoking and somatic mutations in human bronchial epithelium

Abstract

Tobacco smoking causes lung cancer [1–3], a process that is driven by more than 60 carcinogens in cigarette smoke that directly damage and mutate DNA [4,5]. The profound effects of tobacco on the genome of lung cancer cells are well-documented [6–10], but equivalent data for normal bronchial cells are lacking. Here we sequenced whole genomes of 632 colonies derived from single bronchial epithelial cells across 16 subjects. Tobacco smoking was the major influence on mutational burden, typically adding from 1,000 to 10,000 mutations per cell; massively increasing the variance both within and between subjects; and generating several distinct mutational signatures of substitutions and of insertions and deletions. A population of cells in individuals with a history of smoking had mutational burdens that were equivalent to those expected for people who had never smoked: these cells had less damage from tobacco-specific mutational processes, were fourfold more frequent in ex-smokers than current smokers and had considerably longer telomeres than their more-mutated counterparts. Driver mutations increased in frequency with age, affecting 4–14% of cells in middle-aged subjects who had never smoked. In current smokers, at least 25% of cells carried driver mutations and 0–6% of cells had two or even three drivers. Thus, tobacco smoking increases mutational burden, cell-to-cell heterogeneity and driver mutations, but quitting promotes replenishment of the bronchial epithelium from mitotically quiescent cells that have avoided tobacco mutagenesis.

Fig. 1: Mutational burden in normal bronchial epithelium.
Fig. 2: Mutational signatures in normal bronchial epithelium.
Fig. 3: Driver mutations in normal bronchial epithelial cells.
Fig. 4: Relationship of telomere length with mutational burden.

Data availability

Sequencing data have been deposited at the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under the accession number EGAD00001005193. Somatic-mutation calls, including single-base substitutions, indels and structural variants, from all 632 samples have been deposited on Mendeley Data with the identifier: https://doi.org/10.17632/b53h2kwpyy.2.

Code availability

Detailed methods and custom R scripts for the analysis of mutational burden in bronchial epithelium are available in Supplementary Code. Other packages used in the analysis are as follows: R v.3.5.1; BWA-MEM v.0.7.17-r1188 (https://sourceforge.net/projects/bio-bwa/); CaVEMan v.1.11.2 (https://github.com/cancerit/CaVEMan); Pindel v.2.2.5 (https://github.com/cancerit/cgpPindel); Brass v.6.1.2 (https://github.com/cancerit/BRASS); ASCAT NGS v.4.1.2 (https://github.com/cancerit/ascatNgs); Xenome (https://github.com/data61/gossamer/blob/master/docs/xenome.md); deepSNV v.1.28.0 (https://bioconductor.org/packages/release/bioc/html/deepSNV.html); ANNOVAR (http://wannovar.wglab.org/); IGV (http://software.broadinstitute.org/software/igv/); JBrowse (https://jbrowse.org/); cgpVAF (https://github.com/cancerit/vafCorrect); RPhylip v.0.1.23 (http://www.phytools.org/Rphylip/); hdp v.0.1.5 (https://github.com/nicolaroberts/hdp); MutationalPatterns v.1.8.0 (https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html); dNdScv v.0.0.1 (https://github.com/im3sanger/dndscv); and Telomerecat v.3.1.2 (https://github.com/jhrf/telomerecat).

References

  1. Alberg, A. J., Brock, M. V., Ford, J. G., Samet, J. M. & Spivack, S. D. Epidemiology of lung cancer. Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 143, e1S–e29S (2013).
  2. Peto, R. et al. Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. Br. Med. J. 321, 323–329 (2000).
  3. International Agency for Research on Cancer. Tobacco Smoke and Involuntary Smoking. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 83 (IARC and World Health Organization, 2004).
  4. Hecht, S. S. Progress and challenges in selected areas of tobacco carcinogenesis. Chem. Res. Toxicol. 21, 160–171 (2008).
  5. Pfeifer, G. P. et al. Tobacco smoke carcinogens, DNA damage and p53 mutations in smoking-associated cancers. Oncogene 21, 7435–7451 (2002).
  6. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).
  7. Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012).
  8. Alexandrov, L. B. et al. Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016).
  9. George, J. et al. Comprehensive genomic profiles of small cell lung cancer. Nature 524, 47–53 (2015).
  10. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
  11. Tomasetti, C., Marchionni, L., Nowak, M. A., Parmigiani, G. & Vogelstein, B. Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc. Natl Acad. Sci. USA 112, 118–123 (2015).
  12. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
  13. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).
  14. Garfinkel, L. & Stellman, S. D. Smoking and lung cancer in women: findings in a prospective study. Cancer Res. 48, 6951–6955 (1988).
  15. Armitage, P. Response to Richard Doll: the age distribution of cancer. J. Roy. Stat. Soc. A 134, 155–156 (1971).
  16. Doll, R. & Peto, R. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J. Epidemiol. Community Health 32, 303–313 (1978).
  17. Lee, J. J.-K. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857 (2019).
  18. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
  19. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
  20. Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
  21. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
  22. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
  23. Letouzé, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).
  24. Alexandrov, L. et al. The repertoire of mutational signatures in human cancer. Nature https://doi.org/10.1038/s41586-020-1943-3 (2020).
  25. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
  26. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
  27. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
  28. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
  29. Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
  30. Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
  31. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
  32. Teixeira, V. H. et al. Stochastic homeostasis in human airway epithelium is achieved by neutral competition of basal cell progenitors. eLife 2, e00966 (2013).
  33. Hegab, A. E. et al. Novel stem/progenitor cell population from murine tracheal submucosal gland ducts with multipotent regenerative potential. Stem Cells 29, 1283–1293 (2011).
  34. Tata, A. et al. Myoepithelial cells of submucosal glands can function as reserve stem cells to regenerate airways after injury. Cell Stem Cell 22, 668–683 (2018).
  35. Lynch, T. J. et al. Submucosal gland myoepithelial cells are reserve stem cells that can regenerate mouse tracheal epithelium. Cell Stem Cell 22, 653–667 (2018).
  36. Gowers, K. H. C. et al. Optimized isolation and expansion of human airway epithelial basal cells from endobronchial biopsy samples. J. Tissue Eng. Regen. Med. 12, e313–e317 (2018).
  37. Butler, C. R. et al. Rapid expansion of human epithelial stem cells suitable for airway tissue engineering. Am. J. Respir. Crit. Care Med. 194, 156–168 (2016).
  38. Teixeira, V. H. et al. Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions. Nat. Med. 25, 517–525 (2019).
  39. Conway, T. et al. Xenome—a tool for classifying reads from xenograft samples. Bioinformatics 28, 172–178 (2012).
  40. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
  41. Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).
  42. Yang, H. & Wang, K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 10, 1556–1566 (2015).
  43. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
  44. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
  45. Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics 56, 15.9.1–15.9.17 (2016).
  46. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
  47. Buels, R. et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 17, 66 (2016).
  48. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
  49. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
  50. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature https://doi.org/10.1038/s41586-019-1913-9 (2020).
  51. Farmery, J. H. R. et al. Telomerecat: a ploidy-agnostic method for estimating telomere length from whole genome sequencing data. Sci. Rep. 8, 1300 (2018).


Acknowledgements

This work was supported by a Cancer Research UK Grand Challenge Award (C98/A24032) and the Wellcome Trust. P.J.C. and S.M.J. are Wellcome Trust Senior Clinical Fellows (WT088340MA); S.M.J. receives funding as a member of the UK Regenerative Medicine Platform (UKRMP2) Engineered Cell Environment Hub (MRC; MR/R015635/1) and the Longfonds BREATH lung regeneration consortium, and is further supported by The Rosetrees Trust, the Stoneygate Trust, the British Lung Foundation and the UCLH Charitable Foundation; K.Y. is supported by a Japan Society for the Promotion of Science (JSPS) Overseas Research Fellowship and The Mochida Memorial Foundation for Medical and Pharmaceutical Research; S.M.J. and R.E.H. are supported by the Roy Castle Lung Cancer Foundation; R.E.H. is a Wellcome Trust Sir Henry Wellcome Fellow (WT209199/Z/17/Z); and I.M. is funded by Cancer Research UK (C57387/A21777). The authors thank S. Broad and F. Watt for providing 3T3-J2 fibroblasts, and B. Carroll for help with sample collection.

Author information

Contributions

S.M.J., P.J.C., K.Y., K.H.C.G. and H.L.-S. designed the experiments. K.H.C.G. performed all of the sample collection, cell isolation, clonal expansion and DNA extraction, with help from D.P.C., E.F.M. and F.R.M. E.F.M. and C.R.B. collected the paediatric samples, and E.F.M., D.P.C. and R.M.T. collected the adult samples. E.A. made sequencing libraries. K.Y. performed most of the data curation and statistical analysis, with H.L.-S., T.C., K.B., A.M., N.K. and T.H. providing assistance and advice. S.E.C. oversaw all of the clinical data collection and curation, and performed the flow cytometry characterization of the clones. R.E.H. and K.H.C.G. performed the qPCR characterization of the clones. M.R.S. oversaw the analysis of mutational signatures. P.J.C. and I.M. oversaw statistical analyses. R.E.H., A.P., K.H.C.G., K.Y., S.M.J. and P.J.C. performed data interpretation and, together with D.P.C., helped to draft and revise the manuscript.

Corresponding authors

Correspondence to Sam M. Janes or Peter J. Campbell.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Gerd P. Pfeifer, Roman Thomas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Flow-sorting strategy of single basal bronchial epithelial cells.

a, Sorting of EpCAM+ epithelial cells from human airway biopsies. Human haematopoietic and endothelial cells were stained with antibodies against CD45 and CD31, respectively. Within the population of cells negative for those markers, EpCAM-expressing cells were gated. Single, live (DAPI-negative) cells were flow-sorted from this population into individual wells of 96-well plates. b, Quantitative PCR (qPCR) analysis of cultures of clonally derived airway epithelial cells. Airway basal cells express integrin subunit α 6 (ITGA6), keratin 5 (KRT5), cadherin 1 (CDH1) and TP63. Expression is shown in clonally derived cell cultures (n = 13 from 3 donors, coloured blue, green and orange) compared to control bulk human bronchial epithelial cell cultures (HBECs) that were expanded in the same culture conditions and lung fibroblast cell cultures (lung fibs) that served as a negative control. The centre values and error bars indicate mean and s.e.m., respectively. Conditions in which no expression was detected are shown as 0. c, Colony-forming efficiency of CD45−CD31−EPCAM+ cells after single-cell sorting from endobronchial biopsy samples (n = 16). For one ex-smoker, EpCAM was not used to select cells and only CD45−CD31− cells were sorted; as expected, this was the patient with the lowest colony-forming efficiency.

Extended Data Fig. 2 Quality assurance of mutation calls.

a, Stacked bar chart showing the proportion of reads attributed to the human genome, mouse genome, both, neither, or with ambiguous mapping for the pure mouse fibroblast feeder line (left) or a pure human sample (right), assessed with the Xenome pipeline. b, Clean-up of mutation calls using the Xenome pipeline for one of the samples that was more heavily contaminated by the mouse feeder layer. The Venn diagram on the left shows the overlap in mutation calls before and after removing non-human reads by Xenome. c, Histograms of VAFs for two representative colonies in the sample set. The plot on the left shows a tight distribution around 50%, as expected for a colony derived from a single cell without contamination. The plot on the right shows a bimodal distribution with one peak at 50% (mutations present in the original basal cell) and a second peak at around 25% (probably representing mutations that were acquired in vitro during colony expansion). These second peaks at less than 50% are more evident in colonies from children, owing to the low number of mutations in the original basal cell. d, Histogram of VAFs for a colony seeded by more than one basal cell, leading to a peak at much less than 50%. e, Estimated sensitivity of mutation calling according to sequencing depth. Heterozygous germline polymorphisms were identified in each subject; for each colony sequenced, we calculated the fraction of these polymorphisms that was recalled by our algorithms. f, Comparison of mutational burden in normal bronchial epithelial cells that neighbour a carcinoma in situ (CIS) versus cells distant from the CIS in five patients. The box-and-whisker plots show the distribution of mutational burden per colony within each subject, with the boxes indicating median and interquartile range and the whiskers denoting the range. The overlaid points are the observed mutational burden of individual colonies.
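As an illustration of panel e, the sensitivity for one colony can be thought of as the fraction of the subject's heterozygous germline polymorphisms that are recalled in that colony. The Python sketch below is not the authors' pipeline: the function name `recall_sensitivity`, the (chromosome, position, reference allele, alternative allele) representation of a variant and the toy values are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This representation is an assumption made for the sketch.
Variant = Tuple[str, int, str, str]


def recall_sensitivity(germline_hets: Iterable[Variant],
                       colony_calls: Iterable[Variant]) -> float:
    """Fraction of a subject's heterozygous germline sites recalled in one colony."""
    truth: Set[Variant] = set(germline_hets)
    if not truth:
        raise ValueError("no heterozygous germline sites supplied")
    recalled = truth & set(colony_calls)
    return len(recalled) / len(truth)


# Hypothetical toy data: four heterozygous germline sites, three of them recalled.
germline = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"),
            ("2", 500, "G", "A"), ("3", 750, "T", "C")]
calls = [("1", 1000, "A", "G"), ("2", 500, "G", "A"),
         ("3", 750, "T", "C"), ("4", 10, "C", "A")]  # last call is not a germline site

sensitivity = recall_sensitivity(germline, calls)
print(f"estimated sensitivity: {sensitivity:.2f}")
```

Calls that are not in the germline set (such as the last entry of `calls`) do not affect the estimate, because only the recalled fraction of known heterozygous sites is counted.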

Extended Data Fig. 3 Colonies with a near-normal mutational burden.

a, Density distribution of mutational burden in cells from ex-smokers (green) and current smokers (purple). The black vertical line shows the threshold for near-normal mutational burden derived for each patient. The x axis is on a logarithmic scale. Note the frequently bimodal distribution of mutational burden, especially in the ex-smokers, with the modes separated at the threshold for near-normal mutational burden. b, Flow cytometric analysis of clones for expression of KRT5, EpCAM, ITGA6, podoplanin (PDPN), NGFR and CD45 or CD31. Lung fibroblasts are included as a comparison. Fluorescence minus one (FMO) is shown. Plots for one clone with a near-normal mutational burden (low-mutant clone) and one with an increased burden (high-mutant clone) are shown, and are representative of five clones from one patient. c, Bright-field images of expanded clones at passage 3, showing cobblestone epithelial morphology. Images are representative of five clones from one patient. A clone with an increased mutational burden is shown at the top, and a clone from an ex-smoker with a near-normal mutational burden is shown at the bottom. For the left images, the magnification is ×10 and the scale bar is 200 μm; for the right images, the magnification is ×20 and the scale bar is 100 μm.

Extended Data Fig. 4 Indels, copy-number changes and structural variants in normal bronchial epithelial cells.

a, Relationship of burden of indels per cell with age. The points represent individual colonies (n = 632) and are coloured by smoking status. The black line represents the fitted effect of age on indel burden, which was estimated from LME models after correction for smoking status and within-patient correlation structure. The blue shaded area represents the 95% CI for the fitted line. b, Stacked bar plot showing the distribution of colonies with 0–7 copy-number changes and structural variants across the 16 subjects. c, Three examples of chromoplexy in normal bronchial cells. Structural variants are shown as coloured arcs that join two positions in the genome around the circumference. The instances of chromoplexy all consist of three translocations (purple). d, An example of chromothripsis in a cell from an 11-month old child. The plot on the right shows the copy number of genomic windows in the relevant region of chromosome 1 (black points); the lines and arcs denote the positions of observed structural variants.

Extended Data Fig. 5 Comparison of mutational signatures that were extracted using two algorithms.

a, Trinucleotide contexts for the signatures extracted by the hierarchical Dirichlet process (HDP) (left) and MutationalPatterns non-negative matrix factorization (right). The six substitution types are shown across the top of each signature. Within each signature, the trinucleotide context is shown as four sets of four bars, grouped by whether an A, C, G or T respectively is 5′ to the mutated base, and within each group of four by whether A, C, G or T is 3′ to the mutated base (the order of bars is the same as that shown in Fig. 2b). Where signatures show high cosine similarity scores between algorithms, they are lined up horizontally. We note that Signature C in MutationalPatterns does not have a match in the signatures extracted by the HDP algorithm, but appears very similar to Signature A in MutationalPatterns (or SBS-5 from the HDP). This means that it probably represents over-splitting of the signatures. b, Heat map showing the cosine similarities of signatures extracted by MutationalPatterns with those extracted by the HDP. Only cosine-similarity scores that are greater than 0.75 are coloured. c, Scatter plots showing the fraction of mutations in each colony (n = 632) assigned to each signature by the HDP algorithm (x axis) versus the MutationalPatterns algorithm (y axis). The correlation values quoted are Pearson’s correlation coefficients (R²). d, Transcriptional strand bias of A>G mutations in an N[A]T context before and after TSSs. Note the absence of transcriptional strand bias in intergenic regions but evidence for both transcription-coupled damage and repair after the TSS, applying similarly in both never-smokers and ex- or current smokers.
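The comparison in panel b rests on the cosine similarity between pairs of signature spectra, with only scores above 0.75 highlighted. The Python sketch below shows that calculation under stated assumptions: the helper names `cosine_similarity` and `match_signatures`, the signature labels and the four-channel toy spectra are hypothetical (real substitution signatures have 96 trinucleotide channels), and this is not the code used to produce the figure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equally long, non-zero spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must not be all zeros")
    return dot / (norm_a * norm_b)


def match_signatures(set_a: Dict[str, Sequence[float]],
                     set_b: Dict[str, Sequence[float]],
                     threshold: float = 0.75) -> List[Tuple[str, str, float]]:
    """List every cross-set pair whose cosine similarity exceeds the threshold."""
    pairs = []
    for name_a, spec_a in set_a.items():
        for name_b, spec_b in set_b.items():
            score = cosine_similarity(spec_a, spec_b)
            if score > threshold:
                pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical four-channel spectra standing in for the two algorithms' outputs.
hdp_like = {"X": [0.7, 0.1, 0.1, 0.1], "Y": [0.1, 0.1, 0.1, 0.7]}
nmf_like = {"P": [0.6, 0.2, 0.1, 0.1], "Q": [0.25, 0.25, 0.25, 0.25]}

matches = match_signatures(hdp_like, nmf_like)
for name_a, name_b, score in matches:
    print(f"{name_a} ~ {name_b}: cosine similarity {score:.3f}")
```

In this toy input only the pair X and P passes the 0.75 threshold; the flat spectrum Q is moderately similar to everything and matches nothing, which is the kind of pattern a thresholded heat map makes visible.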

Extended Data Fig. 6 Phylogenetic trees of 13 subjects.

Phylogenetic trees showing clonal relationships among normal bronchial cells in the 13 subjects not shown in Fig. 3a. Branch lengths are proportional to the number of mutations (x axis) specific to that clone or subclone. Each branch is coloured by the proportion of mutations on that branch that are attributed to the various SBS signatures.

Extended Data Fig. 7 Indel signatures in the sample set.

a, Five indel signatures (ID-1, ID-2, ID-3, ID-5 and ID-8) were extracted by the HDP. The contributions of different types of indels to each signature are shown, grouped by whether variants are deletions or insertions; the size of the event; whether they occur at repeat units; and the sequence content of the indel. b, Stacked bar plot showing the proportional contribution of mutational signatures to indels across the 632 colonies derived from normal bronchial cells, extracted using the HDP. Within each patient, colonies are sorted from left to right by increasing indel burden (bar chart in dark grey above coloured signature-attribution stacks).

Extended Data Fig. 8 DBS signatures in the sample set.

a, Six DBS signatures were extracted by the HDP. The contributions of different types of double-base substitution to each signature are shown, grouped by the sequence that is mutated and by what it is mutated to. Five of the signatures have been observed in cancer genomes [24], and one (DBS Sig-C) is a novel signature that was extracted here. b, Stacked bar plot showing the proportional contribution of mutational signatures to double-base substitutions across the 632 normal bronchial cells, extracted using the HDP. Note that some of the colonies in children have no double-base substitutions. Within each patient, colonies are sorted from left to right by increasing burden of double-base substitutions (bar chart in dark grey above coloured signature-attribution stacks).

Extended Data Fig. 9 Driver mutations in normal bronchial epithelium.

a, Stick plots showing distribution of mutations in TP53, NOTCH1 and other genes that were significantly mutated in our sample set. Mutations are coloured by type. The gene structure is shown horizontally in the centre of each plot, with domains as coloured bars. Above the gene are mutations in this sample set, and below the gene are mutations found in squamous cell carcinomas from the TCGA sample set. b, Fraction of cells with driver mutations in TP53 (left), NOTCH1 (middle) or all other significant cancer genes (right), split by smoking status.

Extended Data Fig. 10 Relationship of telomere length with age.

Scatter plot of estimated telomere lengths (y axis) against the age of the subject (x axis). Individual points represent colonies (n = 398 colonies in which less than 10% of the DNA was derived from the mouse feeder layer). Cells with a near-normal mutational burden are coloured gold.

Supplementary information

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-5.

Supplementary Data

This zip file contains the Supplementary Code, provided in HTML format, together with the relevant source data: Statistical_analyses_bronchial_epithelium.html, an HTML file containing embedded code, descriptions and output; and Lung_organoids_telomeres_with_contamination_20190408.txt, a tab-delimited text file containing source data for the Supplementary Code.

Cite this article

Yoshida, K., Gowers, K.H.C., Lee-Six, H. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020). https://doi.org/10.1038/s41586-020-1961-1
