Somatic mutations that accumulate in normal tissues are associated with ageing and disease (refs. 1,2). Here we performed a comprehensive genomic analysis of 1,737 morphologically normal tissue biopsies of 9 organs from 5 donors. We found that somatic mutation accumulations and clonal expansions were widespread, although to variable extents, in morphologically normal human tissues. Somatic copy number alterations were rarely detected, except in tissues from the oesophagus and cardia. Endogenous mutational processes with the SBS1 and SBS5 mutational signatures are ubiquitous among normal tissues, although they exhibit different relative activities. Exogenous mutational processes operate in multiple tissues from the same donor. We reconstructed the spatial somatic clonal architecture with sub-millimetre resolution. In the oesophagus and cardia, macroscopic somatic clones that expanded to hundreds of micrometres were frequently seen, whereas in tissues such as the colon, rectum and duodenum, somatic clones were microscopic in size and evolved independently, possibly restricted by local tissue microstructures. Our study depicts a body map of somatic mutations and clonal expansions from the same individual.
The raw WES and WGS data generated in this study have been deposited in the European Genome-phenome Archive (EGA) (https://ega-archive.org) with accession number EGAD00001007859 and the Genome Sequence Archive (GSA) of the Beijing Institute of Genomics with accession number HRA000356 (https://ngdc.cncb.ac.cn/gsa-human). To gain access to the raw sequencing data, please submit requests to the Pan-body Mutagenesis Data Access Committee (EGA accession number EGAC00001002218) or through the GSA online page of this study (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA000356). All somatic mutations detected from WES with functional annotations and allele count information can be found in Supplementary Table 3. RefSeq database: https://www.ncbi.nlm.nih.gov/refseq. NHLBI Exome Sequencing Project: http://evs.gs.washington.edu/EVS. dbSNP database: https://www.ncbi.nlm.nih.gov/snp. COSMIC database: https://cancer.sanger.ac.uk/cosmic. The GTEx project: https://gtexportal.org/home.
Mutational signature analysis was performed using the HDP R package v.0.1.5 (https://github.com/nicolaroberts/hdp). Code for mutational signature analysis was adapted from https://github.com/HLee-Six/colon_microbiopsies. Code for the Bayesian Dirichlet process clustering of MCFs was adapted from https://github.com/sfbrunner/liver-pub-repo. Adapted code is available at Zenodo (https://doi.org/10.5281/zenodo.5012918). Driver gene analysis was performed using dNdScv v0.01 (https://github.com/im3sanger/dndscv).
Martincorena, I. & Campbell, P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).
Risques, R. A. & Kennedy, S. R. Aging and the rise of somatic cancer-associated mutations in normal tissues. PLoS Genet. 14, e1007108 (2018).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Tang, J. et al. The genomic landscapes of individual melanocytes from human skin. Nature 586, 600–605 (2020).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
Li, R. et al. Macroscopic somatic clonal expansion in morphologically normal human urothelium. Science 370, 82–89 (2020).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).
Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
Garcia-Nieto, P. E., Morrison, A. J. & Fraser, H. B. The somatic mutation landscape of the human body. Genome Biol. 20, 298 (2019).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Poon, S. L. et al. Mutation signatures implicate aristolochic acid in bladder cancer development. Genome Med. 7, 38 (2015).
Ng, A. W. T. et al. Aristolochic acids and their derivatives are widely implicated in liver cancers in Taiwan and throughout Asia. Sci. Transl. Med. 9, eaan6446 (2017).
Du, Y. et al. Mutagenic factors and complex clonal relationship of multifocal urothelial cell carcinoma. Eur. Urol. 71, 841–843 (2017).
Chen, C. H. et al. Aristolochic acid-induced upper tract urothelial carcinoma in Taiwan: clinical characteristics and outcomes. Int. J. Cancer 133, 14–20 (2013).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202–209 (2014).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature, https://doi.org/10.1038/s41586-021-03822-7 (2021).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684 (2020).
The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Roberts, N. D. Patterns of Somatic Genome Rearrangement In Human Cancer. PhD thesis, Univ. Cambridge (2018).
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
The Cancer Genome Atlas Research Network. Integrated genomic characterization of oesophageal carcinoma. Nature 541, 169–175 (2017).
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
The Cancer Genome Atlas Research Network. Comprehensive and integrative genomic characterization of hepatocellular carcinoma. Cell 169, 1327–1341 (2017).
Wu, G., Feng, X. & Stein, L. A human functional protein interaction network and its application to cancer data analysis. Genome Biol. 11, R53 (2010).
Papastamoulis, P. label.switching: an R package for dealing with the label switching problem in MCMC outputs. J. Stat. Softw. 69, Code Snippet 1 (2016).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
We thank all the donors and their families for their consent and participation in this study. This project was jointly supported by the National Natural Science Foundation of China (81725015 to C.W., 81988101 to D.L. and C.W., 22050004 to J.W. and 22050002 to Y.H.); Beijing Outstanding Young Scientist Program (BJJWZYJH01201910023027 to C.W.); the Medical and Health Technology Innovation Project of the Chinese Academy of Medical Sciences (2019-I2M-2-001 to D.L. and C.W.); the National Key R&D Program of China (2019YFC1315702, 2018ZX10302205 to F.B. and 2018YFA0108100 to Y.H.); Guangdong Province Key Research and Development Program (2019B020226002 to F.B.); Beijing Municipal Science and Technology Commission (Z201100005320016 to Y.H.); Beijing Advanced Innovation Center for Genomics; and Shenzhen Bay Laboratory.
The authors declare no competing interests.
Peer review information: Nature thanks Ziyue Gao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Representative H&E-stained samples showing the histological features of normal tissues sampled from nine organs from the five donors. Blanks in the figure represent samples that are not available in corresponding organs and donors. Scale bars, 100 µm.
a, Bar plot showing the overlap of mutations detected from WES and WGS of 43 samples. b, Adjusted numbers of somatic mutations detected in the coding regions in tissue biopsies from the organs of five donors. Red vertical bars represent median mutation numbers and grey horizontal bars represent standard deviations. c, The mutation burdens (after the sensitivity correction) in samples with median VAFs between 0.08 and 0.14. Top, box plots showing the mutation burdens in organs from different donors. The lower edge, upper edge and centre of the box represent the 25th (Q1) percentile, 75th (Q3) percentile and the median, respectively. IQR = Q3 – Q1. Outliers are values beyond the whiskers (upper, Q3 + 1.5 × IQR; lower, Q1 − 1.5 × IQR). Detailed information about the box plots can be found in Supplementary Table 3. Bottom, dot plots showing the adjusted mutation burdens in different organs. Red bars represent the medians. d, Scatter plots showing the VAFs of somatic mutations detected in the normal tissues from the nine organs of the five donors. Dots are coloured by mutation type.
Extended Data Fig. 3 Correlations and interdependence between VAF distributions and mutation numbers.
In each tissue, we calculated the first quartile (Q1) and third quartile (Q3) of the VAF and mutation burden distributions. We defined IQR = Q3 – Q1 and considered samples with a median VAF or mutation burden greater than Q3 + 1.5 × IQR or less than Q1 – 1.5 × IQR as outliers. We excluded these outliers from this analysis. Corr., correlation. The error bands represent the 95% confidence intervals. P values are from two-sided correlation tests based on the Pearson correlation coefficient.
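The outlier rule and correlation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the quartile interpolation method (`method="inclusive"`) is an assumption because the legend does not state one, and the two-sided P value is not computed here.

```python
import math
import statistics


def tukey_fences(values):
    """Return (lower, upper) fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    # Assumption: linear-interpolation quartiles; the legend does not say which method was used.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def drop_outliers(median_vafs, burdens):
    """Keep samples whose median VAF and mutation burden both lie within the fences."""
    vaf_lo, vaf_hi = tukey_fences(median_vafs)
    bur_lo, bur_hi = tukey_fences(burdens)
    return [
        (v, b)
        for v, b in zip(median_vafs, burdens)
        if vaf_lo <= v <= vaf_hi and bur_lo <= b <= bur_hi
    ]


def pearson_r(pairs):
    """Pearson correlation coefficient of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values for one tissue (not data from the study).
median_vafs = [0.08, 0.09, 0.10, 0.11, 0.12, 0.40]
burdens = [10, 12, 14, 16, 18, 20]
kept = drop_outliers(median_vafs, burdens)
r = pearson_r(kept)
```

In this toy input the sample with a median VAF of 0.40 lies above the upper fence and is dropped before the correlation is computed.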
a, Bar plots showing the number of mutations among the four intervals. Genes are divided into four intervals according to the tissue-specific gene expression levels. b, Heat maps showing somatic CNAs detected in the normal tissues from the nine organs of the five donors. Sex chromosomes were excluded.
a, t-distributed stochastic neighbour embedding (t-SNE) plots of the trinucleotide mutational spectra of biopsy samples from each donor, broken down by organ and donor. Only biopsy samples with more than 30 SNVs were included. b, Heat map showing the clustering of cosine similarities of the trinucleotide mutational context in different samples. Colour bars above indicate information of donors and tissue types. c, Trinucleotide mutational spectra for the unassigned signature and the seven signatures extracted using a Bayesian hierarchical Dirichlet process. The bars represent means (95% credible intervals) of the 96 trinucleotide contexts. d, Heat map depicting the cosine similarities between extracted mutational signatures and mutational signatures from COSMIC and PCAWG catalogues. Cosine similarities between the seven extracted mutational signatures and their most similar comparators are highlighted. e, Stacked bar plots showing the number of mutations that are caused by different mutational signatures.
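The cosine similarity used in panels b and d compares two spectra over the same 96 trinucleotide contexts. The sketch below shows the calculation and a lookup of the most similar catalogue signature; the function names and the toy catalogue are hypothetical, and the vectors are illustrative rather than real COSMIC or PCAWG profiles.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as equal-length sequences of counts or probabilities."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a spectrum with no mutations has no defined similarity")
    return dot / (norm_a * norm_b)


def best_match(extracted, catalogue):
    """Return (name, similarity) of the catalogue entry most similar to `extracted`."""
    return max(
        ((name, cosine_similarity(extracted, ref)) for name, ref in catalogue.items()),
        key=lambda item: item[1],
    )


# Toy 96-channel vectors (hypothetical, for illustration only).
flat = [1.0] * 96
peak_first = [1.0] + [0.0] * 95
peak_second = [0.0, 1.0] + [0.0] * 94
toy_catalogue = {"ref_flat": flat, "ref_peak": peak_first}
name, similarity = best_match([3.0] * 96, toy_catalogue)
```

Because cosine similarity ignores overall scale, a spectrum of raw counts can be compared directly with a catalogue signature expressed as probabilities.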
a, Transcriptional strand asymmetries across 96 mutation contexts for SBS4 and SBS22. Bar plots show the sum of assignment probabilities across trinucleotide contexts, split by whether the pyrimidine is on the template or coding strand. b, Transcriptional strand asymmetries across 96 mutation contexts for SBS4 and SBS22. Only mutations with an assignment probability greater than 0.5 are included. c, Trinucleotide mutational spectra of liver, oesophagus, duodenum and colon from donor PN1. Purple dots represent data points of the five tissue layers. Data are mean + s.d. Typical aristolochic-acid-associated mutational features are shaded in blue.
a, The 96 mutation context profiles in two oesophagus samples from donor PN7 (top) and two liver samples from donor PN9 (bottom) based on somatic mutations detected from WGS. b, Trinucleotide mutational spectra of two dissected duodenum layers from donor PN7. Typical aristolochic-acid-associated mutational features are shaded in blue. c, H&E-stained liver tissue (PN9 layer 2 to 4) with superimposed donut charts showing the proportional contributions of mutational signatures, as estimated by deconstructSigs. Scale bars, 200 µm.
a, Mutational landscape of the 32 putative driver genes across different organs from the 5 donors. b, The functional interaction network of the 32 driver genes. Driver genes are in blue nodes and linker genes (those not significantly mutated but highly connected to driver genes in the network) are in pink nodes. c, Significantly enriched pathways of the 32 driver genes. The vertical red line marks a false discovery rate (FDR) of 0.01. d, Bar plot showing the numbers of total mutations and cancer hotspot mutations in driver genes. The percentages of hotspot mutations are labelled on the top of the bar plot. e, Fraction of driver mutations that are private or shared by more than one biopsy sample. f, Heat maps showing the ratio of the numbers of observed to expected (O/E) driver mutations across different organs (left) and the P values for the enrichment (right). P values are from one-sided hypergeometric tests. g, Bar plots comparing the number of mutations in the ten most frequently mutated gastric cancer driver genes in TCGA with normal stomach and cardia samples in this study. Adjustment for multiple comparisons was performed. Adjusted P values (q values) are labelled.
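The O/E ratio and one-sided hypergeometric P value in panel f can be illustrated with a short sketch. The legend does not define the four counts, so the framing here is an assumption: `population` is the total number of mutations across organs, `successes` is the number of those that are driver mutations, `draws` is the number of mutations in one organ, and `k` is the number of driver mutations observed in that organ. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """One-sided enrichment P value: P(X >= k) for a hypergeometric X."""
    top = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible terms drop out of the sum automatically.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favourable / total


def observed_over_expected(k, population, successes, draws):
    """Ratio of the observed count to its expectation under random sampling."""
    expected = draws * successes / population
    return k / expected


# Toy counts (hypothetical, not from the study).
p_value = hypergeom_upper_tail(k=2, population=10, successes=4, draws=3)
ratio = observed_over_expected(k=2, population=10, successes=4, draws=3)
```

With these toy counts the expected number of drivers is 1.2, so observing 2 gives an O/E of about 1.67 and an upper-tail probability of 1/3.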
Bubble plots show the correlations between average MCFs and mutational burdens in biopsy samples across different organs in donors PN1, PN2, PN7, PN8 and PN9.
a, Phylogenetic tree depicting the clonal relationships of the biopsy samples of the oesophagus of donor PN9. b, Heat maps show mutation clustering, spatial clonal architecture and potential driver mutations or CNAs in samples from the oesophagus. Scale bars, 800 µm. c, Heat maps showing potential driver mutations and CNAs in oesophagus samples from donor PN9.
a, Heat map showing the mutation clustering in liver samples from donor PN9. b, Spatial clonal architecture of liver tissue from donor PN9. The numbers in each layer represent the positions of LCM biopsy samples. The overlaid colours correspond to a and indicate the ranges of clonal expansions. c, Heat maps show mutation clustering in samples from the representative organs. Each cluster contains mutations with similar MCFs.
a, Phylogenetic tree depicting the clonal relationships of colon biopsy samples from donor PN9. b, Heat maps showing clustered mutations in samples from representative organs. Each cluster contains mutations with similar MCFs.
This file contains the Supplementary Discussion, including additional discussions about the sampling strategy, mutation burden, clonal expansion patterns, mutational signatures, and copy number alterations in the current study.
Clinical information for the five donors in the current study.
Whole-exome sequencing information. Nomenclature of sample IDs: for example, ‘PN1E-1-2’ represents the number 2 oesophagus sample dissected from tissue layer 1 from donor PN1.
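The sample ID nomenclature can be split into its parts with a small parser. This sketch generalises from the single example given ('PN1E-1-2'); the pattern, the `SampleId` type and the function name are assumptions, and organ codes other than 'E' for oesophagus are not defined in this text.

```python
import re
from typing import NamedTuple


class SampleId(NamedTuple):
    donor: str       # e.g. "PN1"
    organ_code: str  # e.g. "E" for oesophagus; other codes are not listed here
    layer: int       # dissected tissue layer
    sample: int      # sample number within the layer


# Assumed pattern: donor ("PN" + digits), organ letters, then "-layer-sample".
_SAMPLE_ID = re.compile(r"^(PN\d+)([A-Za-z]+)-(\d+)-(\d+)$")


def parse_sample_id(sample_id):
    """Split an ID such as 'PN1E-1-2' into donor, organ code, layer and sample number."""
    match = _SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"unrecognised sample ID: {sample_id!r}")
    donor, organ_code, layer, sample = match.groups()
    return SampleId(donor, organ_code, int(layer), int(sample))


parsed = parse_sample_id("PN1E-1-2")
```

Here `parsed` holds donor "PN1", organ code "E", layer 1 and sample 2, matching the example in the table description.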
Somatic mutations with functional annotations and allele count information detected from whole-exome sequencing. Detailed information for box plots in Fig. 1c and Extended Data Fig. 2c.
Low-depth and high-depth whole-genome sequencing information and copy number analysis.
Input matrix for mutational signature analysis using HDP. SNVs detected in both coding and non-coding regions from the whole-exome sequencing data are included. Mutations detected from normal biopsy samples from each dissected layer are merged.
Extracted mutational signatures and their cosine similarities to the known signatures.
Relative activities of extracted mutational signatures.
Driver gene candidates from previous studies (126 gene list).
Cancer hotspot mutations detected in normal tissues in the current study.
P values for driver gene enrichment in different tissues. P values were calculated from one-sided hypergeometric tests.
About this article
Cite this article
Li, R., Di, L., Li, J. et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403 (2021). https://doi.org/10.1038/s41586-021-03836-1