The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6–9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3–TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5–50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1. Jacobs, K. B. et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).

  2. Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).

  3. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).

  4. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).

  5. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).

  6. McKerrell, T. et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Reports 10, 1239–1245 (2015).

  7. Machiela, M. J. et al. Characterization of large structural genetic mosaicism in human autosomes. Am. J. Hum. Genet. 96, 487–497 (2015).

  8. Vattathil, S. & Scheet, P. Extensive hidden genomic mosaicism revealed in normal tissue. Am. J. Hum. Genet. 98, 571–578 (2016).

  9. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).

  10. Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease — clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).

  11. Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).

  12. Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377, 111–121 (2017).

  13. Acuna-Hidalgo, R. et al. Ultra-sensitive sequencing identifies high prevalence of clonal hematopoiesis-associated mutations throughout adult life. Am. J. Hum. Genet. 101, 50–64 (2017).

  14. Laken, S. J. et al. Familial colorectal cancer in Ashkenazim due to a hypermutable tract in APC. Nat. Genet. 17, 79–83 (1997).

  15. Jones, A. V. et al. JAK2 haplotype is a major risk factor for the development of myeloproliferative neoplasms. Nat. Genet. 41, 446–449 (2009).

  16. Kilpivaara, O. et al. A germline JAK2 SNP is associated with predisposition to the development of JAK2(V617F)-positive myeloproliferative neoplasms. Nat. Genet. 41, 455–459 (2009).

  17. Olcaydu, D. et al. A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nat. Genet. 41, 450–454 (2009).

  18. Koren, A. et al. Genetic variation in human DNA replication timing. Cell 159, 1015–1026 (2014).

  19. Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).

  20. Hinds, D. A. et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 128, 1121–1128 (2016).

  21. Wright, D. J. et al. Genetic variants associated with mosaic Y chromosome loss highlight cell cycle genes and overlap with cancer susceptibility. Nat. Genet. 49, 674–679 (2017).

  22. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  23. Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).

  24. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

  25. Machiela, M. J. et al. Mosaic chromosome 20q deletions are more frequent in the aging population. Blood Advances 1, 380–385 (2017).

  26. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).

  27. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).

  28. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).

  29. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).

  30. Machiela, M. J. et al. Female chromosome X mosaicism is age-related and preferentially affects the inactivated X chromosome. Nat. Commun. 7, 11843 (2016).

  31. Sinclair, E. J., Potter, A. M., Watmore, A. E., Fitchett, M. & Ross, F. Trisomy 15 associated with loss of the Y chromosome in bone marrow: a possible new aging effect. Cancer Genet. Cytogenet. 105, 20–23 (1998).

  32. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).

  33. Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).

  34. Sutherland, G. R., Baker, E. & Seshadri, R. S. Heritable fragile sites on human chromosomes. V. A new class of fragile site requiring BrdU for expression. Am. J. Hum. Genet. 32, 542–548 (1980).

  35. Hewett, D. R. et al. FRA10B structure reveals common elements in repeat expansion and chromosomal fragile site genesis. Mol. Cell 1, 773–781 (1998).

  36. Richards, R. I. & Sutherland, G. R. Dynamic mutations: a new class of mutations causing human disease. Cell 70, 709–712 (1992).

  37. Gurney, A. L., Carver-Moore, K., de Sauvage, F. J. & Moore, M. W. Thrombocytopenia in c-mpl-deficient mice. Science 265, 1445–1447 (1994).

  38. Tefferi, A. Novel mutations and their functional and clinical relevance in myeloproliferative neoplasms: JAK2, MPL, TET2, ASXL1, CBL, IDH and IKZF1. Leukemia 24, 1128–1138 (2010).

  39. Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244–248 (2017).

  40. Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).

  41. Oddsson, A. et al. The germline sequence variant rs2736100_C in TERT associates with myeloproliferative neoplasms. Leukemia 28, 1371–1374 (2014).

  42. Stacey, S. N. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 43, 1098–1103 (2011).

  43. Rawstron, A. C. et al. Monoclonal B-cell lymphocytosis and chronic lymphocytic leukemia. N. Engl. J. Med. 359, 575–583 (2008).

  44. Landgren, O. et al. B-cell clones as early markers for chronic lymphocytic leukemia. N. Engl. J. Med. 360, 659–667 (2009).

  45. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).

  46. Ojha, J. et al. Monoclonal B-cell lymphocytosis is characterized by mutations in CLL putative driver genes and clonal heterogeneity many years before disease progression. Leukemia 28, 2395–2398 (2014).

  47. Berndt, S. I. et al. Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat. Commun. 7, 10933 (2016).

  48. O’Keefe, C., McDevitt, M. A. & Maciejewski, J. P. Copy neutral loss of heterozygosity: a novel chromosomal lesion in myeloid malignancies. Blood 115, 2731–2739 (2010).

  49. Chase, A. et al. Profound parental bias associated with chromosome 14 acquired uniparental disomy indicates targeting of an imprinted locus. Leukemia 29, 2069–2074 (2015).

  50. Choate, K. A. et al. Mitotic recombination in patients with ichthyosis causes reversion of dominant mutations in KRT10. Science 330, 94–97 (2010).

  51. Peiffer, D. A. et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 16, 1136–1148 (2006).

  52. Diskin, S. J. et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 36, e126 (2008).

  53. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  54. Vattathil, S. & Scheet, P. Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Res. 23, 152–158 (2013).

  55. Genovese, G., Leibon, G., Pollak, M. R. & Rockmore, D. N. Improved IBD detection using incomplete haplotype information. BMC Genet. 11, 58 (2010).

  56. Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004).

  57. Pique-Regi, R., Cáceres, A. & González, J. R. R-Gada: a fast and flexible pipeline for copy number analysis in association studies. BMC Bioinformatics 11, 380 (2010).

  58. Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat. Commun. 6, 8111 (2015).

  59. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  60. Gusev, A. et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 19, 318–326 (2009).

  61. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).

  62. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  63. Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).

  64. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  65. Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).

  66. Turner, J. J. et al. InterLymph hierarchical classification of lymphoid neoplasms for epidemiologic research based on the WHO classification (2008): update and future directions. Blood 116, e90–e98 (2010).

  67. Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).

  68. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

  69. Dumanski, J. P. et al. Mutagenesis. Smoking is associated with mosaic loss of chromosome Y. Science 347, 81–83 (2015).

We thank Y. Jakubek for assistance with follow-up on del(10q) events (ref. 8) and G. Bhatia, A. Gusev, M. Lipson, X. Liu, L. O’Connor, N. Patterson, and B. van de Geijn for discussions. This research was conducted using the UK Biobank Resource under Application #19808. A.L.P. was supported by NIH grants R01 HG006399, R01 GM105857, R01 MH101244, and R21 HG009513. P.-R.L. was supported by NIH fellowship F32 HG007805, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, and the Next Generation Fund at the Broad Institute of MIT and Harvard. G.G., R.E.H., and S.A.M. were supported by NIH grant R01 HG006855 and the Stanley Center for Psychiatric Research. H.K.F. was supported by the Fannie and John Hertz Foundation. Y.A.R. was supported by NIH award T32 GM007753, a National Defense Science and Engineering Graduate Fellowship, and the Paul and Daisy Soros Foundation. S.F.B. and G.G. were supported by US Department of Defense Breast Cancer Research Breakthrough Awards W81XWH-16-1-0315 and W81XWH-16-1-0316 (project BC151244). S.F.B. was supported by the Elsa U. Pardee Foundation and NCI MSKCC Cancer Center Core Grant P30 CA008748. M.E.T. was supported, in part, by NIH grants UM1 HG008900 and R01 HD081256. Computational analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, which is partially supported by grant NCRR 1S10RR028832-01, and on the Genetic Cluster Computer (http://www.geneticcluster.org) hosted by SURFsara and financially supported by the Netherlands Scientific Organization (NWO 480-05-003 PI: Posthuma) along with a supplement from the Dutch Brain Foundation and the VU University Amsterdam. This work was supported by a grant from the Simons Foundation (SFARI Awards #346042 and #385027, M.E.T.). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We appreciate access to genetic and phenotypic data on SFARI Base.

Reviewer information

Nature thanks S. Chanock, D. Conrad, I. Hall and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Po-Ru Loh, Giulio Genovese

  2. These authors jointly supervised this work: Steven A. McCarroll, Alkes L. Price


  1. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

    • Po-Ru Loh
  2. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Po-Ru Loh, Giulio Genovese, Robert E. Handsaker, Hilary K. Finucane, Michael E. Talkowski, Steven A. McCarroll & Alkes L. Price
  3. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Giulio Genovese, Robert E. Handsaker, Michael E. Talkowski & Steven A. McCarroll
  4. Department of Genetics, Harvard Medical School, Boston, MA, USA

    • Giulio Genovese, Robert E. Handsaker & Steven A. McCarroll
  5. Schmidt Fellows Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Hilary K. Finucane
  6. Department of Computer Science, Harvard University, Cambridge, MA, USA

    • Yakir A. Reshef
  7. Department of Statistics, University of Oxford, Oxford, UK

    • Pier Francesco Palamara
  8. Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

    • Brenda M. Birmann
  9. Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA

    • Michael E. Talkowski
  10. Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA

    • Michael E. Talkowski
  11. Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA

    • Samuel F. Bakhoum
  12. Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA

    • Samuel F. Bakhoum
  13. Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    • Alkes L. Price


P.-R.L., G.G., S.F.B., S.A.M., and A.L.P. designed the study. P.-R.L. and G.G. analysed UK Biobank data. R.E.H. analysed SSC data. P.-R.L., G.G., H.K.F., and Y.A.R. developed statistical methods. P.F.P. assisted with IBD analyses. B.M.B. assisted with cancer phenotype curation. M.E.T. and S.A.M. supervised SSC analyses. All authors wrote the paper.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Po-Ru Loh or Giulio Genovese or Steven A. McCarroll or Alkes L. Price.

Extended data figures and tables

  1. Extended Data Fig. 1 Examples of mosaic events called using phased genotyping intensities.

    a–c, UK Biobank mCA sample 2791 has a mosaic deletion of chr13 from approximately 31–53 Mb that cannot be confidently called from unphased BAF and LRR data (a, c). However, the existence of an event is evident in the phased BAF data (b), and the regional decrease in LRR indicates that this event is a deletion. In b, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 25 heterozygous sites; the same bins are used for c. Error bars, s.e.m. d–f, Sample 1645 has a mosaic CNN-LOH on chr9p from the 9p telomere to about 26 Mb that cannot be confidently called from unphased BAF data (d) but is evident in phased BAF data (e). A phase switch error causes a sign flip in phased BAF at approximately 20 Mb. The lack of a shift in LRR in the region (f) indicates that this event is a CNN-LOH. In e, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 50 heterozygous sites; the same bins are used for f. Error bars, s.e.m. g–i, Sample 2464 has a full-chromosome mosaic event on chr12 that cannot be confidently called from unphased BAF and LRR data (g, i) but is evident in phased BAF data (h). Several phase switch errors cause sign flips in phased BAF across chr12. The slight positive shift in mean LRR (i) indicates that this event is most likely to be a mosaic gain of chr12. In h, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 50 heterozygous sites; the same bins are used for i. Error bars, s.e.m.

  2. Extended Data Fig. 2 Estimation of true FDR using age distributions of individuals with mCA calls.

    We generated age distributions for (i) ‘high-confidence’ detected events passing a permutation-based FDR threshold of 0.01 (bright red); (ii) ‘medium-confidence’ events below the FDR threshold of 0.01 but passing an FDR threshold of 0.05 (darker red); and (iii) ‘low-confidence’ events below the FDR threshold of 0.05 but passing an FDR threshold of 0.10 (darkest red; not analysed but plotted for context). We compared these distributions to the overall age distribution of UK Biobank participants (grey). On the basis of the numbers of events in each category, approximately 20% of medium-confidence detected events are expected to be false positives. To estimate our true FDR, we regressed the medium-confidence age distribution on the high-confidence and overall age distributions, reasoning that the medium-confidence age distribution should be a mixture of correctly called events with age distribution similar to that of the high-confidence events, and spurious calls with age distribution similar to the overall cohort. We observed a regression weight of 0.31 for the component corresponding to spurious calls, in good agreement with expectation, and implying a true FDR of 7.5% (6.2–8.8%, 95% CI based on regression fit on n = 6 age bins).

  3. Extended Data Fig. 3 Clonal cell fractions of co-occurring events generally suggest co-existence within the same cell population.

    For each pair of significantly co-occurring events (Fig. 2b), we compared the clonal fractions of the two events within each individual who carried both events. Each point in the plots corresponds to an individual carrying the pair of events under consideration; individuals are colour-coded by the total number of events they carry. For nearly all pairs of events, the clonal fractions of the two events were very similar in most individuals carrying both events, suggesting that the events occurred in the same clonal cell population. A few exceptions do seem to exist. For example, when 22q– cell fractions are compared with 13q CNN-LOH cell fractions, the 13q CNN-LOH events appear likely to be present in a subclone. This observation is consistent with acquired uniparental disomy of 13q providing a second hit within a del(13q14) clonal expansion, as seen in Extended Data Fig. 8. (We did not include del(13q14) versus 13q CNN-LOH in this plot because inference of clonal fractions is complex for these overlapping events; see Extended Data Fig. 8.)

  4. Extended Data Fig. 4 Replication of previous association between JAK2 46/1 haplotype and 9p CNN-LOH in cis due to clonal selection.

    The common JAK2 46/1 haplotype has previously been shown to confer risk of somatic JAK2 V617F mutation, such that subsequent 9p CNN-LOH produces a strong proliferative advantage (right; refs 15–18, 20). In our analysis, CNN-LOH on 9p is strongly associated with JAK2 46/1 (P = 1.6 × 10−13, OR = 2.7 (2.1–3.5); Fisher’s exact test on n = 120,664 individuals), with the risk haplotype predominantly duplicated by CNN-LOH in heterozygous carriers (52 of n = 61 heterozygous cases; binomial P = 1.8 × 10−8). Left, the genomic modification is illustrated in the top panel and association signals are plotted in the bottom panel. The lead associated variant is labelled, and variants are coloured according to linkage disequilibrium with the lead variant (scaled for readability).

  5. Extended Data Fig. 5 Evidence of multiple causal variants for 10q25.2 breakage and 1p CNN-LOH associations.

    a, Multiple expanded repeats at FRA10B drive breakage at 10q25.2. We identified 12 distinct primary repeat motifs at FRA10B in 26 whole-genome-sequenced individuals from 14 families (labelled VNTR-N-x, where N denotes length in base pairs); carriers of these repeats exhibit varying degrees of FRA10B repeat expansion (Supplementary Note 8). The repeat motifs are AT-rich and are similar to previously reported FRA10B repeats (ref. 35). The alignment provided here includes the repeat motifs most frequently observed in FRA10B expanded alleles (E8, E13, E17, and E19; ref. 35) along with a few other closely related expanded repeat motifs (E10, E11, and E12). b, Carriers of the 10q terminal deletion in the UK Biobank share long haplotypes identical by descent at 10q25.2. Square nodes in the IBD graph correspond to males and circles to females. Node size is proportional to cell fraction and edge weight increases with IBD length. Coloured nodes indicate imputed carriers of variable number tandem repeats (VNTRs) at FRA10B (Supplementary Table 7); colour intensity scales with imputed dosage. c, Identity-by-descent graph at the MPL locus (chr1:43.8 Mb) for individuals with mCAs on chr1 extending to the p telomere. Coloured nodes indicate imputed carriers of SNPs independently associated with mosaic 1p CNN-LOH (Fig. 4a).

  6. Extended Data Fig. 6 Germline CNVs at 15q26.3.

    a, Read depth profile plot of WGS samples in the terminal 700 kb of chr15q. Three individuals in one family carry an approximately 70-kb deletion at 15q26.3, and a fourth carries the same deletion along with an approximately 290-kb duplication (probably on the same haplotype, based on population frequencies of these events; see Extended Data Fig. 7). These four individuals (highlighted in blue) segregate with the rs182643535:T allele in the WGS cohort. Inset: the parental carrier in the family, individual 10921, has detectable mosaicism in two distinct 15q CNN-LOH subclones (one starting at 41.64 Mb with 4.6% cell fraction, the other starting at 71.64 Mb with an additional 2.0% cell fraction). b, Expanded read depth profile plot, with deletion-only individuals highlighted in blue and the del + dup individual highlighted in green. Breakpoint analysis indicates that the deletion spans chr15:102151467–102222161 and contains a 1,139-bp mid-segment (chr15:102164897–102166035) that is retained in inverted orientation. The duplication spans chr15:102026997–102314016.

  7. Extended Data Fig. 7 Mosaic chromosomal alterations and germline CNVs at 15q26.3.

    Using identified breakpoints of the germline 70-kb deletion and 290-kb duplication (Extended Data Fig. 6), we computed mean genotyping intensity (LRR) in UK Biobank samples within the 70-kb deletion region (24 probes) and within the flanking 220-kb region (97 probes). Individuals are plotted by flanking 220-kb mean LRR versus 70-kb mean LRR and coloured according to mosaic status for somatic 15q mCAs. UK Biobank samples carrying the 70-kb deletion, 290-kb duplication, and both (del+dup) are all easily identifiable in distinct clusters. The plot also appears to contain clusters with higher copy number. Of the three CNV-carrying alleles, the simple 70-kb deletion is the only one that predisposes to mCAs. Most mosaic events containing the 70-kb deletion are CNN-LOH events that make cells homozygous for the 70-kb deletion; two individuals have somatic loss of the homologous (normal) chromosome, making cells hemizygous for the 70-kb deletion.

  8. Extended Data Fig. 8 Phased BAF plots of chromosomes with multiple CNN-LOH subclones.

    All of the plots exhibit step functions of increasing |ΔBAF| towards a telomere, which is the hallmark of multiple clonal cell populations containing distinct CNN-LOH events that affect different spans of a chromosomal arm (all extending to the telomere). Distinct |ΔBAF| values (called using an HMM) are indicated with different colours. Flips in the sign of phased BAF usually correspond to phase switch errors. Two samples exhibit high switch error rates: 14q individual 3067 (explained by non-European ancestry), and 1p individual 23 (explained by very high |ΔBAF|; extreme shifts in genotyping intensities result in poor genotyping quality). All five individuals with multiple CNN-LOH events on chr13q appear to contain switch errors over 13q14, but these switches are actually explained by overlapping 13q14 deletions; see Supplementary Note 1 for detailed discussion.

  9. Extended Data Fig. 9 CLL prediction accuracy: receiver operating characteristic curves and precision-recall curves.

    CLL prediction benchmarks using tenfold stratified cross-validation on: only individuals with lymphocyte counts in the normal range (1 × 10⁹/L to 3.5 × 10⁹/L), as in our primary analyses (n = 36 cases, 113,923 controls) (a, b); and individuals with any lymphocyte count (n = 78 cases, 118,481 controls) (c, d). a matches Fig. 5b, and b shows the precision-recall curve from the same analysis. c and d correspond to an analogous analysis in which we removed the restriction on lymphocyte count and also used additional mosaic event variables for prediction (11q–, 14q–, 22q–, and total number of autosomal events). In both benchmarks, individuals with previous cancer diagnoses or CLL diagnoses within 1 year of assessment were excluded; however, some individuals with very high lymphocyte counts pass this filter (and probably already had CLL at assessment despite remaining undiagnosed for more than 1 year), hence the difference in apparent prediction accuracy between the two benchmarks.

  10. Extended Data Fig. 10 Mosaic chromosomal alterations detected in CLL cases sorted by lymphocyte count.

    Individuals are stratified by cancer status at DNA collection (no previous diagnosis versus any previous diagnosis), and mCAs (red, loss; green, CNN-LOH; blue, gain; grey, undetermined) are plotted per chromosome as coloured rectangles (with height increasing with BAF deviation).
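
For Extended Data Fig. 1, the gain from phasing can be illustrated with a minimal sketch. It assumes BAF values at heterozygous SNPs plus a per-SNP flag, taken from statistical phasing, recording whether the B allele lies on a designated haplotype; the function names and toy data are hypothetical, and this is not the authors' pipeline. Orienting each BAF deviation by phase lets a small imbalance on one haplotype accumulate across sites rather than cancel, and averaging over bins of n heterozygous sites gives the plotted means and s.e.m. A phase switch error flips the orientation of every downstream site, which is why sign flips appear in panels e and h.

```python
import math
from statistics import mean, stdev


def phased_baf(baf, b_on_hap1):
    """Orient BAF deviations at heterozygous SNPs by haplotype phase.

    baf: B-allele frequencies at heterozygous SNPs (about 0.5 without an event).
    b_on_hap1: True where phasing places the B allele on haplotype 1.
    A positive value means haplotype 1 is over-represented at that SNP.
    """
    return [(b - 0.5) if on_hap1 else (0.5 - b) for b, on_hap1 in zip(baf, b_on_hap1)]


def bin_means(values, n=25):
    """Mean and s.e.m. over consecutive bins of n heterozygous sites.

    A trailing partial bin (fewer than n sites) is dropped in this sketch.
    """
    if n < 2:
        raise ValueError("need at least 2 sites per bin to estimate s.e.m.")
    bins = []
    for start in range(0, len(values) - n + 1, n):
        chunk = values[start:start + n]
        bins.append((mean(chunk), stdev(chunk) / math.sqrt(n)))
    return bins


# Toy data (made up): haplotype 1 is over-represented by 0.02 at every site,
# but the B allele alternates between haplotypes from one SNP to the next.
flags = [i % 2 == 0 for i in range(50)]
baf = [0.52 if f else 0.48 for f in flags]
unphased = sum(b - 0.5 for b in baf) / len(baf)   # deviations cancel to 0
phased = bin_means(phased_baf(baf, flags), n=25)  # each bin mean is about 0.02
```

Without phase information the same 50 sites show no net deviation, which mirrors why events in panels a, d and g cannot be called from unphased BAF alone.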
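
The regression described for Extended Data Fig. 2 treats the medium-confidence age distribution as a mixture of the high-confidence distribution (correct calls) and the overall cohort distribution (spurious calls). Below is a minimal least-squares sketch with no intercept, solved through the 2 × 2 normal equations. The six bin proportions are invented for illustration, and the caption does not say whether the fit was constrained (for example, weights summing to 1), so an unconstrained fit is an assumption here.

```python
def mixture_weights(target, true_like, spurious_like):
    """Least-squares weights (w_true, w_spurious), no intercept, minimising
    sum_i (target_i - w_true * true_like_i - w_spurious * spurious_like_i) ** 2."""
    a11 = sum(x * x for x in true_like)
    a22 = sum(y * y for y in spurious_like)
    a12 = sum(x * y for x, y in zip(true_like, spurious_like))
    b1 = sum(x * t for x, t in zip(true_like, target))
    b2 = sum(y * t for y, t in zip(spurious_like, target))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-15:
        raise ValueError("component distributions are (nearly) collinear")
    # Cramer's rule on [[a11, a12], [a12, a22]] @ [w_true, w_spurious] = [b1, b2]
    w_true = (b1 * a22 - a12 * b2) / det
    w_spurious = (a11 * b2 - a12 * b1) / det
    return w_true, w_spurious


# Hypothetical fractions of individuals in six age bins (each list sums to 1).
high_conf = [0.05, 0.10, 0.15, 0.20, 0.22, 0.28]
overall = [0.15, 0.18, 0.18, 0.18, 0.16, 0.15]
# Build a medium-confidence distribution that is 69% "correct" and 31% "spurious".
medium = [0.69 * h + 0.31 * o for h, o in zip(high_conf, overall)]
w_true, w_spurious = mixture_weights(medium, high_conf, overall)  # about (0.69, 0.31)
```

Turning the spurious-call weight into an overall FDR additionally requires the relative sizes of the confidence tiers, which the caption does not list, so that step is left out of the sketch.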
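
Comparing clonal fractions, as in Extended Data Fig. 3, requires converting allelic imbalance into a cell fraction, and the conversion depends on the event type. Assuming a diploid background and an event affecting one homologue in a fraction c of cells, the expected shift at a heterozygous SNP is |ΔBAF| = c/2 for CNN-LOH, c/(2(2 - c)) for a single-copy loss, and c/(2(2 + c)) for a single-copy gain. The sketch below encodes these relationships and their inverses; the function names are hypothetical, and it illustrates the arithmetic under those assumptions rather than the exact estimator used in the paper.

```python
def expected_dbaf(c, event):
    """Expected |ΔBAF| at heterozygous SNPs for cell fraction c (0 < c <= 1).

    Assumes diploid normal cells and an event on one homologue:
      CNN-LOH: 1 + c copies of the duplicated allele out of 2 copies
      loss:    1 copy of the retained allele out of 2 - c copies on average
      gain:    1 + c copies of the gained allele out of 2 + c copies on average
    """
    if event == "cnn-loh":
        return c / 2
    if event == "loss":
        return c / (2 * (2 - c))
    if event == "gain":
        return c / (2 * (2 + c))
    raise ValueError(f"unknown event type: {event}")


def cell_fraction(dbaf, event):
    """Invert expected_dbaf: cell fraction implied by an observed |ΔBAF|."""
    d = abs(dbaf)
    if event == "cnn-loh":
        return 2 * d
    if event == "loss":
        return 4 * d / (1 + 2 * d)
    if event == "gain":
        return 4 * d / (1 - 2 * d)
    raise ValueError(f"unknown event type: {event}")


# The same |ΔBAF| implies different clonal fractions for different event types.
for kind in ("cnn-loh", "loss", "gain"):
    print(kind, round(cell_fraction(0.05, kind), 3))  # 0.1, 0.182, 0.222
```

Because one |ΔBAF| maps to three different cell fractions, the copy-number direction (from LRR) has to be known before clonal fractions of co-occurring events can be compared.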
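
The allelic-imbalance test in Extended Data Fig. 4 asks whether CNN-LOH duplicates the JAK2 46/1 risk haplotype more often than the 50% expected if either homologue were equally likely to be duplicated. A minimal exact two-sided binomial test using only the Python standard library is sketched below (the function name is hypothetical); doubling the smaller tail is valid because the null probability is 0.5, which makes the null distribution symmetric. With the counts from the caption (52 of 61 heterozygous cases) it gives P ≈ 1.8 × 10−8, matching the reported value.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Under p = 0.5 the null distribution is symmetric, so the two-sided P value
    is twice the smaller tail, capped at 1.
    """
    tail = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Risk haplotype duplicated in 52 of 61 heterozygous 9p CNN-LOH cases.
p = binomial_two_sided_p(52, 61)
print(f"{p:.2g}")  # 1.8e-08
```

Integer arithmetic in `comb` keeps the tail sum exact even for very small P values; only the final division is done in floating point.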
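
The IBD graphs in Extended Data Fig. 5b,c link carriers who share a long haplotype at the locus, so a shared founder haplotype shows up as a connected cluster of nodes. One simple way to summarise such a graph is to keep edges above a length cut-off and report connected components with a union-find structure, as sketched below. The input format (pairs of sample IDs with a shared-segment length), the cut-off value and its units are hypothetical; the caption states only that edge weight increases with IBD length.

```python
from collections import defaultdict


def ibd_clusters(segments, min_length=2.0):
    """Group samples connected by shared IBD segments of at least min_length.

    segments: iterable of (sample_a, sample_b, length) tuples at the locus of interest.
    Returns clusters as sorted lists of sample IDs (singletons included).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, length in segments:
        root_a, root_b = find(a), find(b)
        if length >= min_length and root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for sample in list(parent):
        groups[find(sample)].append(sample)
    return sorted(sorted(g) for g in groups.values())


# Made-up segments: s1-s2-s3 share long haplotypes; s4-s5 share only a short one.
toy_segments = [("s1", "s2", 5.0), ("s2", "s3", 3.0), ("s4", "s5", 1.0)]
clusters = ibd_clusters(toy_segments)  # [['s1', 's2', 's3'], ['s4'], ['s5']]
```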
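
The two coordinates plotted in Extended Data Fig. 7 can be computed per sample from probe-level LRR using the breakpoints reported in Extended Data Fig. 6. The sketch below takes the flanking region to be the part of the 290-kb duplication outside the 70-kb deletion (287,020 - 70,695 = 216,325 bp with inclusive coordinates, consistent with the stated 220 kb). The probe input format and function names are assumptions, and the sketch does not exclude the 1,139-bp retained mid-segment, since the caption does not say how it was handled.

```python
# Breakpoints on chr15 from Extended Data Fig. 6 (inclusive coordinates).
DELETION = (102151467, 102222161)     # ~70-kb germline deletion
DUPLICATION = (102026997, 102314016)  # ~290-kb germline duplication


def span_bp(interval):
    start, end = interval
    return end - start + 1


def region_mean_lrr(probes):
    """Return (mean LRR inside the deletion, mean LRR in the flanking region).

    probes: iterable of (chr15 position, LRR) pairs for one sample.
    The flanking region is the duplication span minus the deletion span.
    """
    inside, flank = [], []
    for pos, lrr in probes:
        if DELETION[0] <= pos <= DELETION[1]:
            inside.append(lrr)
        elif DUPLICATION[0] <= pos <= DUPLICATION[1]:
            flank.append(lrr)

    def avg(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return avg(inside), avg(flank)


flank_bp = span_bp(DUPLICATION) - span_bp(DELETION)  # 216,325 bp
# Made-up probes: two inside the deletion, two flanking, one far away (ignored).
toy = [(102160000, -0.6), (102200000, -0.4), (102100000, 0.3),
       (102300000, 0.1), (50000000, 5.0)]
del_lrr, flank_lrr = region_mean_lrr(toy)  # about (-0.5, 0.2)
```

Plotting the flanking mean against the deletion mean for every sample would then separate deletion-only, duplication-only and del+dup carriers into distinct clusters, as described in the caption.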
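
Extended Data Fig. 8 states that distinct |ΔBAF| levels were called using an HMM. As an illustration only, the sketch below decodes the most likely piecewise-constant path through a small set of fixed candidate levels with the Viterbi algorithm, using Gaussian emissions with a shared σ and a constant log-space penalty for each change of level. The candidate levels, σ, penalty and function name are assumptions, not the authors' parameters; in practice the levels themselves would also have to be estimated.

```python
def viterbi_levels(obs, levels, sigma=0.02, switch_penalty=10.0):
    """Most likely piecewise-constant assignment of observations to fixed levels.

    Log-score = sum of Gaussian log-likelihoods (constant terms dropped)
    minus switch_penalty for every change of level between adjacent bins.
    All starting levels are treated as equally likely.
    """
    k = len(levels)

    def emit(x, mu):
        return -((x - mu) ** 2) / (2 * sigma ** 2)

    def trans(i, j):
        return 0.0 if i == j else -switch_penalty

    score = [emit(obs[0], mu) for mu in levels]
    backptrs = []
    for x in obs[1:]:
        new_score, ptrs = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + trans(i, j))
            ptrs.append(best_i)
            new_score.append(score[best_i] + trans(best_i, j) + emit(x, levels[j]))
        score = new_score
        backptrs.append(ptrs)

    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return [levels[s] for s in path]


# Made-up binned |ΔBAF| values rising towards a telomere in two steps.
obs = [0.005, -0.004, 0.0, 0.003, -0.002,
       0.052, 0.047, 0.05, 0.055, 0.049,
       0.148, 0.152, 0.15, 0.146, 0.153]
calls = viterbi_levels(obs, levels=[0.0, 0.05, 0.15])
```

The switch penalty plays the role of a log transition probability: raising it suppresses short spurious segments, while lowering it lets small subclones register as separate steps.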
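
The tenfold stratified cross-validation in Extended Data Fig. 9 keeps the rare CLL cases spread evenly across folds. The caption does not describe how folds were drawn; the sketch below shows one common approach (shuffle each class with a fixed seed, then deal its members round-robin into k folds), with a hypothetical function name. With the 36 cases of the primary benchmark, every fold receives three or four cases.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, balancing every label across folds."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for y in sorted(by_label):        # fixed order keeps results reproducible
        members = by_label[y]
        rng.shuffle(members)
        for rank, idx in enumerate(members):
            folds[idx] = rank % k     # deal members round-robin into folds
    return folds


# Class sizes from the primary benchmark: 36 CLL cases, 113,923 controls.
labels = [1] * 36 + [0] * 113923
folds = stratified_folds(labels, k=10)
cases_per_fold = Counter(f for f, y in zip(folds, labels) if y == 1)
```

Each fold then serves once as the held-out set for a model trained on the other nine; one option for drawing a single ROC or precision-recall curve is to pool the held-out predictions across folds.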

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Notes 1–9, Supplementary References and Supplementary Tables 1–16.

  2. Reporting Summary

  3. Supplementary Data

    This Excel file contains spreadsheets with individual-level mosaic event calls from this manuscript and previous studies of clonal hematopoiesis.

  4. Supplementary Data

    This zip archive contains BED-format UCSC Genome Browser tracks for event calls from this manuscript and previous studies of clonal hematopoiesis. The archive also contains a readme document describing how the files were generated.
