Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations

Abstract

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6–9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3–TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5–50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.

Fig. 1: Mosaic chromosomal alterations detected in 151,202 UK Biobank participants.
Fig. 2: Distributional properties of detected mCAs.
Fig. 3: Repeat expansions at fragile site FRA10B driving breakage at 10q25.2.
Fig. 4: Novel loci associated with mCAs in cis due to clonal selection.
Fig. 5: Associations between mCAs and incident cancers and mortality.

References

  1. Jacobs, K. B. et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).
  2. Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).
  3. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
  4. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
  5. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
  6. McKerrell, T. et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Reports 10, 1239–1245 (2015).
  7. Machiela, M. J. et al. Characterization of large structural genetic mosaicism in human autosomes. Am. J. Hum. Genet. 96, 487–497 (2015).
  8. Vattathil, S. & Scheet, P. Extensive hidden genomic mosaicism revealed in normal tissue. Am. J. Hum. Genet. 98, 571–578 (2016).
  9. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
  10. Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease — clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).
  11. Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
  12. Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377, 111–121 (2017).
  13. Acuna-Hidalgo, R. et al. Ultra-sensitive sequencing identifies high prevalence of clonal hematopoiesis-associated mutations throughout adult life. Am. J. Hum. Genet. 101, 50–64 (2017).
  14. Laken, S. J. et al. Familial colorectal cancer in Ashkenazim due to a hypermutable tract in APC. Nat. Genet. 17, 79–83 (1997).
  15. Jones, A. V. et al. JAK2 haplotype is a major risk factor for the development of myeloproliferative neoplasms. Nat. Genet. 41, 446–449 (2009).
  16. Kilpivaara, O. et al. A germline JAK2 SNP is associated with predisposition to the development of JAK2(V617F)-positive myeloproliferative neoplasms. Nat. Genet. 41, 455–459 (2009).
  17. Olcaydu, D. et al. A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nat. Genet. 41, 450–454 (2009).
  18. Koren, A. et al. Genetic variation in human DNA replication timing. Cell 159, 1015–1026 (2014).
  19. Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).
  20. Hinds, D. A. et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 128, 1121–1128 (2016).
  21. Wright, D. J. et al. Genetic variants associated with mosaic Y chromosome loss highlight cell cycle genes and overlap with cancer susceptibility. Nat. Genet. 49, 674–679 (2017).
  22. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
  23. Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).
  24. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
  25. Machiela, M. J. et al. Mosaic chromosome 20q deletions are more frequent in the aging population. Blood Advances 1, 380–385 (2017).
  26. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
  27. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
  28. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
  29. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).
  30. Machiela, M. J. et al. Female chromosome X mosaicism is age-related and preferentially affects the inactivated X chromosome. Nat. Commun. 7, 11843 (2016).
  31. Sinclair, E. J., Potter, A. M., Watmore, A. E., Fitchett, M. & Ross, F. Trisomy 15 associated with loss of the Y chromosome in bone marrow: a possible new aging effect. Cancer Genet. Cytogenet. 105, 20–23 (1998).
  32. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).
  33. Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
  34. Sutherland, G. R., Baker, E. & Seshadri, R. S. Heritable fragile sites on human chromosomes. V. A new class of fragile site requiring BrdU for expression. Am. J. Hum. Genet. 32, 542–548 (1980).
  35. Hewett, D. R. et al. FRA10B structure reveals common elements in repeat expansion and chromosomal fragile site genesis. Mol. Cell 1, 773–781 (1998).
  36. Richards, R. I. & Sutherland, G. R. Dynamic mutations: a new class of mutations causing human disease. Cell 70, 709–712 (1992).
  37. Gurney, A. L., Carver-Moore, K., de Sauvage, F. J. & Moore, M. W. Thrombocytopenia in c-mpl-deficient mice. Science 265, 1445–1447 (1994).
  38. Tefferi, A. Novel mutations and their functional and clinical relevance in myeloproliferative neoplasms: JAK2, MPL, TET2, ASXL1, CBL, IDH and IKZF1. Leukemia 24, 1128–1138 (2010).
  39. Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244–248 (2017).
  40. Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).
  41. Oddsson, A. et al. The germline sequence variant rs2736100_C in TERT associates with myeloproliferative neoplasms. Leukemia 28, 1371–1374 (2014).
  42. Stacey, S. N. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 43, 1098–1103 (2011).
  43. Rawstron, A. C. et al. Monoclonal B-cell lymphocytosis and chronic lymphocytic leukemia. N. Engl. J. Med. 359, 575–583 (2008).
  44. Landgren, O. et al. B-cell clones as early markers for chronic lymphocytic leukemia. N. Engl. J. Med. 360, 659–667 (2009).
  45. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
  46. Ojha, J. et al. Monoclonal B-cell lymphocytosis is characterized by mutations in CLL putative driver genes and clonal heterogeneity many years before disease progression. Leukemia 28, 2395–2398 (2014).
  47. Berndt, S. I. et al. Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat. Commun. 7, 10933 (2016).
  48. O’Keefe, C., McDevitt, M. A. & Maciejewski, J. P. Copy neutral loss of heterozygosity: a novel chromosomal lesion in myeloid malignancies. Blood 115, 2731–2739 (2010).
  49. Chase, A. et al. Profound parental bias associated with chromosome 14 acquired uniparental disomy indicates targeting of an imprinted locus. Leukemia 29, 2069–2074 (2015).
  50. Choate, K. A. et al. Mitotic recombination in patients with ichthyosis causes reversion of dominant mutations in KRT10. Science 330, 94–97 (2010).
  51. Peiffer, D. A. et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 16, 1136–1148 (2006).
  52. Diskin, S. J. et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 36, e126 (2008).
  53. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
  54. Vattathil, S. & Scheet, P. Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Res. 23, 152–158 (2013).
  55. Genovese, G., Leibon, G., Pollak, M. R. & Rockmore, D. N. Improved IBD detection using incomplete haplotype information. BMC Genet. 11, 58 (2010).
  56. Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004).
  57. Pique-Regi, R., Cáceres, A. & González, J. R. R-Gada: a fast and flexible pipeline for copy number analysis in association studies. BMC Bioinformatics 11, 380 (2010).
  58. Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat. Commun. 6, 8111 (2015).
  59. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
  60. Gusev, A. et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 19, 318–326 (2009).
  61. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
  62. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
  63. Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
  64. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
  65. Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
  66. Turner, J. J. et al. InterLymph hierarchical classification of lymphoid neoplasms for epidemiologic research based on the WHO classification (2008): update and future directions. Blood 116, e90–e98 (2010).
  67. Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
  68. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
  69. Dumanski, J. P. et al. Mutagenesis. Smoking is associated with mosaic loss of chromosome Y. Science 347, 81–83 (2015).

Acknowledgements

We thank Y. Jakubek for assistance with follow-up on del(10q) events (ref. 8) and G. Bhatia, A. Gusev, M. Lipson, X. Liu, L. O’Connor, N. Patterson, and B. van de Geijn for discussions. This research was conducted using the UK Biobank Resource under Application #19808. A.L.P. was supported by NIH grants R01 HG006399, R01 GM105857, R01 MH101244, and R21 HG009513. P.-R.L. was supported by NIH fellowship F32 HG007805, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, and the Next Generation Fund at the Broad Institute of MIT and Harvard. G.G., R.E.H., and S.A.M. were supported by NIH grant R01 HG006855 and the Stanley Center for Psychiatric Research. H.K.F. was supported by the Fannie and John Hertz Foundation. Y.A.R. was supported by NIH award T32 GM007753, a National Defense Science and Engineering Graduate Fellowship, and the Paul and Daisy Soros Foundation. S.F.B. and G.G. were supported by US Department of Defense Breast Cancer Research Breakthrough Awards W81XWH-16-1-0315 and W81XWH-16-1-0316 (project BC151244). S.F.B. was supported by the Elsa U. Pardee Foundation and NCI MSKCC Cancer Center Core Grant P30 CA008748. M.E.T. was supported, in part, by NIH grants UM1 HG008900 and R01 HD081256. Computational analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, which is partially supported by grant NCRR 1S10RR028832-01, and on the Genetic Cluster Computer (http://www.geneticcluster.org) hosted by SURFsara and financially supported by the Netherlands Scientific Organization (NWO 480-05-003 PI: Posthuma) along with a supplement from the Dutch Brain Foundation and the VU University Amsterdam. This work was supported by a grant from the Simons Foundation (SFARI Awards #346042 and #385027, M.E.T.). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We appreciate access to genetic and phenotypic data on SFARI Base.

Reviewer information

Nature thanks S. Chanock, D. Conrad, I. Hall and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

P.-R.L., G.G., S.F.B., S.A.M., and A.L.P. designed the study. P.-R.L. and G.G. analysed UK Biobank data. R.E.H. analysed SSC data. P.-R.L., G.G., H.K.F., and Y.A.R. developed statistical methods. P.F.P. assisted with IBD analyses. B.M.B. assisted with cancer phenotype curation. M.E.T. and S.A.M. supervised SSC analyses. All authors wrote the paper.

Corresponding authors

Correspondence to Po-Ru Loh or Giulio Genovese or Steven A. McCarroll or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Examples of mosaic events called using phased genotyping intensities.

a–c, UK Biobank mCA sample 2791 has a mosaic deletion of chr13 from approximately 31–53 Mb that cannot be confidently called from unphased BAF and LRR data (a, c). However, the existence of an event is evident in the phased BAF data (b), and the regional decrease in LRR indicates that this event is a deletion. In b, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 25 heterozygous sites; the same bins are used for c. Error bars, s.e.m. d–f, Sample 1645 has a mosaic CNN-LOH on chr9p from the 9p telomere to about 26 Mb that cannot be confidently called from unphased BAF data (d) but is evident in phased BAF data (e). A phase switch error causes a sign flip in phased BAF at approximately 20 Mb. The lack of a shift in LRR in the region (f) indicates that this event is a CNN-LOH. In e, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 50 heterozygous sites; the same bins are used for f. Error bars, s.e.m. g–i, Sample 2464 has a full-chromosome mosaic event on chr12 that cannot be confidently called from unphased BAF and LRR data (g, i) but is evident in phased BAF data (h). Several phase switch errors cause sign flips in phased BAF across chr12. The slight positive shift in mean LRR (i) indicates that this event is most likely to be a mosaic gain of chr12. In h, mean phased BAF is plotted for SNPs aggregated into bins spanning n = 50 heterozygous sites; the same bins are used for i. Error bars, s.e.m.
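
The binning described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that phased BAF is the B-allele frequency deviation from 0.5, signed by the inferred haplotype phase (+1 or -1), and the function name `bin_phased_baf` and its inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bin_phased_baf(baf, phase, bin_size=25):
    """Return (mean, s.e.m.) of signed BAF deviation per bin of heterozygous sites.

    baf      -- B-allele frequencies at heterozygous sites, in genomic order
    phase    -- +1 or -1 per site (assumed encoding of the inferred haplotype phase)
    bin_size -- number of consecutive heterozygous sites per bin (must be >= 2)
    """
    # Assumption: phased BAF = phase * (BAF - 0.5); a phase switch error flips the sign.
    signed = [p * (b - 0.5) for b, p in zip(baf, phase)]
    bins = []
    # Only complete bins are kept; a trailing partial bin is dropped.
    for start in range(0, len(signed) - bin_size + 1, bin_size):
        chunk = signed[start:start + bin_size]
        bins.append((mean(chunk), stdev(chunk) / sqrt(len(chunk))))
    return bins
```

With unphased data the deviations above and below 0.5 cancel when averaged, whereas signing by phase lets a small but consistent imbalance accumulate across a bin, which is why the events are visible only in the phased panels.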

Extended Data Fig. 2 Estimation of true FDR using age distributions of individuals with mCA calls.

We generated age distributions for (i) ‘high-confidence’ detected events passing a permutation-based FDR threshold of 0.01 (bright red); (ii) ‘medium-confidence’ events below the FDR threshold of 0.01 but passing an FDR threshold of 0.05 (darker red); and (iii) ‘low-confidence’ events below the FDR threshold of 0.05 but passing an FDR threshold of 0.10 (darkest red; not analysed but plotted for context). We compared these distributions to the overall age distribution of UK Biobank participants (grey). On the basis of the numbers of events in each category, approximately 20% of medium-confidence detected events are expected to be false positives. To estimate our true FDR, we regressed the medium-confidence age distribution on the high-confidence and overall age distributions, reasoning that the medium-confidence age distribution should be a mixture of correctly called events with age distribution similar to that of the high-confidence events, and spurious calls with age distribution similar to the overall cohort. We observed a regression weight of 0.31 for the component corresponding to spurious calls, in good agreement with expectation, and implying a true FDR of 7.5% (6.2–8.8%, 95% CI based on regression fit on n = 6 age bins).
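
The mixture regression described above can be sketched as an ordinary least-squares fit with two components and no intercept. This is an illustration under stated assumptions: the caption does not say whether the weights were constrained, so the fit below is unconstrained, the function name `fit_two_component_mixture` is hypothetical, and the six-bin age distributions are made-up numbers rather than UK Biobank data.

```python
def fit_two_component_mixture(target, comp_a, comp_b):
    """Least-squares weights (w_a, w_b) such that target ~ w_a*comp_a + w_b*comp_b."""
    saa = sum(a * a for a in comp_a)
    sbb = sum(b * b for b in comp_b)
    sab = sum(a * b for a, b in zip(comp_a, comp_b))
    sat = sum(a * t for a, t in zip(comp_a, target))
    sbt = sum(b * t for b, t in zip(comp_b, target))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("component distributions are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b


# Hypothetical age distributions over six age bins (fractions summing to 1).
high_conf = [0.04, 0.08, 0.12, 0.18, 0.26, 0.32]   # skewed towards older ages
overall = [0.10, 0.13, 0.15, 0.17, 0.22, 0.23]     # whole-cohort distribution
# Synthetic medium-confidence distribution: 70% true calls, 30% spurious calls.
medium_conf = [0.7 * h + 0.3 * o for h, o in zip(high_conf, overall)]

w_true, w_spurious = fit_two_component_mixture(medium_conf, high_conf, overall)
```

In this toy example the weight on the overall-cohort component recovers the spurious fraction that was built into the synthetic medium-confidence distribution.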

Extended Data Fig. 3 Clonal cell fractions of co-occurring events generally suggest co-existence within the same cell population.

For each pair of significantly co-occurring events (Fig. 2b), we compared the clonal fractions of the two events within each individual that carried both events. Each point in the plots corresponds to an individual carrying the pair of events under consideration; individuals are colour-coded by the total number of events they carry. For nearly all pairs of events, the clonal fractions of the two events were very similar in most individuals carrying both events, suggesting that the events occurred in the same clonal cell population. A few exceptions do seem to exist; for example, 22q– versus 13q CNN-LOH cell fraction; here, the cell fractions suggest that 13q CNN-LOH events may be present in a subclone. This observation is consistent with acquired uniparental disomy of 13q providing a second hit within a del(13q14) clonal expansion, as we see in Extended Data Fig. 8. (We did not include del(13q14) vs. 13q CNN-LOH in this plot because inference of clonal fractions is complex for these overlapping events; see Extended Data Fig. 8.)

Extended Data Fig. 4 Replication of previous association between JAK2 46/1 haplotype and 9p CNN-LOH in cis due to clonal selection.

The common JAK2 46/1 haplotype has previously been shown to confer risk of somatic JAK2 V617F mutation such that subsequent 9p CNN-LOH produces a strong proliferative advantage (right; refs 15–18, 20). In our analysis, CNN-LOH on 9p is strongly associated with JAK2 46/1 (P = 1.6 × 10^−13, OR = 2.7 (2.1–3.5); Fisher’s exact test on n = 120,664 individuals) with the risk haplotype predominantly duplicated by CNN-LOH in heterozygotes (52 of n = 61 heterozygous cases; binomial P = 1.8 × 10^−8). Left, the genomic modification is illustrated in the top panel and association signals are plotted in the bottom. The lead associated variant is labelled, and variants are coloured according to linkage disequilibrium with the lead variant (scaled for readability).
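
The binomial P value for the 52 of 61 heterozygous cases can be checked with a short calculation. The sketch below assumes an exact two-sided test against a null probability of 0.5 (either haplotype equally likely to be duplicated); the caption says only "binomial P", so the test form and the function name `binomial_two_sided_p` are assumptions.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided binomial P value against a null probability of 0.5."""
    # Under p = 0.5 the distribution is symmetric, so double the more extreme tail.
    extreme = max(successes, trials - successes)
    upper_tail = sum(comb(trials, k) for k in range(extreme, trials + 1))
    return min(1.0, 2 * upper_tail / 2 ** trials)


p_value = binomial_two_sided_p(52, 61)
print(f"{p_value:.1e}")  # about 1.8e-08
```

Under these assumptions the result agrees with the value reported in the caption.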

Extended Data Fig. 5 Evidence of multiple causal variants for 10q25.2 breakage and 1p CNN-LOH associations.

a, Multiple expanded repeats at FRA10B drive breakage at 10q25.2. We identified 12 distinct primary repeat motifs at FRA10B in 26 whole-genome-sequenced individuals from 14 families (labelled VNTR-N-x, where N denotes length in base pairs); carriers of these repeats exhibit varying degrees of FRA10B repeat expansion (Supplementary Note 8). The repeat motifs are AT-rich and are similar to FRA10B repeats previously reported (ref. 35). The alignment provided here includes the repeat motifs that were most frequently observed in FRA10B expanded alleles (ref. 35; E8, E13, E17, and E19) along with a few other closely related expanded repeat motifs (E10, E11, and E12). b, Carriers of the 10q terminal deletion in the UK Biobank share long haplotypes at 10q25.2 identical-by-descent. Square nodes in the IBD graph correspond to males and circles to females. Node size is proportional to cell fraction and edge weight increases with IBD length. Coloured nodes indicate imputed carriers of variable number tandem repeats (VNTRs) at FRA10B (Supplementary Table 7); colour intensity scales with imputed dosage. c, Identity-by-descent graph at MPL locus (chr1:43.8 Mb) on individuals with mCAs on chr1 extending to the p telomere. Coloured nodes indicate imputed carriers of SNPs independently associated with mosaic 1p CNN-LOH (Fig. 4a).

Extended Data Fig. 6 Germline CNVs at 15q26.3.

a, Read depth profile plot of WGS samples in the terminal 700 kb of chr15q. Three individuals in one family carry an approximately 70-kb deletion at 15q26.3, and a fourth carries the same deletion along with an approximately 290-kb duplication (probably on the same haplotype, based on population frequencies of these events; see Extended Data Fig. 7). These four individuals (highlighted in blue) segregate with the rs182643535:T allele in the WGS cohort. Inset: the parental carrier in the family, individual 10921, has detectable mosaicism in two distinct 15q CNN-LOH subclones (one starting at 41.64 Mb with 4.6% cell fraction, the other starting at 71.64 Mb with an additional 2.0% cell fraction). b, Expanded read depth profile plot, with deletion-only individuals highlighted in blue and the del + dup individual highlighted in green. Breakpoint analysis indicates that the deletion spans chr15:102151467–102222161 and contains a 1,139-bp mid-segment (chr15:102164897–102166035) that is retained in inverted orientation. The duplication spans chr15:102026997–102314016.

Extended Data Fig. 7 Mosaic chromosomal alterations and germline CNVs at 15q26.3.

Using identified breakpoints of the germline 70-kb deletion and 290-kb duplication (Extended Data Fig. 6), we computed mean genotyping intensity (LRR) in UK Biobank samples within the 70-kb deletion region (24 probes) and within the flanking 220-kb region (97 probes). Individuals are plotted by flanking 220-kb mean LRR versus 70-kb mean LRR and coloured according to mosaic status for somatic 15q mCAs. UK Biobank samples carrying the 70-kb deletion, 290-kb duplication, and both (del+dup) are all easily identifiable in distinct clusters. The plot also appears to contain clusters with higher copy number. Of the three CNV-carrying alleles, the simple 70-kb deletion is the only one that predisposes to mCAs. Most mosaic events containing the 70-kb deletion are CNN-LOH events that make cells homozygous for the 70-kb deletion; two individuals have somatic loss of the homologous (normal) chromosome, making cells hemizygous for the 70-kb deletion.
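
The two-axis summary described above (mean LRR inside the 70-kb deletion versus mean LRR in the flanking region) can be sketched as follows. The breakpoint coordinates are those given in Extended Data Fig. 6; treating the flanking 220-kb region as the duplication span minus the deletion span is an inference made for this sketch, and the function name `mean_lrr_by_region` and the `(position, lrr)` input format are hypothetical.

```python
# Breakpoints on chr15 as reported in Extended Data Fig. 6.
DELETION = (102151467, 102222161)
DUPLICATION = (102026997, 102314016)


def mean_lrr_by_region(probes):
    """Return (flanking_mean_lrr, deletion_mean_lrr) for one sample.

    probes -- iterable of (position, lrr) pairs for chr15 probes
    """
    inside, flank = [], []
    for pos, lrr in probes:
        if DELETION[0] <= pos <= DELETION[1]:
            inside.append(lrr)
        elif DUPLICATION[0] <= pos <= DUPLICATION[1]:
            # Assumption: flanking region = duplication span outside the deletion.
            flank.append(lrr)

    def avg(values):
        return sum(values) / len(values) if values else float("nan")

    return avg(flank), avg(inside)
```

Plotting one point per sample from these two means would separate deletion-only, duplication-only and del+dup carriers, because each combination shifts the two axes differently.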

Extended Data Fig. 8 Phased BAF plots of chromosomes with multiple CNN-LOH subclones.

All of the plots exhibit step functions of increasing |ΔBAF| towards a telomere, which is the hallmark of multiple clonal cell populations containing distinct CNN-LOH events that affect different spans of a chromosomal arm (all extending to the telomere). Distinct |ΔBAF| values (called using an HMM) are indicated with different colours. Flips in the sign of phased BAF usually correspond to phase switch errors. Two samples exhibit high switch error rates: 14q individual 3067 (explained by non-European ancestry), and 1p individual 23 (explained by very high |ΔBAF|; extreme shifts in genotyping intensities result in poor genotyping quality). All five individuals with multiple CNN-LOH events on chr13q appear to contain switch errors over 13q14, but these switches are actually explained by overlapping 13q14 deletions; see Supplementary Note 1 for detailed discussion.

Extended Data Fig. 9 CLL prediction accuracy: receiver operating curves and precision-recall curves.

CLL prediction benchmarks using tenfold stratified cross validation on: only individuals with lymphocyte counts in the normal range (1 × 10^9/L to 3.5 × 10^9/L), as in our primary analyses (n = 36 cases, 113,923 controls) (a, b); and individuals with any lymphocyte count (n = 78 cases, 118,481 controls) (c, d). a matches Fig. 5b, and b shows the precision-recall curve from the same analysis. c and d correspond to an analogous analysis in which we removed the restriction on lymphocyte count and also used additional mosaic event variables for prediction (11q–, 14q–, 22q–, and total number of autosomal events). In both benchmarks, individuals with previous cancer diagnoses or CLL diagnoses within 1 year of assessment were excluded; however, some individuals with very high lymphocyte counts pass this filter (and probably already had CLL at assessment despite being undiagnosed for more than 1 year), hence the difference in apparent prediction accuracy between the two benchmarks.
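
Two ingredients of this benchmark, stratified fold assignment and the area under the ROC curve, can be sketched in plain Python. This is an illustration only: the caption does not describe the prediction model or the software used, and the function names `stratified_folds` and `roc_auc` are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index so that cases and controls are spread evenly."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled members of each class round-robin across the k folds.
        for rank, i in enumerate(idx):
            folds[i] = rank % k
    return folds


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Stratification matters here because cases are extremely rare (36 among roughly 114,000 individuals in the primary analysis), so unstratified folds could easily contain no cases at all. The same rarity is a reason to report precision-recall curves alongside ROC curves.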

Extended Data Fig. 10 Mosaic chromosomal alterations detected in CLL cases sorted by lymphocyte count.

Individuals are stratified by cancer status at DNA collection (no previous diagnosis versus any previous diagnosis), and mCAs (red, loss; green, CNN-LOH; blue, gain; grey, undetermined) are plotted per chromosome as coloured rectangles (with height increasing with BAF deviation).

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-9, Supplementary References and Supplementary Tables 1-16.

Reporting Summary

Supplementary Data

This Excel file contains spreadsheets with individual-level mosaic event calls from this manuscript and previous studies of clonal hematopoiesis.

Supplementary Data

This zip archive contains BED-format UCSC Genome Browser tracks for event calls from this manuscript and previous studies of clonal hematopoiesis. The archive also contains a readme document describing how the files were generated.

About this article

Cite this article

Loh, PR., Genovese, G., Handsaker, R.E. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018). https://doi.org/10.1038/s41586-018-0321-x
