Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown (ref. 1). The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer (refs. 2–4) and coronary heart disease (ref. 5); this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP) (ref. 6). Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.
Individual WGS data for TOPMed whole genomes, individual-level harmonized phenotypes, harmonized germline variant call sets, the CHIP somatic variant call sets, RNA-seq and peripheral blood methylation data used in this analysis are available through restricted access via the dbGaP. Accession numbers for these datasets are provided in Supplementary Table 1. Summary-level genotype data are available through the BRAVO browser (https://bravo.sph.umich.edu/). Full GWAS summary statistics are available for general research use through controlled access at dbGaP accession phs001974: NHLBI TOPMed: Genomic Summary Results for the Trans-Omics for Precision Medicine programme. A subset of the TOPMed cohorts analysed here is based on sensitive populations, precluding public sharing of full genomic summary results.
Kennedy, B. K. et al. Geroscience: linking aging to chronic disease. Cell 159, 709–713 (2014).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377, 111–121 (2017).
Steensma, D. P. et al. Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 126, 9–16 (2015).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Preprint at https://doi.org/10.1101/563866 (2019).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Loh, P. R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Patel, K. V. et al. Red cell distribution width and mortality in older adults: a meta-analysis. J. Gerontol. A 65, 258–265 (2010).
Bick, A. G. et al. Genetic interleukin 6 signaling deficiency attenuates cardiovascular risk in clonal hematopoiesis. Circulation 141, 124–131 (2020).
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
Bowman, R. L., Busque, L. & Levine, R. L. Clonal hematopoiesis and evolution to hematopoietic malignancies. Cell Stem Cell 22, 157–170 (2018).
Desai, P. et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 24, 1015–1023 (2018).
Bojesen, S. E. et al. Multiple independent variants at the TERT locus are associated with telomere length and risks of breast and ovarian cancer. Nat. Genet. 45, 371–384 (2013).
Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature https://doi.org/10.1038/s41586-020-2786-7 (2020).
Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).
Hinds, D. A. et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 128, 1121–1128 (2016).
Hu, Y. et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 51, 568–576 (2019).
Smith, B. W. et al. The aryl hydrocarbon receptor directs hematopoietic progenitor cell expansion and differentiation. Blood 122, 376–385 (2013).
Cybulski, C. et al. CHEK2 is a multiorgan cancer susceptibility gene. Am. J. Hum. Genet. 75, 1131–1135 (2004).
Rudd, M. F., Sellick, G. S., Webb, E. L., Catovsky, D. & Houlston, R. S. Variants in the ATM–BRCA2–CHEK2 axis predispose to chronic lymphocytic leukemia. Blood 108, 638–644 (2006).
Huynh, M. et al. Hyaluronan and proteoglycan link protein 1 (HAPLN1) activates bortezomib-resistant NF-κB activity and increases drug resistance in multiple myeloma. J. Biol. Chem. 293, 2452–2465 (2018).
Moran-Crusio, K. et al. Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid transformation. Cancer Cell 20, 11–24 (2011).
Kilpivaara, O. et al. A germline JAK2 SNP is associated with predisposition to the development of JAK2 V617F-positive myeloproliferative neoplasms. Nat. Genet. 41, 455–459 (2009).
Jones, A. V. et al. JAK2 haplotype is a major risk factor for the development of myeloproliferative neoplasms. Nat. Genet. 41, 446–449 (2009).
Olcaydu, D. et al. A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nat. Genet. 41, 450–454 (2009).
Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat. Commun. 9, 4038 (2018).
Jun, G., Wing, M. K., Abecasis, G. R. & Kang, H. M. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 25, 918–925 (2015).
Karczewski, K. J. et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. Nature 581, 434–443 (2020).
Gibson, C. J. et al. Clonal hematopoiesis associated with adverse outcomes after autologous stem-cell transplantation for lymphoma. J. Clin. Oncol. 35, 1598–1605 (2017).
Hiatt, J. B., Pritchard, C. C., Salipante, S. J., O’Roak, B. J. & Shendure, J. Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Res. 23, 843–854 (2013).
Pérez Millán, M. I. et al. Next generation sequencing panel based on single molecule molecular inversion probes for detecting genetic variants in children with hypopituitarism. Mol. Genet. Genomic Med. 6, 514–525 (2018).
Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
Vattathil, S. & Scheet, P. Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Res. 23, 152–158 (2013).
Fowler, J., San Lucas, F. A. & Scheet, P. System for quality-assured data analysis: flexible, reproducible scientific workflows. Genet. Epidemiol. 43, 227–237 (2019).
Natarajan, P. et al. Deep-coverage whole genome sequences and blood lipids among 16,324 individuals. Nat. Commun. 9, 3391 (2018).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. http://doi.org/10.18637/jss.v067.i01 (2015).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Nasser, J. et al. Genome-wide maps of enhancer regulation connect risk variants to disease genes. Preprint at https://doi.org/10.1101/2020.09.01.278093 (2020).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).
eGTEx Project. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat. Genet. 49, 1664–1670 (2017).
Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86 (2012).
Horvath, S. & Levine, A. J. HIV-1 infection accelerates age according to the epigenetic clock. J. Infect. Dis. 212, 1563–1573 (2015).
Barfield, R. T., Kilaru, V., Smith, A. K. & Conneely, K. N. CpGassoc: an R function for analysis of DNA methylation microarray data. Bioinformatics 28, 1280–1281 (2012).
Investigators who conducted this research report individual research support from R35 HL135818 (S.R.), P01 HL132825 (S.T.W.), R01 HL091357 and R01 HL055673 (D.K.A.), W81 XWH-17-1-0597 (D.A.S.), K01 HL135405 (B.E.C.), P01 HL132825 (J.L.-S.), K01HL136700 (S.A.), R01 HL113323 (J.E.C.), R01HL1333040 (D.E.W.), R01 HL138737 (D.D.), P01 HL132825 (P.K.), T32 HL129982 (L.M.R.), R01 HL113323 (J.B.), HHS-N268201800002I (T.W.B. and A.V.S.), U54 GM115428 (J.G.W.), R01 HL148565 and R01 HL148050 (P.N.), F30 HL149180 (S.M.Z.), R01 HL139731 and AHA-18SFRN34250007 (S.L.), DP5 OD029586 (A.G.B.), Claudia Adams Barr Program for Innovative Cancer Research (V.G.S.), R01 142711, MGH Hassenfeld Scholar Award (P.N.), Fondation Leducq TNE-18CVD04 (A.G.B., B.L.E., S.J., P.N. and S.K.), Burroughs Wellcome Foundation (A.G.B. and S.J.), Ludwig Cancer Center (S.J.) and UM1-HG008895 (S.K.). WGS for the Trans-Omics in Precision Medicine (TOPMed) programme was supported by the National Heart, Lung, and Blood Institute (NHLBI). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute, the National Institutes of Health or the U.S. Department of Health and Human Services.
B.M.P. serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. E.K.S. and M.H.C. received grant support from GlaxoSmithKline and Bayer. S.T.W. received royalties from UpToDate. S.A. reports employment and equity in 23andMe. B.I.F. is a consultant for RenalytixAI and AstraZeneca Pharmaceuticals. M.E.M. reports funding from Regeneron Pharmaceuticals, unrelated to this project. M.H.C. has received grant support from GlaxoSmithKline and Bayer and consulting or speaking fees from AstraZeneca and Illumina. J.S.F. has consulted for Shionogi. B.D.L. is a co-founder of Nocion Therapeutics; receives grant support from Pieris Pharmaceuticals, Sanofi and Samsung Research America; and has served as a consultant for Bayer, Entrinsic Health, Gossamer Bio, NControl, Novartis, Teva and Thetis Pharmaceuticals. E.S.L. serves on the board of directors for Codiak BioSciences and serves on the scientific advisory board of F-Prime Capital Partners and Third Rock Ventures. B.L.E. reports grant support from Celgene and Deerfield. P.T.E. has received grant support from Bayer AG and has served on advisory boards or consulted for Bayer AG, Quest Diagnostics, MyoKardia and Novartis. G.A. is an employee of Regeneron Pharmaceuticals and owns stock and stock options for Regeneron Pharmaceuticals. S.J. is a scientific advisor to Grail. S.L. receives sponsored research support from Bristol Myers Squibb, Pfizer, Bayer AG, Boehringer Ingelheim and Fitbit; has consulted for Bristol Myers Squibb, Pfizer and Bayer AG; and participates in a research collaboration with IBM. P.N. reports grant support from Amgen, Apple and Boston Scientific, and is a scientific advisor to Apple. S.K. is an employee of Verve Therapeutics, and holds equity in Verve Therapeutics, Maze Therapeutics, Catabasis and San Therapeutics; is a member of the scientific advisory boards for Regeneron Genetics Center and Corvidia Therapeutics; and has served as a consultant for Acceleron, Eli Lilly, Novartis, Merck, Novo Nordisk, Novo Ventures, Ionis, Alnylam, Aegerion, Haug Partners, Noble Insights, Leerink Partners, Bayer Healthcare, Illumina, Color Genomics, MedGenome, Quest and Medscape. The other authors declare no competing interests.
Peer review information Nature thanks Stephen Chanock, Ross Levine and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, CHIP clone size, measured by variant allele fraction, varied markedly by CHIP driver gene. Violin plots span the minimum and maximum values calculated on the full data set (Supplementary Table 3). The sample size for each element in the violin plot is displayed in Fig. 1. b, 90% of individuals with CHIP had only one somatic CHIP driver mutation identified. c, CHIP prevalence with age was highly concordant across sequenced cohorts. CHIP prevalence was estimated from a logistic mixed model with spline-transformed age, sex and cohort included as predictors; cohort was included as a random intercept. The sample size for each cohort is listed in Supplementary Table 1. d, CHIP prevalence with age in this study (blue triangles, n = 82,807) was highly consistent with previously observed CHIP prevalence (dots represent mean point prevalence and the shaded area represents the 95% confidence interval; Genovese, n = 12,380; Jaiswal, n = 17,182; Xie, n = 2,728).
Extended Data Fig. 2 CHIP age association by mutational mechanism, gene and overlap with somatic chromosomal mosaicism.
a, Cumulative density plot of CHIP incidence with age, stratified by single nucleotide variant (SNV) versus frameshift mutations. SNVs were observed in younger individuals than frameshift mutations (n = 4,939; two-sided Wilcoxon rank sum test P = 0.01). b, Cumulative density plot of CHIP incidence with age, stratified by driver gene. c, 855 elderly WHI individuals (mean age: 70) with both whole-genome sequencing and array genotyping data available were interrogated for large-scale somatic mosaic chromosomal rearrangements. The two somatic events did not co-occur more often than would be expected by chance (hypergeometric P = 0.25).
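The hypergeometric co-occurrence test in panel c can be illustrated with a minimal sketch. Only the cohort size of 855 comes from the caption; the carrier counts and the overlap below are hypothetical placeholders, and the function is a generic upper-tail hypergeometric probability rather than the authors' actual analysis code.

```python
from math import comb


def hypergeom_upper_tail(population: int, n_a: int, n_b: int, overlap: int) -> float:
    """P(X >= overlap), where X is the number of individuals carrying both
    events if n_b carriers of event B were drawn at random from a population
    that contains n_a carriers of event A."""
    total = comb(population, n_b)
    favourable = sum(
        comb(n_a, k) * comb(population - n_a, n_b - k)
        for k in range(overlap, min(n_a, n_b) + 1)
    )
    return favourable / total


COHORT_SIZE = 855   # from the caption
N_CHIP = 60         # hypothetical number of CHIP carriers
N_MOSAIC = 40       # hypothetical number with a mosaic chromosomal rearrangement
N_BOTH = 4          # hypothetical number carrying both events

p_value = hypergeom_upper_tail(COHORT_SIZE, N_CHIP, N_MOSAIC, N_BOTH)
print(f"one-sided hypergeometric P = {p_value:.3f}")
```

A large P value from such a test would indicate that the observed overlap is compatible with the two events arising independently.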
a, CHIP was consistently associated with increased RDW. JAK2, SF3B1 and SRSF2 showed driver-gene-specific effects on blood traits (see Supplementary Table 5). b, CHIP status was not consistently associated with lipid traits, other than JAK2 CHIP, which was associated with decreased total cholesterol and a trend towards decreased LDL (see Supplementary Table 6). c, CHIP status was associated with inflammatory markers, although notable heterogeneity existed across CHIP mutations (see Supplementary Table 7). Associations used a two-sided t-test from a multivariate general linear model including age, smoking, race, gender and study centre, and were not adjusted for multiple comparisons. Sample sizes and exact P values for each phenotype are listed in Supplementary Tables 5–7.
a, Singleton mutation counts by nucleotide context in CHIP cases and controls. b, Signature contribution in CHIP cases and controls identified differential enrichment.
a, TERT locus. b, TRIM59–KPNA4 locus. c, TET2 locus. Two-sided association testing performed using SAIGE (n = 65,405 individuals, see Methods).
Extended Data Fig. 6 CHIP transcriptome-wide association study (TWAS) results across 48 tissues identified 7 significant loci.
UTMOST algorithm applied to CHIP genome-wide association study results from n = 65,405 individuals (see Methods). Genomic coordinates are listed on the x-axis. P values from the generalized Berk-Jones test are shown on the y-axis. The multiple-hypothesis-corrected threshold, P < 2.9 × 10^−6, is displayed as a dotted red line.
UTMOST algorithm applied to CHIP genome-wide association study results from n = 65,405 individuals. P values are from the generalized Berk-Jones test. eQTL z-scores for associations with P < 0.05 are displayed in each bar. The GTEx eQTL tissue is listed on the y-axis.
Extended Data Fig. 8 CRISPR–Cas9 editing efficiency of TET2 enhancer deletion in primary CD34+ HSPCs.
a, Schematic showing the position of the two sgRNAs used to delete the TET2 enhancer (512 bp) containing rs79901204. b, Gel electrophoresis image of PCR products from genomic DNA of edited HSPCs, indicating unedited (WT) and deletion bands at the sgRNA target site. The percentage of deletion alleles was determined by band intensity and is shown below each lane. The experiment contains 3 biological replicates and was performed once.
Methylation quantitative trait association results of the rs79901204 variant with CpG methylation probes identify an altered peripheral leukocyte methylation profile genome-wide in n = 1,747 individuals. The strongest signal is at the chr4 TET2 locus. P values on the y-axis are derived from a two-sided linear mixed-effects model (see Methods). To account for multiple hypothesis testing, a Bonferroni threshold of P < 5.8 × 10^−8 was used to establish statistical significance.
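As a worked illustration of the Bonferroni correction mentioned in this caption, the sketch below divides a family-wise error rate by the number of tests. The caption reports only the threshold; the family-wise error rate of 0.05 is an assumption made here, and the probe count is back-calculated from it rather than taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> int:
    """Number of tests to which a given Bonferroni threshold corresponds."""
    return round(alpha / threshold)


ALPHA = 0.05                 # assumption: not stated in the caption
REPORTED_THRESHOLD = 5.8e-8  # threshold reported in the caption

n_probes = implied_number_of_tests(ALPHA, REPORTED_THRESHOLD)
print(f"implied number of CpG probes tested: {n_probes:,}")
print(f"threshold recomputed from that count: {bonferroni_threshold(ALPHA, n_probes):.2e}")
```

Under that assumption, the implied number of tests is on the order of several hundred thousand CpG probes.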
A set of 30 samples from a previously published CHIP cohort (ref. 33) were computationally downsampled to 30×, 40×, 50×, 100× and 400× sequencing depth. TOPMed WGS data were typically in the 40× depth range across CHIP genes. WGS data have excellent sensitivity to detect CHIP clones with VAF > 10%, and ~50% sensitivity to detect CHIP clones with VAF 5–10%, with minimal ability to detect CHIP clones with VAF < 5%.
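The dependence of sensitivity on sequencing depth and VAF can be pictured with a simplified binomial read-sampling model. This is not the downsampling procedure described in the caption: it assumes that each read independently carries the variant with probability equal to the VAF, and that a clone is called when at least MIN_ALT_READS variant reads are seen. That cut-off is a hypothetical value chosen for illustration, so the numbers printed should not be expected to reproduce the sensitivities reported above.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of observing at least min_alt_reads variant reads at a site
    covered by `depth` reads, under a Binomial(depth, vaf) sampling model."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


MIN_ALT_READS = 3                  # hypothetical calling cut-off
DEPTHS = (30, 40, 50, 100, 400)    # depths named in the caption
VAFS = (0.02, 0.05, 0.10, 0.20)    # illustrative variant allele fractions

for depth in DEPTHS:
    row = ", ".join(
        f"VAF {vaf:.0%}: {detection_probability(depth, vaf, MIN_ALT_READS):.2f}"
        for vaf in VAFS
    )
    print(f"{depth:>3}x  {row}")
```

Even this crude model shows the qualitative pattern: at a fixed depth, detection probability falls as VAF decreases, and deeper sequencing is needed to recover small clones.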
Cite this article
Bick, A.G., Weinstock, J.S., Nandakumar, S.K. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature (2020). https://doi.org/10.1038/s41586-020-2819-2