Abstract
Mosaic loss of the X chromosome (mLOX) is the most common clonal somatic alteration in leukocytes of female individuals (refs. 1,2), but little is known about its genetic determinants or phenotypic consequences. Here, to address this, we used data from 883,574 female participants across 8 biobanks; 12% of participants exhibited detectable mLOX in approximately 2% of leukocytes. Female participants with mLOX had an increased risk of myeloid and lymphoid leukaemias. Genetic analyses identified 56 common variants associated with mLOX, implicating genes with roles in chromosomal missegregation, cancer predisposition and autoimmune diseases. Exome-sequence analyses identified rare missense variants in FBXO10 that confer a twofold increased risk of mLOX. Only a small fraction of associations was shared with mosaic Y chromosome loss, suggesting that distinct biological processes drive formation and clonal expansion of sex chromosome missegregation. Allelic shift analyses identified X chromosome alleles that are preferentially retained in mLOX, demonstrating variation at many loci under cellular selection. A polygenic score including 44 allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Our results support a model in which germline variants predispose female individuals to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of clonal expansion.
Data availability
Overall and population-level GWAS summary statistics generated from the mLOX meta-analysis are available on the GWAS catalogue (accession numbers GCST90328147, GCST90328148, GCST90328149 and GCST90328150). Requests for access to individual-level data differ for each contributing biobank.
For FinnGen, researchers can apply for health data from the Finnish Data Authority Findata (https://findata.fi/en/permits/) and individual-level genotype data are available through the Fingenious portal (https://site.fingenious.fi/en/). These resources are hosted by the Finnish Biobank Cooperative FINBB (https://finbb.fi/en/). Access can only be provided for research projects within the scope of the Finnish Biobank Act, which includes health promotion, understanding disease mechanisms or developing medical products or treatment practices.
For EBB, individual-level health, lifestyle, demographic and genetic data are anonymized and available for research projects. Data sharing is conducted in accordance with the regulations of the Estonian Genome Center of the University of Tartu (HGRA). A data application form can be found at https://www.biobank.ee. The research project has to obtain approval from the Ethics Review Committee on Human Research of the University of Tartu as well as approval from the EGCUT scientific committee.
For UKBB, all individual-level data used in the analysis are available by application to the UKBB Access Management System (https://www.ukbiobank.ac.uk). Approved researchers can submit applications for review and assessments are made to determine if the research proposal qualifies as health-related research in line with public interest.
For BCAC, data for some of the samples are available on dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001265.v1.p1). Requests for BCAC data can be made to the Data Access Coordination Committee (DACC) of BCAC (http://bcac.ccge.medschl.cam.ac.uk/bcacdata/). BCAC DACC approval is required to access individual-level phenotype and genotype data from the ABCFS, ABCS, ABCTB, BBCC, BBCS, BCEES, BCFR-NY, BCFR-PA, BCFR-UTAH, BCINIS, BIGGS, BREOGAN, BSUCH, CBCS, CCGP, CECILE, CGPS, CNIO-BCS, CPSII, CTS, EPIC, DIETCOMPLYF, ESTHER, GC-HBOC, GENICA, HABCS, HCSC, HEBCS, HMBCS, HUBCS, KARBAC, KARMA, KBCP, KCONFAB/AOCS, LMBC, MABCS, MARIE, MBCSG, MCBCS, MCCS, MEC, MISS, MMHS, MTLGEBCS, NBCS, NC-BCFR, NBHS, NCBCS, NHS, NHS2, OBCS, OFBCR, ORIGO, PBCS, PKARMA, PLCO, POSH, RBCS, SASBAC, SBCS, SEARCH, SISTER, SKKDKFZS, SMC, SZBCS, UCIBCS, UKBGS, UKOPS and USRT studies.
For MVP, summary statistics are available on dbGaP under the MVP accession number phs001672. Additional data supporting the findings of this study are available upon reasonable request from MVP. These data are not publicly available due to restrictions of the US Government and Department of Veterans Affairs concerning privacy and participant consent.
For MGB, a portion of individual-level genomic data is available in dbGaP as part of the eMERGE consortium (phs001584.v2.p2) and as part of the Center Common Disease Genomics (phs002018.v1.p1). Additional MGB data are not currently publicly available due to data restrictions.
For PLCO, individual-level genotype data are available in dbGaP (phs001286.v2.p2). Permitted data use includes discovery and hypothesis generation in the investigation of genetic contributions to cancer risk and risk of other diseases as well as development of novel analytical approaches for GWAS. Individual-level phenotype data can be requested through the NCI Cancer Data Access System (CDAS) (https://cdas.cancer.gov/plco/).
For BBJ, information on the cohort is available at the RIKEN website (http://jenger.riken.jp/en/). While individual-level genetic data are not accessible, all other individual-level data are available upon request.
Code availability
The MoChA pipelines used for mLOX calling (mocha.wdl), GWAS (assoc.wdl), allelic shift analysis (impute.wdl and shift.wdl) and X chromosome differential score estimation (score.wdl) are available at https://doi.org/10.5281/zenodo.10892520 (ref. 86); the detailed and most up-to-date version is at https://github.com/freeseek/mochawdl. The GWAS meta-analysis was performed using the pipeline developed by the COVID-19 Host Genetics Initiative, available at https://github.com/covid19-hg/META_ANALYSIS. The code used for the Bayesian line model is available at https://github.com/dsgelab/Mosaic-loss-of-chromosome-X/tree/main/BayesLineModel.
References
Machiela, M. J. et al. Female chromosome X mosaicism is age-related and preferentially affects the inactivated X chromosome. Nat. Commun. 7, 11843 (2016).
Zekavat, S. M. et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat. Med. 27, 1012–1024 (2021).
Brown, C. J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).
Lyon, M. F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190, 372–373 (1961).
Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244–248 (2017).
Busque, L. et al. Nonrandom X-inactivation patterns in normal females: lyonization ratios vary with age. Blood 88, 59–65 (1996).
Gale, R. E. & Linch, D. C. Interpretation of X-chromosome inactivation patterns. Blood 84, 2376–2378 (1994).
Zito, A. et al. Heritability of skewed X-inactivation in female twins is tissue-specific and associated with age. Nat. Commun. 10, 5339 (2019).
Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).
Dumanski, J. P. et al. Smoking is associated with mosaic loss of chromosome Y. Science 347, 81–83 (2015).
Zhou, W. et al. Mosaic loss of chromosome Y is associated with common variation near TCL1A. Nat. Genet. 48, 563–568 (2016).
Wright, D. J. et al. Genetic variants associated with mosaic Y chromosome loss highlight cell cycle genes and overlap with cancer susceptibility. Nat. Genet. 49, 674–679 (2017).
Thompson, D. J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).
Loh, P. R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Lin, S. H. et al. Incident disease associations with mosaic chromosomal alterations on autosomes, X and Y chromosomes: insights from a phenome-wide association study in the UK Biobank. Cell Biosci. 11, 1–11 (2021).
Zhou, W. et al. Detectable chromosome X mosaicism in males is rarely tolerated in peripheral leukocytes. Sci. Rep. 11, 1193 (2021).
Sybert, V. P. & McCauley, E. Turner’s syndrome. N. Engl. J. Med. 351, 1227–1238 (2004).
Jäger, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013).
Koren, A. & McCarroll, S. A. Random replication of the inactive X chromosome. Genome Res. 24, 64–69 (2014).
Kessler, M. D. et al. Common and rare variant associations with clonal haematopoiesis phenotypes. Nature 612, 301–309 (2022).
Terao, C. et al. GWAS of mosaic loss of chromosome Y highlights genetic effects on blood cell differentiation. Nat. Commun. 10, 4719 (2019).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Leitsalu, L. et al. Cohort profile: Estonian biobank of the Estonian genome center, University of Tartu. Int. J. Epidemiol. 44, 1137–1147 (2015).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).
Hunter-Zinck, H. et al. Genotyping array design and data quality control in the Million Veteran Program. Am. J. Hum. Genet. 106, 535–548 (2020).
Karlson, E. W., Boutin, N. T., Hoffnagle, A. G. & Allen, N. L. Building the partners healthcare biobank at partners personalized medicine: informed consent, return of research results, recruitment lessons and operational considerations. J. Pers. Med. 6, 2 (2016).
Boutin, N. T. et al. The evolution of a large biobank at Mass General Brigham. J. Pers. Med. 12, 1323 (2022).
Machiela, M. et al. GWAS Explorer: an open-source tool to explore, visualize, and access GWAS summary statistics in the PLCO Atlas. Sci. Data 10, 25 (2023).
Nagai, A. et al. Overview of the BioBank Japan project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Vlasschaert, C. et al. A practical approach to curate clonal hematopoiesis of indeterminate potential in human genetic datasets. Blood 141, 2214–2223 (2023).
Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231 (2020).
Frampton, M. et al. Variation at 3p24.1 and 6q23.3 influences the risk of Hodgkin’s lymphoma. Nat. Commun. 4, 2549 (2013).
Berndt, S. I. et al. Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat. Commun. 7, 10933 (2016).
Celik, H. et al. JARID2 functions as a tumor suppressor in myeloid neoplasms by repressing self-renewal in hematopoietic progenitor cells. Cancer Cell 34, 741–756 (2018).
Pattabiraman, D. R. & Gonda, T. J. Role and potential for therapeutic targeting of MYB in leukemia. Leukemia 27, 269–277 (2013).
Schaffner, C., Stilgenbauer, S., Rappold, G. A., Döhner, H. & Lichter, P. Somatic ATM mutations indicate a pathogenic role of ATM in B-cell chronic lymphocytic leukemia. Blood 94, 748–753 (1999).
Zenz, T. et al. TP53 mutation and survival in chronic lymphocytic leukemia. J. Clin. Oncol. 28, 4473–4479 (2010).
Catalano, A. et al. The PRKAR1A gene is fused to RARA in a new variant acute promyelocytic leukemia. Blood 110, 4073–4076 (2007).
Loh, P. R., Genovese, G. & McCarroll, S. A. Monogenic and polygenic inheritance become instruments for clonal selection. Nature 584, 136–141 (2020).
Luo, Y. et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat. Genet. 53, 1504–1516 (2021).
Ritari, J., Koskela, S., Hyvärinen, K. & Partanen, J. HLA-disease association and pleiotropy landscape in over 235,000 Finns. Hum. Immunol. 83, 391–398 (2022).
Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).
Li, X. et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet. 52, 969–983 (2020).
Zhou, W. et al. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat. Genet. 54, 1466–1469 (2022).
Chiorazzi, M. et al. Related F-box proteins control cell death in Caenorhabditis elegans and human lymphoma. Proc. Natl Acad. Sci. USA 110, 3943–3948 (2013).
Spielman, R. S., McGinnis, R. E. & Ewens, W. J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52, 506 (1993).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Yang, C. H., Tomkiel, J., Saitoh, H., Johnson, D. H. & Earnshaw, W. C. Identification of overlapping DNA-binding and centromere-targeting domains in the human kinetochore protein CENP-C. Mol. Cell. Biol. 16, 3576–3586 (1996).
Du, Y., Topp, C. N. & Dawe, R. K. DNA binding of centromere protein C (CENPC) is stabilized by single-stranded RNA. PLoS Genet. 6, e1000835 (2010).
Delaneau, O., Zagury, J. F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
Zhao, Y. et al. Detection and characterization of male sex chromosome abnormalities in the UK Biobank study. Genet. Med. 24, 1909–1919 (2022).
Zhao, Y. et al. GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health. Nat. Commun. 12, 4178 (2021).
Balduzzi, S., Rücker, G. & Schwarzer, G. How to perform a meta-analysis with R: a practical tutorial. Evid. Based Ment. Health 22, 153–160 (2019).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Barbeira, A. N. et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 22, 49 (2021).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
GTEx Consortium. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Qi, T. et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 9, 2282 (2018).
Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).
Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet. 55, 1267–1276 (2023).
Gardner, E. J. et al. Damaging missense variants in IGF1R implicate a role for IGF-1 resistance in the aetiology of type 2 diabetes. Cell Genomics 2, 100208 (2022).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Zhang, H. et al. A powerful procedure for pathway-based meta-analysis using summary statistics identifies 43 pathways associated with type II diabetes in European populations. PLoS Genet. 12, e1006122 (2016).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68 (2015).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Loh, P. R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).
Ritari, J. et al. Increasing accuracy of HLA imputation by a population-specific reference panel in a FinnGen biobank cohort. NAR Genomics Bioinformatics 2, lqaa030 (2020).
Genovese, G. MoChA WDL pipelines 2022-12-21. Zenodo https://doi.org/10.5281/zenodo.10892520 (2022).
Acknowledgements
The authors thank J. Karjalainen and M. Cordioli for assistance in GWAS meta-analysis; S. J. Andrews and J. Leinonen for sharing formatted GWAS summary statistics used in genetic correlation analyses; S. Jukarainen and A. Gerussi for insightful discussion on phenome-wide association studies (pheWAS) analyses from a clinical standpoint; S. Jones and M. Kanai for valuable feedback on HLA and fine-mapping; J. Koskela and M. Myllymäki for discussion on clonal haematopoiesis; Y. Fu and A. Preussner for discussion on genetic analyses of sex chromosomes; G. Kops for discussion on the mechanism causing chromosome missegregation; A. Kouno and the members of the BBJ Project for supporting the BBJ analyses; and B. Wheeler for his assistance in running the pathway analyses. We acknowledge the participants and investigators of each contributing biobank. The FinnGen project is funded by two grants from Business Finland (HUS 4685/31/2016 and UH 4386/31/2016) and the following industry partners: AbbVie, AstraZeneca UK, Biogen MA, Bristol Myers Squibb (and Celgene Corporation & Celgene International II), Genentech, Merck Sharp & Dohme, Pfizer, GlaxoSmithKline Intellectual Property Development, Sanofi US Services, Maze Therapeutics, Janssen Biotech, Novartis and Boehringer Ingelheim International. 
The following biobanks are acknowledged for delivering biobank samples to FinnGen: Auria Biobank (www.auria.fi/biopankki), THL Biobank (www.thl.fi/biobank), Helsinki Biobank (www.helsinginbiopankki.fi), Biobank Borealis of Northern Finland (https://www.ppshp.fi/Tutkimus-ja-opetus/Biopankki/Pages/Biobank-Borealis-briefly-in-English.aspx), Finnish Clinical Biobank Tampere (www.tays.fi/en-US/Research_and_development/Finnish_Clinical_Biobank_Tampere), Biobank of Eastern Finland (www.ita-suomenbiopankki.fi/en), Central Finland Biobank (www.ksshp.fi/fi-FI/Potilaalle/Biopankki), Finnish Red Cross Blood Service Biobank (www.veripalvelu.fi/verenluovutus/biopankkitoiminta), Terveystalo Biobank (www.terveystalo.com/fi/Yritystietoa/Terveystalo-Biopankki/Biopankki/) and Arctic Biobank (https://www.oulu.fi/en/university/faculties-and-units/faculty-medicine/northern-finland-birth-cohorts-and-arctic-biobank). All Finnish Biobanks are members of the BBMRI.fi infrastructure (www.bbmri.fi). Finnish Biobank Cooperative (FINBB) (https://finbb.fi/) is the coordinator of BBMRI-ERIC operations in Finland. The Finnish biobank data can be accessed through Fingenious services (https://site.fingenious.fi/en/) managed by FINBB. The work related to EBB was supported by the Estonian Research Council grants PRG1911 and TK (TK214) and the European Union through the European Regional Development Fund Project no. 2014-2020.4.01.15-0012 GENTRANSMED. The EBB data analysis was carried out in part in the High-Performance Computing Center of University of Tartu. For BCAC and MVP, a detailed acknowledgement is available in the Supplementary Information.
This work was supported by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, and the Medical Research Council (unit programmes: MC_UU_12015/2, MC_UU_00006/2). G.G. was supported by NIH grants R01 MH104964 and R01 MH123451; A. Ganna was supported by the Academy of Finland (grant no. 323116) and by the European Research Council under the European Union’s Horizon 2020 Research and Innovation Programme (grant no. 945733); P.-R.L. was supported by NIH grant DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, the Next Generation Fund at the Broad Institute of MIT and Harvard, and a Sloan Research Fellowship; J.R.B.P. receives research funding from GSK; C.T. was supported by Japan Agency for Medical Research and Development (AMED) grants JP21ek0109555, JP21tm0424220, JP21ck0106642, JP22wm0425008, JP23ek0410114 and JP23tm0424225, and Japan Society for the Promotion of Science (JSPS) KAKENHI grant JP20H00462; P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech/Roche and Novartis; and S.P. acknowledges research funding from the mvp000 grant. The manuscript does not necessarily represent the views of the Department of Veterans Affairs or the US Government.
Author information
Contributions
This project was initiated and led by A.L., G.G., P.-R.L., A. Ganna, J.R.B.P. and M.J.M. A.L. and M.J.M. wrote the first draft of the manuscript with input from all lead authors. A.L. coordinated the analyses of each contributing biobank, conducted across-biobank meta-analysis (including GWAS, allelic shift analysis and pheWAS) and FinnGen-specific analyses, organized post-GWAS analyses, designed and generated all figures and tables (except where noted), and wrote Results, Methods and part of the introduction and Discussion sections. G.G. developed the MoChA pipelines for mLOX calling, GWAS, allelic shift analysis, and X chromosome differential score estimation, guided the analyses of each contributing biobank, performed mLOX calling, GWAS and allelic shift analysis for UKBB and MGB, and wrote the manuscript. Y.Z. performed WES analyses and three-way combined call GWAS in UKBB, generated Supplementary Figs. 2 and 5, prepared Supplementary Tables 8 and 19, and drafted the relevant Results and Methods paragraphs. M.P. developed the Bayesian line model to cluster mLOX and mLOY loci and wrote the relevant Methods paragraph. S.M.Z. performed pheWAS for UKBB and MGB. K.A.K. performed the GWAStoGenes pipeline, prepared Supplementary Table 13, and drafted the relevant Methods paragraphs. Z.Y. estimated heritability and genetic correlations and prepared Supplementary Table 16. K.Y. and L.S. performed the pathway analysis and prepared Supplementary Table 14. C.V. performed the sensitivity analyses for associations with leukaemia in UKBB and prepared Supplementary Table 7. X.L. performed mLOX calling, GWAS, allelic shift analysis and HLA fine-mapping replication analysis in BBJ. D.W.B. performed GWAS for PLCO and formatted inputs for blood cell trait heat maps (Fig. 2d and Extended Data Fig. 3b). G.H. performed mLOX calling, GWAS and allelic shift analysis for EBB. B.R.G. and S.P. performed mLOX calling, GWAS, allelic shift analysis and pheWAS for MVP. J.D. performed mLOX calling and GWAS for BCAC. W.Z. performed mLOX calling, GWAS and allelic shift analysis for PLCO. Y.M. participated in BBJ analyses. V.T. and F.-D.P. participated in EBB analyses. M.A., T.P.S. and A. Ghazal participated in FinnGen analyses. W.-Y.H. and N.D.F. participated in PLCO analyses. E.J.G. participated in UKBB WES analyses. V.G.S. assisted in interpreting findings related to clonal haematopoiesis. A.P. coordinated the FinnGen project. H.M.O. advised on the HLA fine-mapping analysis and assisted in interpreting findings related to HLA. T.T. assisted in interpreting findings related to skewed X chromosome inactivation and escape from X chromosome inactivation. S.J.C. coordinated the PLCO project. R.M. supervised EBB analyses. P.N. supervised pheWAS for UKBB, MGB and MVP. M.J.D. initialized and conceptualized the mCA project in FinnGen and assisted in interpreting findings, especially those related to mLOY in male participants. A.B. supervised pheWAS in UKBB, MGB and MVP, and the sensitivity analyses for associations with leukaemia in UKBB. S.A.M. supervised the development of MoChA pipelines. C.T. supervised BBJ analyses and advised on the HLA fine-mapping analysis. P.-R.L., A. Ganna, J.R.B.P. and M.J.M. co-supervised the project, interpreted the findings and wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
G.G., P.-R.L. and S.A.M. declare competing interests: patent application PCT/WO2019/079493 has been filed on the mCA detection method used in this work. J.R.B.P. and E.J.G. are employees and shareholders of Insmed. Y.Z. is a UK University Worker of GSK. A.B. reports scientific advisory board membership for TenSixteen Bio. P.N. reports personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co., Foresite Labs, Genentech/Roche, GV, HeartFlow, Magnet Biomedicine, Merck and Novartis, scientific advisory board membership of Esperion Therapeutics, Preciseli, and TenSixteen Bio, scientific co-founder of TenSixteen Bio, equity in MyOme, Preciseli and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work.
Peer review
Peer review information
Nature thanks Eric Jorgenson and Siddhartha Kar for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Theoretical framework of the mLOX study.
Panel (A) depicts the etiologic process leading to detectable mosaic loss of the X chromosome (mLOX) in females. Detectable age-related mLOX develops only if the mutant haematopoietic stem cell (HSC) survives loss of the X chromosome and the mutation confers a proliferative advantage over normal cells. Panel (B) shows the statistical approaches used to discover the genetic determinants of mLOX. Variants associated with susceptibility to mLOX, acting as either trans or cis factors, are examined using a genome-wide association study (GWAS), for common variants with minor allele frequency (MAF) > 0.1%, and a gene-burden test performed for whole-exome sequencing (WES) data for rare variants with MAF < 0.1%. Among samples with detectable mLOX, allelic shift analysis is used to detect chromosome X alleles exhibiting cis selection, that is, more likely to be clonally selected for when detectable mLOX retains these alleles.
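The split between the two association approaches in Panel (B) can be written down as a small routing rule. The following is a minimal sketch under stated assumptions, not the study's pipeline: the function names are hypothetical, and because the legend quotes MAF > 0.1% for GWAS and MAF < 0.1% for the gene-burden test without saying where a variant at exactly 0.1% goes, the sketch assumes it is treated as rare.

```python
from typing import Literal

# 0.1%, the minor allele frequency cut-off quoted in the legend.
MAF_THRESHOLD = 0.001


def minor_allele_frequency(alt_allele_freq: float) -> float:
    """Return the frequency of the rarer of the two alleles."""
    if not 0.0 <= alt_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(alt_allele_freq, 1.0 - alt_allele_freq)


def analysis_for_variant(alt_allele_freq: float) -> Literal["gwas", "gene_burden"]:
    """Route a variant to GWAS (common) or the WES gene-burden test (rare).

    Assumption: a variant with MAF exactly equal to the threshold is
    treated as rare; the legend does not specify this boundary case.
    """
    maf = minor_allele_frequency(alt_allele_freq)
    return "gwas" if maf > MAF_THRESHOLD else "gene_burden"
```

For example, a variant whose alternate allele has frequency 0.9995 has a MAF of 0.0005 and would be routed to the gene-burden test under this rule.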
Extended Data Fig. 2 Prevalence of mLOX by age at genotyping in each contributing biobank.
Panel (A) is for all detectable mLOX in peripheral leukocytes, while Panel (B) is restricted to expanded mLOX with cell fraction >5%. Data are presented as mean values ± SEM.
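As an illustration of how "mean ± SEM" could be computed for a prevalence curve, the sketch below groups participants into age bins and reports the mean of a 0/1 mLOX indicator with its standard error. The 5-year bin width, the record layout and the function name are assumptions made for this example; the legend does not state how ages were grouped.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def prevalence_by_age_bin(
    records: Iterable[Tuple[float, bool]], bin_width: int = 5
) -> Dict[int, Tuple[float, float, int]]:
    """Map each age-bin start to (mean prevalence, SEM, sample size).

    `records` holds (age at genotyping, has detectable mLOX) pairs.
    The SEM is the sample standard deviation divided by sqrt(n);
    it is NaN for bins with a single participant.
    """
    groups = defaultdict(list)
    for age, has_mlox in records:
        bin_start = int(age // bin_width) * bin_width
        groups[bin_start].append(1.0 if has_mlox else 0.0)

    summary = {}
    for bin_start in sorted(groups):
        values = groups[bin_start]
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            sem = math.sqrt(variance / n)
        else:
            sem = float("nan")
        summary[bin_start] = (mean, sem, n)
    return summary
```

Restricting the indicator to cases with cell fraction above 5% before calling the function would give the analogue of Panel (B).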
Extended Data Fig. 3 Allelic shift of chromosome X alleles among mLOX cases.
Panel (A) shows −log10(P) of chromosome X variants from allelic shift analysis by meta-analyzing data of 83,320 mLOX cases from seven biobanks, with lead variants of 44 independent loci highlighted. The y axis shows P values from a two-sided test on a −log10 scale and the dashed line denotes the statistical significance threshold after multiple comparison adjustments (5.0 × 10⁻⁸, which is the same as the GWAS significance level). Panel (B) is a heat map for associations of 43 allelic shift analysis lead variants with 19 blood cell phenotypes (ref. 46), with significance levels from the original GWAS expressed by asterisks (*** for two-sided exact P ≤ 0.001, ** for P ≤ 0.01, * for P ≤ 0.05). One variant was dropped because no appropriate proxy variant was available in the blood cell phenotype GWAS. The absolute Z scores were cropped to the range [0, 10].
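To make the quantities in this legend concrete, the sketch below tests whether one allele at a heterozygous X chromosome site is retained more often than the 50% expected by chance, and applies the 5.0 × 10⁻⁸ threshold and the [0, 10] cropping of absolute Z scores. It is a simplified stand-in that assumes a normal approximation to a binomial test; the analysis in the study is implemented in the MoChA shift.wdl pipeline and may differ. All function names are hypothetical.

```python
import math
from typing import Tuple

# Significance threshold quoted in the legend (same as the GWAS level).
GENOME_WIDE_ALPHA = 5.0e-8


def allelic_shift_test(n_retained_alt: int, n_heterozygous: int) -> Tuple[float, float]:
    """Two-sided test of H0: the alternate allele is retained in 50% of
    heterozygous mLOX cases. Returns (Z score, P value).

    Assumption: normal approximation to the binomial distribution.
    """
    if n_heterozygous <= 0:
        raise ValueError("need at least one heterozygous case")
    if not 0 <= n_retained_alt <= n_heterozygous:
        raise ValueError("retained count must lie in [0, n_heterozygous]")
    expected = 0.5 * n_heterozygous
    sd = math.sqrt(0.25 * n_heterozygous)
    z = (n_retained_alt - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


def neg_log10_p(p: float) -> float:
    """Value plotted on the y axis; infinite if P underflows to zero."""
    return math.inf if p == 0.0 else -math.log10(p)


def crop_abs_z(z: float, upper: float = 10.0) -> float:
    """Absolute Z score cropped to [0, upper], as in the heat map."""
    return min(abs(z), upper)
```

With 5,600 of 10,000 heterozygous cases retaining the alternate allele, the Z score is 12 and the P value falls well below `GENOME_WIDE_ALPHA`, whereas 60 of 100 gives a Z score of 2 and would not reach the threshold.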
Extended Data Fig. 4 Allelic shift in the context of X chromosome inactivation.
Panel (A) depicts the main mechanism of X chromosome inactivation (Xi) in females. To compensate for gene dosage imbalances between XX females and XY males, one of the two X chromosomes in females is randomly inactivated early in embryonic development and this inactivation status is passed down to daughter cells. As some females age, the expected 1:1 ratio of inactivated maternal to paternal X chromosome copies can become skewed, if cells harboring one of the active X chromosomes are more frequent than cells harboring the other. Panels (B) and (C) depict the pattern of allelic shift in mLOX cases in terms of the status of Xi, with Panel (B) for random Xi and Panel (C) for skewed Xi. As mLOX preferentially affects the inactivated X chromosome (ref. 2), the imbalance between chromosome X alleles in mLOX cases can be seen as the combined cis effects of both skewed Xi and mLOX. In other words, the imbalance of chromosome X alleles in mLOX cases could also be shaped by alleles that have cis effects solely on the process of skewed Xi.
Extended Data Fig. 5 Contribution of each X chromosome allelic shift locus to the prediction of the retained X chromosome in females with mLOX.
We proposed a novel polygenic score including the 44 loci identified from allelic shift analysis to infer the retained X chromosome in detectable mLOX. To avoid overfitting, the effects of the 44 loci were estimated from allelic shift analysis of 56,319 mLOX cases from six biobanks excluding FinnGen while the prediction performance was tested in 27,001 FinnGen mLOX cases. The plot shows the contribution of each of the 44 loci to the prediction, starting with the most significant variants.
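The idea of scoring the two X chromosomes and predicting that the higher-scoring one is retained can be sketched as follows. This is an illustrative reading of the legend, not the score.wdl implementation: the per-locus weights, locus names, the haplotype encoding and the definition of "top decile" as the tenth of cases with the largest absolute score difference are all assumptions. The sample sizes are taken from the legends and confirm that the training and test sets partition the 83,320 cases of the meta-analysis.

```python
from typing import Dict, List, Optional, Tuple

Haplotype = Dict[str, int]               # locus ID -> 0/1 count of the effect allele
Case = Tuple[Haplotype, Haplotype, str]  # (haplotype A, haplotype B, truly retained "A"/"B")

# Sample sizes quoted in the legends: training + test = all meta-analysed cases.
N_TRAIN, N_TEST, N_TOTAL = 56_319, 27_001, 83_320
assert N_TRAIN + N_TEST == N_TOTAL


def haplotype_score(hap: Haplotype, weights: Dict[str, float]) -> float:
    """Weighted sum of effect alleles carried on one X chromosome."""
    return sum(w * hap.get(locus, 0) for locus, w in weights.items())


def infer_retained(
    hap_a: Haplotype, hap_b: Haplotype, weights: Dict[str, float]
) -> Tuple[Optional[str], float]:
    """Predict the retained chromosome from the score difference (A minus B).

    Assumption: a positive weight marks an allele that tends to be retained.
    Returns (None, 0.0-like difference) when the two scores tie.
    """
    diff = haplotype_score(hap_a, weights) - haplotype_score(hap_b, weights)
    if diff > 0:
        return "A", diff
    if diff < 0:
        return "B", diff
    return None, diff


def top_decile_accuracy(cases: List[Case], weights: Dict[str, float]) -> float:
    """Fraction of correct calls among the 10% of cases with the largest
    absolute score difference (hypothetical definition of "top decile")."""
    if not cases:
        raise ValueError("no cases supplied")
    scored = []
    for hap_a, hap_b, truth in cases:
        call, diff = infer_retained(hap_a, hap_b, weights)
        scored.append((abs(diff), call == truth))
    scored.sort(key=lambda item: item[0], reverse=True)
    k = max(1, len(scored) // 10)
    return sum(1 for _, correct in scored[:k] if correct) / k
```

Estimating the weights in one set of biobanks and evaluating accuracy in a held-out biobank, as the legend describes, keeps the reported performance from being inflated by overfitting.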
Supplementary information
Supplementary Information
Supplementary Figs. 1–15 and figure legends.
Supplementary Tables 1–25
Supplementary Tables 1–25 and table legends.
Supplementary Note 1
A list of all FinnGen working-group members and their affiliations.
Supplementary Note 2
A list of all Breast Cancer Association Consortium working-group members and their affiliations, funding, and acknowledgements.
Supplementary Note 3
A list of all Million Veteran Program working-group members and their affiliations.
Supplementary Note 4
Million Veteran Program: consortium acknowledgement for manuscripts.
About this article
Cite this article
Liu, A., Genovese, G., Zhao, Y. et al. Genetic drivers and cellular selection of female mosaic X chromosome loss. Nature 631, 134–141 (2024). https://doi.org/10.1038/s41586-024-07533-7
DOI: https://doi.org/10.1038/s41586-024-07533-7