Abstract
An individual’s disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.
Data availability
Patient-level EHR and genotyping data are protected due to patient privacy and can be accessed by collaboration with a UCLA researcher. All summary statistic data discussed in this paper are freely available on https://www.ibd.la/. 1000 Genomes Project data can be accessed at https://www.internationalgenome.org/data. Human Genome Diversity Project data can be accessed at the following FTP server: ftp://ngs.sanger.ac.uk/production/hgdp. Simons Genome Diversity Project data can be accessed at https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/. The human reference genome version hg38 was downloaded from the UCSC Genome Browser: https://hgdownload.soe.ucsc.edu/downloads.html. DbSNP version 147 was used in this study and was obtained from https://ftp.ncbi.nlm.nih.gov/snp/.
Code availability
Code for identity-by-descent calling and clustering is available at https://github.com/christacaggiano/IBD. Code for the website is available at https://github.com/misingnoglic/atlas-app.
References
Williams, D. R., Mohammed, S. A., Leavell, J. & Collins, C. Race, socioeconomic status, and health: complexities, ongoing challenges, and research opportunities. Ann. N. Y. Acad. Sci. 1186, 69–101 (2010).
Fiscella, K. & Williams, D. R. Health disparities based on socioeconomic inequities: implications for urban health care. Acad. Med. 79, 1139–1147 (2004).
Geneviève, L. D., Martani, A., Shaw, D., Elger, B. S. & Wangmo, T. Structural racism in precision medicine: leaving no one behind. BMC Med. Ethics 21, 17 (2020).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584 (2019).
Majara, L. et al. Low and differential polygenic score generalizability among African populations due largely to genetic diversity. HGG Adv. 4, 100184 (2023).
All of Us Research Program Investigators. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).
Johnson, R. et al. Leveraging genomic diversity for discovery in an EHR-linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med. 14, 104 (2021).
Hateley, S. et al. The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nat. Commun. 12, 6442 (2021).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Belbin, G. M. et al. Toward a fine-scale population health monitoring system. Cell 184, 2068–2083 (2021).
Saada, J. N. et al. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat. Commun. 11, 6130 (2020).
Dai, C. L. et al. Population histories of the United States revealed through fine-scale migration and haplotype analysis. Am. J. Hum. Genet. 106, 371–388 (2020).
Naseri, A. et al. Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments. BMC Biol. 19, 32 (2021).
Gilbert, E., Shanmugam, A. & Cavalleri, G. L. Revealing the recent demographic history of Europe via haplotype sharing in the UK Biobank. Proc. Natl Acad. Sci. USA 119, e2119281119 (2022).
Henn, B. M. et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7, e34267 (2012).
Johnson, R. et al. The UCLA ATLAS Community Health Initiative: promoting precision health research in a diverse biobank. Cell Genom. 3, 100243 (2022).
U.S. Census Bureau (2015–2019). Place of birth for the foreign-born population in the United States American community survey 5-year estimates. https://censusreporter.org/data/table/?table=B05006&geo_ids=05000US06037,31000US31080,04000US06,01000US,86000US91030
Krieger, N. Who and what is a ‘population’? Historical debates, current controversies, and implications for understanding ‘population health’ and rectifying health inequities. Milbank Q. 90, 634–681 (2012).
Internal Revenue Service. SOI Tax Stats - Individual Income Tax Statistics - ZIP Code Data (SOI). https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-zip-code-data-soi
U.S. Census Bureau. U.S. Census Bureau QuickFacts: Los Angeles city, California. https://www.census.gov/quickfacts/losangelescitycalifornia
Carress, H., Lawson, D. J. & Elhaik, E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 22, 351 (2021).
Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
Bergström, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).
Shemirani, R. et al. Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat. Commun. 12, 3546 (2021).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Chiu, A. M., Molloy, E. K., Tan, Z., Talwalkar, A. & Sankararaman, S. Inferring population structure in biobank-scale genomic data. Am. J. Hum. Genet. 109, 727–737 (2022).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
García-Ortiz, H. et al. The genomic landscape of Mexican Indigenous populations brings insights into the peopling of the Americas. Nat. Commun. 12, 5942 (2021).
Parvini, S. & Simani, E. Are Arabs and Iranians white? Census says yes, but many disagree. Los Angeles Times. https://www.latimes.com/projects/la-me-census-middle-east-north-africa-race/
Naccashian, Z., Hattar-Pollara, M., Ho, C. (Alex) & Ayvazian, S. P. Prevalence and predictors of diabetes mellitus and hypertension in Armenian Americans in Los Angeles. Diabetes Educ. 44, 130–143 (2018).
Freeman, J. D., Kadiyala, S., Bell, J. F. & Martin, D. P. The causal effect of health insurance on utilization and outcomes in adults: a systematic review of US studies. Med. Care 46, 1023–1032 (2008).
Wei, W.-Q. et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE 12, e0175508 (2017).
Corriveau, R. A. et al. Alzheimer’s Disease-Related Dementias Summit 2016: national research priorities. Neurology 89, 2381–2391 (2017).
Schiff, E. R. et al. A new look at familial risk of inflammatory bowel disease in the Ashkenazi Jewish population. Dig. Dis. Sci. 63, 3049–3057 (2018).
Roth, M. P., Petersen, G. M., McElree, C., Feldman, E. & Rotter, J. I. Geographic origins of Jewish patients with inflammatory bowel disease. Gastroenterology 97, 900–904 (1989).
Levav, I., Kohn, R., Golding, J. M. & Weissman, M. M. Vulnerability of Jews to affective disorders. Am. J. Psychiatry 154, 941–947 (1997).
Pinhas, L., Heinmaa, M., Bryden, P., Bradley, S. & Toner, B. Disordered eating in Jewish adolescent girls. Can. J. Psychiatry 53, 601–608 (2008).
Yeung, P. P. & Greenwald, S. Jewish Americans and mental health: results of the NIMH Epidemiologic Catchment Area Study. Soc. Psychiatry Psychiatr. Epidemiol. 27, 292–297 (1992).
Solovieff, N. et al. Ancestry of African Americans with sickle cell disease. Blood Cells Mol. Dis. 47, 41–45 (2011).
Eltoukhi, H. M., Modi, M. N., Weston, M., Armstrong, A. Y. & Stewart, E. A. The health disparities of uterine fibroid tumors for African American women: a public health issue. Am. J. Obstet. Gynecol. 210, 194–199 (2014).
Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
Centers for Disease Control and Prevention. People born outside of the United States and viral hepatitis. https://www.cdc.gov/hepatitis/populations/Born-Outside-United-States.htm (2020).
Rostomian, A. H., Soverow, J. & Sanchez, D. R. Exploring Armenian ethnicity as an independent risk factor for cardiovascular disease: findings from a prospective cohort of patients in a county hospital. JRSM Cardiovasc. Dis. 9, 2048004020956853 (2020).
Cobb, S., Bazargan, M., Assari, S., Barkley, L. & Bazargan-Hejazi, S. Emergency department utilization, hospital admissions, and office-based physician visits among under-resourced African American and Latino older adults. J. Racial Ethn. Health Disparities 10, 205–218 (2022).
Self, T. H., Chrisman, C. R., Mason, D. L. & Rumbak, M. J. Reducing emergency department visits and hospitalizations in African American and Hispanic patients with asthma: a 15-year review. J. Asthma 42, 807–812 (2005).
Bazargan, M. et al. Emergency department utilization among underserved African American older adults in South Los Angeles. Int. J. Environ. Res. Public Health 16, 1175 (2019).
Abul-Husn, N. S. et al. Exome sequencing reveals a high prevalence of BRCA1 and BRCA2 founder variants in a diverse population-based biobank. Genome Med. 12, 2 (2019).
Sohar, E., Prass, M., Heller, J. & Heller, H. Genetics of familial mediterranean fever (FMF): a disorder with recessive inheritance in non-Ashkenazi Jews and Armenians. Arch. Intern. Med. 107, 529–538 (1961).
Moradian, M. M., Sarkisian, T., Ajrapetyan, H. & Avanesian, N. Genotype–phenotype studies in a large cohort of Armenian patients with familial Mediterranean fever suggest clinical disease with heterozygous MEFV mutations. J. Hum. Genet. 55, 389–393 (2010).
Carlice-dos-Reis, T. et al. Investigation of mutations in the HBB gene using the 1,000 genomes database. PLoS ONE 12, e0174637 (2017).
Kazazian, H. H., Dowling, C. E., Waber, P. G., Huang, S. & Lo, W. H. The spectrum of β-thalassemia genes in China and Southeast Asia. Blood 68, 964–966 (1986).
Xiong, F. et al. Molecular epidemiological survey of haemoglobinopathies in the Guangxi Zhuang Autonomous Region of southern China. Clin. Genet. 78, 139–148 (2010).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Grzymski, J. J. et al. Population genetic screening efficiently identifies carriers of autosomal dominant diseases. Nat. Med. 26, 1235–1239 (2020).
Damrauer, S. M. et al. Association of the V122I hereditary transthyretin amyloidosis genetic variant with heart failure among individuals of African or Hispanic/Latino ancestry. JAMA 322, 2191–2202 (2019).
Pogoryelova, O., González Coraspe, J. A., Nikolenko, N., Lochmüller, H. & Roos, A. GNE myopathy: from clinics and genetics to pathology and research strategies. Orphanet J. Rare Dis. 13, 70 (2018).
Eisenberg, I. et al. The UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase gene is mutated in recessive hereditary inclusion body myopathy. Nat. Genet. 29, 83–87 (2001).
Abul-Husn, N. S. et al. Implementing genomic screening in diverse populations. Genome Med. 13, 17 (2021).
Tadmouri, G. O. et al. Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009).
Fallahi, J. et al. Founder effect of KHDC3L, p.M1V mutation, on Iranian patients with recurrent hydatidiform moles. Iran. J. Med. Sci. 45, 118–124 (2020).
Ceballos, F. C., Joshi, P. K., Clark, D. W., Ramsay, M. & Wilson, J. F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 19, 220–234 (2018).
Lencz, T. et al. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl Acad. Sci. USA 104, 19942–19947 (2007).
Moreno-Grau, S. et al. Long runs of homozygosity are associated with Alzheimer’s disease. Transl. Psychiatry 11, 142 (2021).
Browning, S. R. & Browning, B. L. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97, 404–418 (2015).
Belbin, G. M. et al. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife 6, e25060 (2017).
Bhatia, G., Patterson, N. J., Sankararaman, S. & Price, A. L. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23, 1514–1521 (2013).
Chacón-Duque, J.-C. et al. Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance. Nat. Commun. 9, 5388 (2018).
Borrell, L. N. et al. Race and genetic ancestry in medicine—a time for reckoning with racism. N. Engl. J. Med. 384, 474–480 (2021).
Neblett, E. W. et al. Racism, racial resilience, and African American youth development: person-centered analysis as a tool to promote equity and justice. In Advances in Child Development and Behavior (eds Horn, S. S., Ruck, M. D. & Liben, L. S.) Vol. 51, 43–79 (JAI, 2016).
Browning, B. L. & Browning, S. R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 88, 173–182 (2011).
Arciero, E. et al. Fine-scale population structure and demographic history of British Pakistanis. Nat. Commun. 12, 7189 (2021).
Szpiech, Z. A. et al. Ancestry-dependent enrichment of deleterious homozygotes in runs of homozygosity. Am. J. Hum. Genet. 105, 747–762 (2019).
Yearby, R. Racial disparities in health status and access to healthcare: the continuation of inequality in the United States due to structural racism. Am. J. Econ. Sociol. 77, 1113–1152 (2018).
Clarke, J. L. Impact of pan-ethnic expanded carrier screening in improving population health outcomes: proceedings from a multi-stakeholder virtual roundtable summit, June 25, 2020. Popul. Health Manag. 24, 622–630 (2021).
Arjunan, A., Darnes, D. R., Sagaser, K. G. & Svenson, A. B. Addressing reproductive healthcare disparities through equitable carrier screening: medical racism and genetic discrimination in United States’ history highlights the needs for change in obstetrical genetics care. Societies 12, 33 (2022).
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
Bailey, Z. D., Feldman, J. M. & Bassett, M. T. How structural racism works—racist policies as a root cause of U.S. racial health inequities. N. Engl. J. Med. 384, 768–773 (2021).
Panofsky, A. & Bliss, C. Ambiguity and scientific authority: population classification in genomic science. Am. Socio. Rev. 82, 59–87 (2017).
Coates, R. D., Ferber, A. L. & Brunsma, D. L. The Matrix of Race: Social Construction, Intersectionality, and Inequality. (SAGE Publications, 2021).
Bonham, V. R. RACE. National Human Genome Research Institute. https://www.genome.gov/genetics-glossary/Race
Barkan, S. Sociology: Understanding and Changing the Social World (Univ. of North Carolina Press, 2019).
Birney, E., Inouye, M., Raff, J., Rutherford, A. & Scally, A. The language of race, ethnicity, and ancestry in human genetic research. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.10041 (2021).
Mathieson, I. & Scally, A. What is ancestry? PLoS Genet. 16, e1008624 (2020).
Mauro, M. et al. A scoping review of guidelines for the use of race, ethnicity, and ancestry reveals widespread consensus but also points of ongoing disagreement. Am. J. Hum. Genet. 109, 2110–2125 (2022).
Nuriddin, A., Mooney, G. & White, A. I. R. Reckoning with histories of medical racism and violence in the USA. Lancet 396, 949–951 (2020).
Bax, A. C., Bard, D. E., Cuffe, S. P., McKeown, R. E. & Wolraich, M. L. The association between race/ethnicity and socioeconomic factors and the diagnosis and treatment of children with attention-deficit hyperactivity disorder. J. Dev. Behav. Pediatr. 40, 81–91 (2019).
Thomas, P. et al. The association of autism diagnosis with socioeconomic status. Autism 16, 201–213 (2012).
Wise, S. K., Ghegan, M. D., Gorham, E. & Schlosser, R. J. Socioeconomic factors in the diagnosis of allergic fungal rhinosinusitis. Otolaryngol. Head Neck Surg. 138, 38–42 (2008).
Deyrup, A. & Graves, J. L. Racial biology and medical misconceptions. N. Engl. J. Med. 386, 501–503 (2022).
Martschenko, D. O. & Young, J. L. Precision medicine needs to think outside the box. Front. Genet. 13, 795992 (2022).
Suckiel, S. A. et al. GUÍA: a digital platform to facilitate result disclosure in genetic counseling. Genet. Med. 23, 942–949 (2021).
Chang, T. S. et al. Pre-existing conditions in Hispanics/Latinxs that are COVID-19 risk factors. iScience 24, 102188 (2021).
Lajonchere, C. et al. An integrated, scalable, electronic video consent process to power precision health research: large, population-based, cohort implementation and scalability study. J. Med. Internet Res. 23, e31121 (2021).
Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742-015-0047-8 (2015).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Bettinger, B. The Shared cM Project 4.0 tool v4. https://dnapainter.com/tools/sharedcmv4 (2020).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Zhou, Y., Browning, S. R. & Browning, B. L. A fast and simple method for detecting identity-by-descent segments in large-scale data. Am. J. Hum. Genet. 106, 426–437 (2020).
Hagberg, A., Swart, P. & Chult, D. S. Exploring network structure, dynamics, and function using NetworkX. U.S. Department of Energy Office of Scientific and Technical Information. https://www.osti.gov/biblio/960616 (2008).
Slatkin, M. A population-genetic test of founder effects and implications for Ashkenazi Jewish diseases. Am. J. Hum. Genet. 75, 282–293 (2004).
Ongaro, L. et al. The genomic impact of European colonization of the Americas. Curr. Biol. 29, 3974–3986 (2019).
Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 21, 1129–1164 (1991).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. Proc. of the 9th Python in Science Conference. https://doi.org/10.25080/Majora-92bf1922-011 (2010).
SPA (single-page application). MDN Web Docs Glossary: definitions of web-related terms. https://developer.mozilla.org/en-US/docs/Glossary/SPA
Scott, E. M. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat. Genet. 48, 1071–1076 (2016).
Lazaridis, I. et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016).
Acknowledgements
We thank V. Kumar and M. Broudy for their expertise with the DDR. We thank A. Panofsky and A. Lewis for their helpful comments and discussions on this manuscript. We gratefully acknowledge the Institute for Precision Health, participating patients from the UCLA ATLAS Precision Health Biobank, the UCLA David Geffen School of Medicine, the UCLA Clinical and Translational Science Institute and UCLA Health. C.C. was supported by National Institutes of Health (NIH) grant F31NS122538. C.C., N.Z., D.E. and E.P. were supported by the following grants from the NIH: R01CA227237, R01ES029929, R01MH122688, U01HG009080, R01HL155024, R01HL151152, R01GM142112 and R01HG006399. C.R.G. is supported by NIH grants R01HL151152 and R01HG010297. J.A.S. and C.R.G. are supported by NIH grant U01HG011715. N.Z., E.K., C.G., V.A. and G.B. were supported by NIH grant R01HG011345. A.C. was supported by NIH grant T32HG002536 and National Science Foundation grant DGE-1829071. V.A. was supported by NIH grant DP5OD024579. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
C.C., N.Z., G.B., E.K., V.A., J.S. and C.G. were involved in study design and conceptualization. C.C., R.S., D.E., E.P., A.C. and R.H. were involved in data collection and analysis. N.Z., G.B., J.M., R.S., D.T., K.P., T.C., J.S., C.G., V.A., E.K., B.B. and B.P. contributed to statistical analysis and study methodology. A.B. and C.C. developed the website. N.Z. and G.B. supervised the study. C.C. and N.Z. wrote the original manuscript draft. All authors contributed to manuscript revisions and approved the final manuscript.
Ethics declarations
Competing interests
C.R.G. owns stock in 23andMe, Inc. E.E.K. has received personal fees from Regeneron Pharmaceuticals, 23andMe, Allelica and Illumina; has received research funding from Allelica; and serves on the advisory boards for Encompass Biosciences, Overtone and Galateo Bio. All other authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Han Chen, Erik Rodriquez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editors: Ming Yang and Jennifer Sargent, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Principal component analysis of ATLAS and reference data.
(a) PC1–PC4 of reference data and (b) ATLAS projected onto the reference data PCs.
Extended Data Fig. 2 ATLAS and Los Angeles demographics.
For patients who had recorded EHR demographic information, the proportion of ATLAS or the overall UCLA DDR patient population (a) recorded as each race, (b) recorded as Hispanic or Latino ethnicity and (c) recorded as Male/Female or Other. (d) The distribution of patient age in ATLAS and the general UCLA patient population (where patients over 90 years old are censored to 90 for privacy reasons).
Extended Data Fig. 3 Sensitivity and degree centrality of clusters.
(a) The relationship between identity-by-descent called with Shapeit4 + iLASH (x-axis) and Eagle + hap-ibd (y-axis). Each dot represents the total identity-by-descent sharing between one pair of individuals. (b) The consistency between the Louvain clusters identified with the Shapeit4 + iLASH (‘original’) and Eagle + hap-ibd (‘new’) approaches. For 10,000 random pairs of individuals, we assessed whether a pair in the same cluster under the original approach remained in the same cluster under the new approach, and vice versa. (c) The proportion of participants in the ‘new’ clusters in each of the original clusters. (d–f) The degree centrality distribution (node degree divided by the maximum possible degree in the cluster) of selected clusters from the final round of Louvain clustering: (d) a cluster where nearly every individual is connected to every other member of the cluster, (e) a cluster where individuals share some connections but on average are less connected to each other, and (f) a cluster where individuals are moderately connected to each other.
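The two quantities named in this caption can be illustrated with a short sketch. It is not the authors' code (which is linked under Code availability); it is a minimal, standard-library illustration that assumes a cluster is given as a set of member IDs plus a list of undirected edges, and that cluster assignments are plain dictionaries. The function names `degree_centrality` and `pair_consistency` are hypothetical, and the pair check shown here (agreement of co-membership in both directions) is one plausible reading of panel (b).

```python
import random


def degree_centrality(edges, members):
    """Node degree divided by the maximum possible degree (n - 1) in one cluster."""
    members = set(members)
    neighbors = {m: set() for m in members}
    for u, v in edges:
        # Only count edges with both endpoints inside the cluster; sets drop duplicates.
        if u in members and v in members and u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    max_degree = len(members) - 1
    return {
        m: (len(nb) / max_degree if max_degree > 0 else 0.0)
        for m, nb in neighbors.items()
    }


def pair_consistency(original, new, n_pairs=10_000, seed=0):
    """Fraction of random pairs whose same-cluster status agrees in both labelings.

    `original` and `new` map individual ID -> cluster label (hypothetical inputs).
    """
    rng = random.Random(seed)
    ids = sorted(set(original) & set(new))
    if len(ids) < 2:
        raise ValueError("need at least two shared individuals")
    agree = 0
    for _ in range(n_pairs):
        a, b = rng.sample(ids, 2)
        same_original = original[a] == original[b]
        same_new = new[a] == new[b]
        agree += same_original == same_new
    return agree / n_pairs
```

A fully connected cluster gives every member a centrality of 1.0, which corresponds to the situation described for panel (d); sparser clusters shift the distribution toward 0.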
Extended Data Fig. 4 FST between clusters and external reference data.
(a) FST between one set of subclusters (subclusters UCLA_3_7_*) that made up the European cluster and samples from the UK Biobank who were born outside the United Kingdom, combined with a random sample of 100 individuals born in the United Kingdom. (b) The same comparison for the second set of European subclusters (subclusters UCLA_3_8_*). (c) FST between the Greater Middle East Variome (ref. 111) populations and UCLA clusters with Middle Eastern or Central Asian ancestry and (d) FST between modern-day Middle Eastern populations (ref. 112) and UCLA clusters with Middle Eastern or Central Asian ancestry. (e) FST between UK Biobank participants born in the Americas and subclusters that made up the Central/South American cluster. (f) FST between UK Biobank participants born in Africa or the Americas and the three Black/African American clusters. For all plots, the country with the smallest FST to the ATLAS cluster is labeled. The ATLAS cluster name the subcluster belongs to is indicated in parentheses. The brighter the color, the smaller the FST value, suggesting less differentiation between the two groups.
Extended Data Fig. 5 Cluster admixture and principal component analysis.
(a) For the 24 largest clusters, the admixture proportions inferred with SCOPE with K = 6 for 100 randomly selected individuals. If the cluster has fewer than 100 individuals, all individuals are shown. (b) The 24 largest clusters were colored on a PCA plot where ATLAS biobank participants were projected onto principal components calculated over the reference individuals.
Extended Data Fig. 6 Mexican/Central American Subclusters.
(a) The seven subclusters were visualized using a force-directed graph, where each dot represents one individual and the color of the dot indicates which cluster that individual belongs to. (b) The number of Mexican indigenous reference samples in each subcluster, colored by primary geographic region. (c) Hudson’s FST between the clusters. (d) The proportion of each subcluster preferring to speak English or Spanish. (e) The proportion of each subcluster preferring a religion in the EHR, if any. (f) The proportion of each subcluster identifying as each race in the EHR. (g) The proportion of each subcluster identifying as each ethnicity sub-category in the EHR. (h) The odds ratio of phecodes associated with membership in the Central American (n = 1998), Puerto Rican (n = 288), Afro-Caribbean (n = 39), Central Mexican (n = 2094) and Northern Mexican (n = 1115) identity-by-descent clusters. The dot represents the odds ratio and the error bar represents the standard error.
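For readers unfamiliar with the statistic in panel (c), the sketch below computes Hudson's FST in the form given by Bhatia et al. (listed in the References), combining SNPs as a ratio of averages. It is an illustration under stated assumptions rather than the authors' pipeline: allele frequencies are assumed to be precomputed per SNP, `n1` and `n2` are assumed to be the number of sampled chromosomes in each group, and the function name `hudson_fst` is hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST between two groups, as a ratio of sums over SNPs.

    p1, p2: per-SNP allele frequencies in group 1 and group 2 (same SNP order).
    n1, n2: number of sampled chromosomes in each group (assumed constant per SNP).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same SNPs")
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two sampled chromosomes")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite-sample noise.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different groups differ.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        return float("nan")  # no SNP is variable across the two groups
    return numerator / denominator
```

Because of the sample-size correction, the estimate can be slightly negative when two groups have essentially identical allele frequencies; such values are usually read as no detectable differentiation.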
Extended Data Fig. 7 Demographics of clusters.
For each of the largest identity-by-descent clusters, the (a) distribution of median patient BMI of participants in the cluster, (b) the distribution of max patient age of participants in the cluster, (c) the proportion of the cluster that is female based on EHR demographic records, and (d) the proportion of the cluster reported to be on private or public insurance. In the box plots, the center line of the box indicates the mean, the outer edges of the box indicate the upper and lower quartiles, and the whiskers indicate the maxima and minima of the distribution.
Extended Data Fig. 8 Healthcare utilization in alternative contexts.
(a) The association between identity-by-descent cluster membership and a manually curated list of Alzheimer’s and dementia ICD codes and (b) the association between identity-by-descent cluster membership and brain MRI imaging orders. The odds ratio of whether a given phecode assignment is associated with membership in the (c) Ashkenazi Jewish (n = 5309) (d) African American (n = 1877) and (e) Mexican and Central American (n = 6075) identity-by-descent clusters versus the remaining biobank participants, in emergency room settings. Phecodes significant at FDR 5% are shown and if there are more than 30 significant associations, we plot only the top 40 with the largest absolute log odds ratio. (f) The odds ratio of patients in a given identity-by-descent cluster visiting the emergency room relative to the remaining biobank participants, after controlling for age, sex, and BMI. In each plot, the dot represents the odds ratio and the bar represents the standard error.
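The filtering described for panels (c) to (e), keeping associations significant at a 5% false discovery rate and then plotting the largest effects by absolute log odds ratio, can be sketched as follows. The caption does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure here is an assumption, as are the function names and the `(phecode, odds_ratio, p_value)` tuple layout. The plotting cap is left as a required `limit` argument rather than hard-coded.

```python
import math


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR `alpha` (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value falls under its BH threshold.
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


def top_by_abs_log_odds(results, limit, alpha=0.05):
    """Keep FDR-significant rows, then the `limit` rows with largest |log(odds ratio)|.

    results: list of (phecode, odds_ratio, p_value) tuples with odds_ratio > 0.
    """
    keep = benjamini_hochberg([p for _, _, p in results], alpha=alpha)
    significant = [row for row, flag in zip(results, keep) if flag]
    significant.sort(key=lambda row: abs(math.log(row[1])), reverse=True)
    return significant[:limit]
```

Ranking on the absolute log odds ratio treats protective and risk-increasing associations symmetrically, so an odds ratio of 0.25 ranks above an odds ratio of 2.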
Extended Data Fig. 9 Fine-scale health utilization in ATLAS.
(a) For the Chinese (n = 1547), Japanese (n = 596), Filipino (n = 796), and Korean (n = 546) identity-by-descent clusters, phecodes that have significantly different odds ratios between the clusters. Error bars indicate the standard errors. (b) The odds ratio of the European identity-by-descent cluster visiting a particular specialty, assessed against all other biobank participants. Error bars represent the standard error. For six clusters, the proportion of each identity-by-descent cluster that visited the UCLA Health system each year in an outpatient setting and received a diagnosis of (c) kidney replaced by transplant and (d) major depressive disorder.
Extended Data Fig. 10 Replication of effect sizes.
For phecodes significant in ATLAS, the log odds ratio of ATLAS (x-axis) versus the log odds ratio of BioMe (y-axis) for six ATLAS clusters (European: n = 17017, Mexican & Central American: n = 6075, Ashkenazi Jewish: n = 5039, African American: n = 1877, Filipino: n = 796, and Puerto Rican: n = 288) that were enriched for similar populations in the two biobanks (indicated by title).
Supplementary information
Supplementary Information
Supplementary Tables 1–5
Supplementary Data 1
Supplementary data on reference samples in identity-by-descent clusters and cluster-enriched pathogenic alleles
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Caggiano, C., Boudaie, A., Shemirani, R. et al. Disease risk and healthcare utilization among ancestrally diverse groups in the Los Angeles region. Nat Med 29, 1845–1856 (2023). https://doi.org/10.1038/s41591-023-02425-1
DOI: https://doi.org/10.1038/s41591-023-02425-1