Disease risk and healthcare utilization among ancestrally diverse groups in the Los Angeles region

Abstract

An individual’s disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.

Fig. 1: Definitions of key phrases.
Fig. 2: An overview of the fine-scale cluster detection approach.
Fig. 3: Genetic and demographic properties of clusters.
Fig. 4: Phecode associations for selected clusters.
Fig. 5: Phecodes associated with the Armenian identity-by-descent cluster.
Fig. 6: The genetic properties of the largest identity-by-descent clusters.

Data availability

Patient-level EHR and genotyping data are protected due to patient privacy and can be accessed by collaboration with a UCLA researcher. All summary statistic data discussed in this paper are freely available on https://www.ibd.la/. 1000 Genomes Project data can be accessed at https://www.internationalgenome.org/data. Human Genome Diversity Project data can be accessed at the following FTP server: ftp://ngs.sanger.ac.uk/production/hgdp. Simons Genome Diversity Project data can be accessed at https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/. The human reference genome version hg38 was downloaded from the UCSC Genome Browser: https://hgdownload.soe.ucsc.edu/downloads.html. dbSNP version 147 was used in this study and was obtained from https://ftp.ncbi.nlm.nih.gov/snp/.

Code availability

Code for identity-by-descent calling and clustering is available at https://github.com/christacaggiano/IBD. Code for the website is available at https://github.com/misingnoglic/atlas-app.

References

  1. Williams, D. R., Mohammed, S. A., Leavell, J. & Collins, C. Race, socioeconomic status, and health: complexities, ongoing challenges, and research opportunities. Ann. N. Y. Acad. Sci. 1186, 69–101 (2010).

  2. Fiscella, K. & Williams, D. R. Health disparities based on socioeconomic inequities: implications for urban health care. Acad. Med. 79, 1139–1147 (2004).

  3. Geneviève, L. D., Martani, A., Shaw, D., Elger, B. S. & Wangmo, T. Structural racism in precision medicine: leaving no one behind. BMC Med. Ethics 21, 17 (2020).

  4. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584 (2019).

  5. Majara, L. et al. Low and differential polygenic score generalizability among African populations due largely to genetic diversity. HGG Adv. 4, 100184 (2023).

  6. All of Us Research Program Investigators. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).

  7. Johnson, R. et al. Leveraging genomic diversity for discovery in an EHR-linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med. 14, 104 (2022).

  8. Hateley, S. et al. The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nat. Commun. 12, 6442 (2021).

  9. Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).

  10. Belbin, G. M. et al. Toward a fine-scale population health monitoring system. Cell 184, 2068–2083 (2021).

  11. Saada, J. N. et al. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat. Commun. 11, 6130 (2020).

  12. Dai, C. L. et al. Population histories of the United States revealed through fine-scale migration and haplotype analysis. Am. J. Hum. Genet. 106, 371–388 (2020).

  13. Naseri, A. et al. Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments. BMC Biol. 19, 32 (2021).

  14. Gilbert, E., Shanmugam, A. & Cavalleri, G. L. Revealing the recent demographic history of Europe via haplotype sharing in the UK Biobank. Proc. Natl Acad. Sci. USA 119, e2119281119 (2022).

  15. Henn, B. M. et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7, e34267 (2012).

  16. Johnson, R. et al. The UCLA ATLAS Community Health Initiative: promoting precision health research in a diverse biobank. Cell Genom. 3, 100243 (2023).

  17. U.S. Census Bureau (2015–2019). Place of birth for the foreign-born population in the United States American community survey 5-year estimates. https://censusreporter.org/data/table/?table=B05006&geo_ids=05000US06037,31000US31080,04000US06,01000US,86000US91030

  18. Krieger, N. Who and what is a ‘population’? Historical debates, current controversies, and implications for understanding ‘population health’ and rectifying health inequities. Milbank Q. 90, 634–681 (2012).

  19. Internal Revenue Service. SOI Tax Stats - Individual Income Tax Statistics - ZIP Code Data (SOI). https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-zip-code-data-soi

  20. U.S. Census Bureau. U.S. Census Bureau QuickFacts: Los Angeles city, California. https://www.census.gov/quickfacts/losangelescitycalifornia

  21. Carress, H., Lawson, D. J. & Elhaik, E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 22, 351 (2021).

  22. Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022).

  23. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  24. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).

  25. Bergström, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).

  26. Shemirani, R. et al. Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat. Commun. 12, 3546 (2021).

  27. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

  28. Chiu, A. M., Molloy, E. K., Tan, Z., Talwalkar, A. & Sankararaman, S. Inferring population structure in biobank-scale genomic data. Am. J. Hum. Genet. 109, 727–737 (2022).

  29. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  30. García-Ortiz, H. et al. The genomic landscape of Mexican Indigenous populations brings insights into the peopling of the Americas. Nat. Commun. 12, 5942 (2021).

  31. Parvini, S. & Simani, E. Are Arabs and Iranians white? Census says yes, but many disagree. Los Angeles Times. https://www.latimes.com/projects/la-me-census-middle-east-north-africa-race/

  32. Naccashian, Z., Hattar-Pollara, M., Ho, C. (Alex) & Ayvazian, S. P. Prevalence and predictors of diabetes mellitus and hypertension in Armenian Americans in Los Angeles. Diabetes Educ. 44, 130–143 (2018).

  33. Freeman, J. D., Kadiyala, S., Bell, J. F. & Martin, D. P. The causal effect of health insurance on utilization and outcomes in adults: a systematic review of US studies. Med. Care 46, 1023–1032 (2008).

  34. Wei, W.-Q. et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE 12, e0175508 (2017).

  35. Corriveau, R. A. et al. Alzheimer’s Disease-Related Dementias Summit 2016: national research priorities. Neurology 89, 2381–2391 (2017).

  36. Schiff, E. R. et al. A new look at familial risk of inflammatory bowel disease in the Ashkenazi Jewish population. Dig. Dis. Sci. 63, 3049–3057 (2018).

  37. Roth, M. P., Petersen, G. M., McElree, C., Feldman, E. & Rotter, J. I. Geographic origins of Jewish patients with inflammatory bowel disease. Gastroenterology 97, 900–904 (1989).

  38. Levav, I., Kohn, R., Golding, J. M. & Weissman, M. M. Vulnerability of Jews to affective disorders. Am. J. Psychiatry 154, 941–947 (1997).

  39. Pinhas, L., Heinmaa, M., Bryden, P., Bradley, S. & Toner, B. Disordered eating in Jewish adolescent girls. Can. J. Psychiatry 53, 601–608 (2008).

  40. Yeung, P. P. & Greenwald, S. Jewish Americans and mental health: results of the NIMH Epidemiologic Catchment Area Study. Soc. Psychiatry Psychiatr. Epidemiol. 27, 292–297 (1992).

  41. Solovieff, N. et al. Ancestry of African Americans with sickle cell disease. Blood Cells Mol. Dis. 47, 41–45 (2011).

  42. Eltoukhi, H. M., Modi, M. N., Weston, M., Armstrong, A. Y. & Stewart, E. A. The health disparities of uterine fibroid tumors for African American women: a public health issue. Am. J. Obstet. Gynecol. 210, 194–199 (2014).

  43. Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).

  44. Centers for Disease Control and Prevention. People born outside of the United States and viral hepatitis. https://www.cdc.gov/hepatitis/populations/Born-Outside-United-States.htm (2020).

  45. Rostomian, A. H., Soverow, J. & Sanchez, D. R. Exploring Armenian ethnicity as an independent risk factor for cardiovascular disease: findings from a prospective cohort of patients in a county hospital. JRSM Cardiovasc. Dis. 9, 2048004020956853 (2020).

  46. Cobb, S., Bazargan, M., Assari, S., Barkley, L. & Bazargan-Hejazi, S. Emergency department utilization, hospital admissions, and office-based physician visits among under-resourced African American and Latino older adults. J. Racial Ethn. Health Disparities 10, 205–218 (2022).

  47. Self, T. H., Chrisman, C. R., Mason, D. L. & Rumbak, M. J. Reducing emergency department visits and hospitalizations in African American and Hispanic patients with asthma: a 15-year review. J. Asthma 42, 807–812 (2005).

  48. Bazargan, M. et al. Emergency department utilization among underserved African American older adults in South Los Angeles. Int. J. Environ. Res. Public Health 16, 1175 (2019).

  49. Abul-Husn, N. S. et al. Exome sequencing reveals a high prevalence of BRCA1 and BRCA2 founder variants in a diverse population-based biobank. Genome Med. 12, 2 (2019).

  50. Sohar, E., Prass, M., Heller, J. & Heller, H. Genetics of familial Mediterranean fever (FMF): a disorder with recessive inheritance in non-Ashkenazi Jews and Armenians. Arch. Intern. Med. 107, 529–538 (1961).

  51. Moradian, M. M., Sarkisian, T., Ajrapetyan, H. & Avanesian, N. Genotype–phenotype studies in a large cohort of Armenian patients with familial Mediterranean fever suggest clinical disease with heterozygous MEFV mutations. J. Hum. Genet. 55, 389–393 (2010).

  52. Carlice-dos-Reis, T. et al. Investigation of mutations in the HBB gene using the 1,000 genomes database. PLoS ONE 12, e0174637 (2017).

  53. Kazazian, H. H., Dowling, C. E., Waber, P. G., Huang, S. & Lo, W. H. The spectrum of β-thalassemia genes in China and Southeast Asia. Blood 68, 964–966 (1986).

  54. Xiong, F. et al. Molecular epidemiological survey of haemoglobinopathies in the Guangxi Zhuang Autonomous Region of southern China. Clin. Genet. 78, 139–148 (2010).

  55. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  56. Grzymski, J. J. et al. Population genetic screening efficiently identifies carriers of autosomal dominant diseases. Nat. Med. 26, 1235–1239 (2020).

  57. Damrauer, S. M. et al. Association of the V122I hereditary transthyretin amyloidosis genetic variant with heart failure among individuals of African or Hispanic/Latino ancestry. JAMA 322, 2191–2202 (2019).

  58. Pogoryelova, O., González Coraspe, J. A., Nikolenko, N., Lochmüller, H. & Roos, A. GNE myopathy: from clinics and genetics to pathology and research strategies. Orphanet J. Rare Dis. 13, 70 (2018).

  59. Eisenberg, I. et al. The UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase gene is mutated in recessive hereditary inclusion body myopathy. Nat. Genet. 29, 83–87 (2001).

  60. Abul-Husn, N. S. et al. Implementing genomic screening in diverse populations. Genome Med. 13, 17 (2021).

  61. Tadmouri, G. O. et al. Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009).

  62. Fallahi, J. et al. Founder effect of KHDC3L, p.M1V mutation, on Iranian patients with recurrent hydatidiform moles. Iran. J. Med. Sci. 45, 118–124 (2020).

  63. Ceballos, F. C., Joshi, P. K., Clark, D. W., Ramsay, M. & Wilson, J. F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 19, 220–234 (2018).

  64. Lencz, T. et al. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl Acad. Sci. USA 104, 19942–19947 (2007).

  65. Moreno-Grau, S. et al. Long runs of homozygosity are associated with Alzheimer’s disease. Transl. Psychiatry 11, 142 (2021).

  66. Browning, S. R. & Browning, B. L. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97, 404–418 (2015).

  67. Belbin, G. M. et al. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife 6, e25060 (2017).

  68. Bhatia, G., Patterson, N. J., Sankararaman, S. & Price, A. L. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23, 1514–1521 (2013).

  69. Chacón-Duque, J.-C. et al. Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance. Nat. Commun. 9, 5388 (2018).

  70. Borrell, L. N. et al. Race and genetic ancestry in medicine—a time for reckoning with racism. N. Engl. J. Med. 384, 474–480 (2021).

  71. Neblett, E. W. et al. Racism, racial resilience, and African American youth development: person-centered analysis as a tool to promote equity and justice. In Advances in Child Development and Behavior (eds Horn, S. S., Ruck, M. D. & Liben, L. S.) Vol. 51, 43–79 (JAI, 2016).

  72. Browning, B. L. & Browning, S. R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 88, 173–182 (2011).

  73. Arciero, E. et al. Fine-scale population structure and demographic history of British Pakistanis. Nat. Commun. 12, 7189 (2021).

  74. Szpiech, Z. A. et al. Ancestry-dependent enrichment of deleterious homozygotes in runs of homozygosity. Am. J. Hum. Genet. 105, 747–762 (2019).

  75. Yearby, R. Racial disparities in health status and access to healthcare: the continuation of inequality in the United States due to structural racism. Am. J. Econ. Sociol. 77, 1113–1152 (2018).

  76. Clarke, J. L. Impact of pan-ethnic expanded carrier screening in improving population health outcomes: proceedings from a multi-stakeholder virtual roundtable summit, June 25, 2020. Popul. Health Manag. 24, 622–630 (2021).

  77. Arjunan, A., Darnes, D. R., Sagaser, K. G. & Svenson, A. B. Addressing reproductive healthcare disparities through equitable carrier screening: medical racism and genetic discrimination in United States’ history highlights the needs for change in obstetrical genetics care. Societies 12, 33 (2022).

  78. Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).

  79. Bailey, Z. D., Feldman, J. M. & Bassett, M. T. How structural racism works—racist policies as a root cause of U.S. racial health inequities. N. Engl. J. Med. 384, 768–773 (2021).

  80. Panofsky, A. & Bliss, C. Ambiguity and scientific authority: population classification in genomic science. Am. Socio. Rev. 82, 59–87 (2017).

  81. Coates, R. D., Ferber, A. L. & Brunsma, D. L. The Matrix of Race: Social Construction, Intersectionality, and Inequality. (SAGE Publications, 2021).

  82. Bonham, V. R. RACE. National Human Genome Research Institute. https://www.genome.gov/genetics-glossary/Race

  83. Barkan, S. Sociology: Understanding and Changing the Social World (Univ. of North Carolina Press, 2019).

  84. Birney, E., Inouye, M., Raff, J., Rutherford, A. & Scally, A. The language of race, ethnicity, and ancestry in human genetic research. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.10041 (2021).

  85. Mathieson, I. & Scally, A. What is ancestry? PLoS Genet. 16, e1008624 (2020).

  86. Mauro, M. et al. A scoping review of guidelines for the use of race, ethnicity, and ancestry reveals widespread consensus but also points of ongoing disagreement. Am. J. Hum. Genet. 109, 2110–2125 (2022).

  87. Nuriddin, A., Mooney, G. & White, A. I. R. Reckoning with histories of medical racism and violence in the USA. Lancet 396, 949–951 (2020).

  88. Bax, A. C., Bard, D. E., Cuffe, S. P., McKeown, R. E. & Wolraich, M. L. The association between race/ethnicity and socioeconomic factors and the diagnosis and treatment of children with attention-deficit hyperactivity disorder. J. Dev. Behav. Pediatr. 40, 81–91 (2019).

  89. Thomas, P. et al. The association of autism diagnosis with socioeconomic status. Autism 16, 201–213 (2012).

  90. Wise, S. K., Ghegan, M. D., Gorham, E. & Schlosser, R. J. Socioeconomic factors in the diagnosis of allergic fungal rhinosinusitis. Otolaryngol. Head Neck Surg. 138, 38–42 (2008).

  91. Deyrup, A. & Graves, J. L. Racial biology and medical misconceptions. N. Engl. J. Med. 386, 501–503 (2022).

  92. Martschenko, D. O. & Young, J. L. Precision medicine needs to think outside the box. Front. Genet. 13, 795992 (2022).

  93. Suckiel, S. A. et al. GUÍA: a digital platform to facilitate result disclosure in genetic counseling. Genet. Med. 23, 942–949 (2021).

  94. Chang, T. S. et al. Pre-existing conditions in Hispanics/Latinxs that are COVID-19 risk factors. iScience 24, 102188 (2021).

  95. Lajonchere, C. et al. An integrated, scalable, electronic video consent process to power precision health research: large, population-based, cohort implementation and scalability study. J. Med. Internet Res. 23, e31121 (2021).

  96. Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).

  97. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742-015-0047-8 (2015).

  98. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

  99. Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).

  100. Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).

  101. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).

  102. Bettinger, B. The Shared cM Project 4.0 tool v4. https://dnapainter.com/tools/sharedcmv4 (2020).

  103. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

  104. Zhou, Y., Browning, S. R. & Browning, B. L. A fast and simple method for detecting identity-by-descent segments in large-scale data. Am. J. Hum. Genet. 106, 426–437 (2020).

  105. Hagberg, A., Swart, P. & Chult, D. S. Exploring network structure, dynamics, and function using NetworkX. U.S. Department of Energy Office of Scientific and Technical Information. https://www.osti.gov/biblio/960616 (2008).

  106. Slatkin, M. A population-genetic test of founder effects and implications for Ashkenazi Jewish diseases. Am. J. Hum. Genet. 75, 282–293 (2004).

  107. Ongaro, L. et al. The genomic impact of European colonization of the Americas. Curr. Biol. 29, 3974–3986 (2019).

  108. Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 21, 1129–1164 (1991).

  109. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. Proc. of the 9th Python in Science Conference. https://doi.org/10.25080/Majora-92bf1922-011 (2010).

  110. SPA (single-page application). MDN Web Docs Glossary: definitions of web-related terms. https://developer.mozilla.org/en-US/docs/Glossary/SPA

  111. Scott, E. M. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat. Genet. 48, 1071–1076 (2016).

  112. Lazaridis, I. et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016).

Acknowledgements

We thank V. Kumar and M. Broudy for their expertise with the DDR. We thank A. Panofsky and A. Lewis for their helpful comments and discussions on this manuscript. We gratefully acknowledge the Institute for Precision Health, participating patients from the UCLA ATLAS Precision Health Biobank, the UCLA David Geffen School of Medicine, the UCLA Clinical and Translational Science Institute and UCLA Health. C.C. was supported by National Institutes of Health (NIH) grant F31NS122538. C.C., N.Z., D.E. and E.P. were supported by the following grants from the NIH: R01CA227237, R01ES029929, R01MH122688, U01HG009080, R01HL155024, R01HL151152, R01GM142112 and R01HG006399. C.R.G. is supported by NIH grants R01HL151152 and R01HG010297. J.A.S. and C.R.G. are supported by NIH grant U01HG011715. N.Z., E.K., C.G., V.A. and G.B. were supported by NIH grant R01HG011345. A.C. was supported by NIH grant T32HG002536 and National Science Foundation grant DGE-1829071. V.A. was supported by NIH grant DP5OD024579. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

C.C., N.Z., G.B., E.K., V.A., J.S. and C.G. were involved in study design and conceptualization. C.C., R.S., D.E., E.P., A.C. and R.H. were involved in data collection and analysis. N.Z., G.B., J.M., R.S., D.T., K.P., T.C., J.S., C.G., V.A., E.K., B.B. and B.P. contributed to statistical analysis and study methodology. A.B. and C.C. developed the website. N.Z. and G.B. supervised the study. C.C. and N.Z. wrote the original manuscript draft. All authors contributed to manuscript revisions and approved the final manuscript.

Corresponding author

Correspondence to Noah Zaitlen.

Ethics declarations

Competing interests

C.R.G. owns stock in 23andMe, Inc. E.E.K. has received personal fees from Regeneron Pharmaceuticals, 23andMe, Allelica and Illumina; has received research funding from Allelica; and serves on the advisory boards for Encompass Biosciences, Overtone and Galateo Bio. All other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Han Chen, Erik Rodriquez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editors: Ming Yang and Jennifer Sargent, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Principal component analysis of ATLAS and reference data.

(a) PC1–PC4 of the reference data and (b) ATLAS projected onto the reference data PCs.

Extended Data Fig. 2 ATLAS and Los Angeles demographics.

For patients who had recorded EHR demographic information, the proportion of ATLAS or the overall UCLA DDR patient population (a) recorded as each race, (b) recorded as Hispanic or Latino ethnicity, and (c) recorded as Male/Female or Other. (d) The distribution of patient age in ATLAS and the general UCLA patient population (where patients over 90 years old are censored to 90 for privacy reasons).

Extended Data Fig. 3 Sensitivity and degree centrality of clusters.

(a) The relationship between identity-by-descent called with Shapeit4 + iLASH (x-axis) and Eagle + hap-ibd (y-axis). Each dot represents the total identity-by-descent sharing between one pair of individuals. (b) The consistency between the Louvain clusters identified with the Shapeit4 + iLASH approach (‘original’) and the Eagle + hap-ibd approach (‘new’). For 10,000 random pairs of individuals, we assessed whether a pair assigned to the same cluster in the original approach remained in the same cluster in the new approach, and vice versa. (c) The proportion of participants in the ‘new’ clusters in each of the original clusters. (d–f) The degree centrality distribution (node degree divided by the maximum possible degree in the cluster) of selected clusters from the final round of Louvain clustering: (d) a cluster where nearly every individual is connected to every other member of the cluster, (e) a cluster where individuals share some connections but on average are less connected to each other, and (f) a cluster where individuals are moderately connected to each other.
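
The two quantities described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the edge-list input and the cluster-label dictionaries are hypothetical, and the pair check below simply counts pairs whose same-cluster status agrees between two clusterings, which is one plausible reading of panel (b).

```python
import random


def degree_centrality(edges, members):
    """Node degree divided by the maximum possible degree (n - 1) in a cluster.

    `edges` is assumed to be a list of unique undirected pairs (u, v);
    edges touching individuals outside `members` are ignored.
    """
    members = set(members)
    degree = {m: 0 for m in members}
    for u, v in edges:
        if u in members and v in members and u != v:
            degree[u] += 1
            degree[v] += 1
    max_degree = len(members) - 1
    if max_degree < 1:
        return {m: 0.0 for m in members}
    return {m: d / max_degree for m, d in degree.items()}


def pair_consistency(original, new, n_pairs=10_000, seed=0):
    """Fraction of random pairs whose same-cluster status agrees.

    `original` and `new` map individual IDs to cluster labels (hypothetical
    inputs). Labels need not match across clusterings; only co-membership
    of each sampled pair is compared.
    """
    rng = random.Random(seed)
    ids = sorted(set(original) & set(new))
    agree = 0
    for _ in range(n_pairs):
        a, b = rng.sample(ids, 2)
        same_original = original[a] == original[b]
        same_new = new[a] == new[b]
        agree += same_original == same_new
    return agree / n_pairs
```

A centrality near 1 for every member corresponds to the densely connected case in panel (d); lower and more spread-out values correspond to the looser clusters in panels (e) and (f).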

Extended Data Fig. 4 FST between clusters and external reference data.

(a) FST between one set of subclusters (subclusters UCLA_3_7_*) that made up the European cluster and samples from the UK Biobank who were born outside the United Kingdom, combined with a random sample of 100 individuals born in the United Kingdom. (b) The same comparison for the second set of European subclusters (subclusters UCLA_3_8_*). (c) FST between the Greater Middle East Variome populations (ref. 111) and UCLA clusters with Middle Eastern or Central Asian ancestry. (d) FST between modern-day Middle Eastern populations (ref. 112) and UCLA clusters with Middle Eastern or Central Asian ancestry. (e) FST between UK Biobank participants born in the Americas and subclusters that made up the Central/South American cluster. (f) FST between UK Biobank participants born in Africa or the Americas and the three Black/African American clusters. For all plots, the country with the smallest FST to the ATLAS cluster is labeled. The name of the ATLAS cluster that each subcluster belongs to is indicated in parentheses. The brighter the color, the smaller the FST value, suggesting less differentiation between the two groups.
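
As an illustration of how a pairwise FST value of this kind can be computed, the sketch below implements Hudson's estimator as a ratio of averages over SNPs, following the formulation discussed by Bhatia et al. (ref. 68). It is a sketch under stated assumptions rather than the study's pipeline: the function name and inputs are hypothetical, `n_1` and `n_2` are taken to be the numbers of sampled chromosomes in each group, and per-SNP allele frequencies are assumed to be precomputed.

```python
def hudson_fst(freqs_1, freqs_2, n_1, n_2):
    """Hudson's FST between two groups, as a ratio of sums over SNPs.

    freqs_1, freqs_2: per-SNP sample allele frequencies in each group.
    n_1, n_2: number of sampled chromosomes in each group (assumed > 1).
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for sampling noise.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n_1 - 1)
            - p2 * (1 - p2) / (n_2 - 1)
        )
        # Probability that two alleles drawn from different groups differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no SNP is polymorphic between the two groups")
    return numerator / denominator
```

Values near 0 (or slightly negative, from the sampling correction) indicate little differentiation, which corresponds to the brightest cells in the heat maps described above.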

Extended Data Fig. 5 Cluster admixture and principal component analysis.

(a) For the 24 largest clusters, the admixture proportions inferred with SCOPE with K = 6 for 100 randomly selected individuals. If a cluster has fewer than 100 individuals, all individuals are shown. (b) The 24 largest clusters colored on a PCA plot in which ATLAS biobank participants were projected onto principal components calculated over the reference individuals.

Extended Data Fig. 6 Mexican/Central American Subclusters.

(a) The seven subclusters were visualized using a force-directed graph, where each dot represents one individual and the color of the dot indicates which cluster that individual belongs to. (b) The number of Mexican indigenous reference samples in each subcluster, colored by primary geographic region. (c) Hudson’s FST between the clusters. (d) The proportion of each subcluster preferring to speak English or Spanish. (e) The proportion of each subcluster reporting a religious preference in the EHR, if any. (f) The proportion of each subcluster identifying as each race in the EHR. (g) The proportion of each subcluster identifying as each ethnicity sub-category in the EHR. (h) The odds ratio of phecodes associated with membership in the Central American (n = 1998), Puerto Rican (n = 288), Afro-Caribbean (n = 39), Central Mexican (n = 2094) and Northern Mexican (n = 1115) identity-by-descent clusters. The dot represents the odds ratio and the error bar represents the standard error.

Extended Data Fig. 7 Demographics of clusters.

For each of the largest identity-by-descent clusters, the (a) distribution of median patient BMI of participants in the cluster, (b) the distribution of max patient age of participants in the cluster, (c) the proportion of the cluster that is female based on EHR demographic records, and (d) the proportion of the cluster reported to be on private or public insurance. In the box plots, the center line of the box indicates the mean, the outer edges of the box indicate the upper and lower quartiles, and the whiskers indicate the maxima and minima of the distribution.

Extended Data Fig. 8 Healthcare utilization in alternative contexts.

(a) The association between identity-by-descent cluster membership and a manually curated list of Alzheimer’s and dementia ICD codes and (b) the association between identity-by-descent cluster membership and brain MRI orders. The odds ratio of whether a given phecode assignment is associated with membership in the (c) Ashkenazi Jewish (n = 5309), (d) African American (n = 1877) and (e) Mexican and Central American (n = 6075) identity-by-descent clusters versus the remaining biobank participants, in emergency room settings. Phecodes significant at FDR 5% are shown and if there are more than 30 significant associations, we plot only the top 40 with the largest absolute log odds ratio. (f) The odds ratio of patients in a given identity-by-descent cluster visiting the emergency room relative to the remaining biobank participants, after controlling for age, sex, and BMI. In each plot, the dot represents the odds ratio and the bar represents the standard error.
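
The selection rule described for panels (c–e) (keep associations significant at FDR 5%, then plot those with the largest absolute log odds ratio) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the caption does not name the FDR procedure, so the Benjamini-Hochberg step-up procedure is assumed here, and the function names, the `(phecode, odds_ratio, pvalue)` tuple format and the `max_shown` parameter are hypothetical.

```python
import math


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where a test is rejected at FDR `alpha`
    under the Benjamini-Hochberg step-up procedure (assumed, not confirmed by
    the source)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value falls under its step-up threshold.
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


def select_for_plot(results, alpha=0.05, max_shown=40):
    """Keep FDR-significant results and return at most `max_shown` of them,
    ordered by decreasing absolute log odds ratio.

    `results` is a list of (phecode, odds_ratio, pvalue) tuples.
    """
    keep = benjamini_hochberg([p for _, _, p in results], alpha)
    significant = [r for r, k in zip(results, keep) if k]
    significant.sort(key=lambda r: abs(math.log(r[1])), reverse=True)
    return significant[:max_shown]
```

Ranking by absolute log odds ratio treats protective and risk-increasing associations symmetrically: an odds ratio of 0.25 ranks above an odds ratio of 2.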

Extended Data Fig. 9 Fine-scale health utilization in ATLAS.

(a) For the Chinese (n = 1547), Japanese (n = 596), Filipino (n = 796), and Korean (n = 546) identity-by-descent clusters, phecodes that have significantly different odds ratios between the clusters. Error bars indicate the standard errors. (b) The odds ratio of the European identity-by-descent cluster visiting a particular specialty, assessed against all other biobank participants. Error bars represent the standard error. For six clusters, the proportion of each identity-by-descent cluster that visited the UCLA Health system each year in an outpatient setting with a diagnosis of (c) kidney replaced by transplant and (d) major depressive disorder.

Extended Data Fig. 10 Replication of effect sizes.

For phecodes significant in ATLAS, the log odds ratio of ATLAS (x-axis) versus the log odds ratio of BioMe (y-axis) for six ATLAS clusters (European: n = 17017, Mexican & Central American: n = 6075, Ashkenazi Jewish: n = 5039, African American: n = 1877, Filipino: n = 796, and Puerto Rican: n = 288) that were enriched for similar populations in the two biobanks (indicated by title).

Supplementary information

Supplementary Information

Supplementary Tables 1–5

Reporting Summary

Supplementary Data 1

Supplementary data on reference samples in identity-by-descent clusters and cluster-enriched pathogenic alleles

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Caggiano, C., Boudaie, A., Shemirani, R. et al. Disease risk and healthcare utilization among ancestrally diverse groups in the Los Angeles region. Nat Med 29, 1845–1856 (2023). https://doi.org/10.1038/s41591-023-02425-1


  • DOI: https://doi.org/10.1038/s41591-023-02425-1
