Genetic ancestry and diagnostic yield of exome sequencing in a diverse population

It has been suggested that diagnostic yield (DY) from Exome Sequencing (ES) may be lower among patients with non-European ancestries than those with European ancestry. We examined the association of DY with estimated continental/subcontinental genetic ancestry in a racially/ethnically diverse pediatric and prenatal clinical cohort. Cases (N = 845) with suspected genetic disorders underwent ES for diagnosis. Continental/subcontinental genetic ancestry proportions were estimated from the ES data. We compared the distribution of genetic ancestries in positive, negative, and inconclusive cases by Kolmogorov–Smirnov tests and linear associations of ancestry with DY by Cochran-Armitage trend tests. We observed no reduction in overall DY associated with any genetic ancestry (African, Native American, East Asian, European, Middle Eastern, South Asian). However, we observed a relative increase in proportion of autosomal recessive homozygous inheritance versus other inheritance patterns associated with Middle Eastern and South Asian ancestry, due to consanguinity. In this empirical study of ES for undiagnosed pediatric and prenatal genetic conditions, genetic ancestry was not associated with the likelihood of a positive diagnosis, supporting the equitable use of ES in diagnosis of previously undiagnosed but potentially Mendelian disorders across all ancestral populations.


INTRODUCTION
Advances in exome sequencing (ES) technology have led to use of ES in establishing molecular diagnoses for Mendelian diseases in children and adults. This has prompted recommendations for ES as the first-line genetic test for certain clinical indications such as neurodevelopmental disorders 1. The probability of a positive case classification from ES (diagnostic yield) may differ due to factors such as: number of parents sequenced with proband, parental age, variant of uncertain significance (VUS) calling threshold, consanguinity, clinical indication or phenotype presentation, sex and age of proband, genetic ancestry, or a combination of these factors. Most of the studies on ES diagnostic yield have been conducted in predominantly European ancestry populations 2.
Relatively little is known about the diagnostic yield (DY) from ES in individuals with ancestries such as African, East Asian, South/Central Asian, Middle Eastern, and Native American, as well as in ancestrally admixed individuals 3. Genetic variant data from individuals with non-European ancestry are less well represented in genetic and genomic databases 2, and it has been suggested that DY may be lower in those with non-European ancestry. Some studies have found higher rates of VUSs in individuals with African and Native American ancestry compared to those of European ancestry 4,5, which suggests the potential for reduced diagnostic yield in non-European ancestry populations.
To investigate this question in the context of ES for rare undiagnosed but suspected Mendelian disorders, we analyzed the association of diagnostic yield with estimated global genetic ancestry in an ancestrally diverse cohort of pediatric and prenatal cases who underwent ES, and how it relates to the self-identified race/ethnicity of the parents of the cases.
Our analysis was based in the Program in Pediatric and Prenatal Genomic Sequencing (P3EGS) cohort at the University of California, San Francisco (UCSF), which is part of the Clinical Sequencing Evidence-generating Research (CSER) consortium 6. Cases in the P3EGS cohort had a wide range of clinical indications for ES, and the cohort was ancestrally diverse, with 70% of parents who provided race/ethnicity information self-identifying as non-white 7.
The association between diagnostic yield and important factors other than genetic ancestry has been reported previously, in a separate but related study, using the same cohort 7. This work extends the results from that study by specifically investigating genetic ancestry estimated from sequence data in relation to diagnostic yield.

Participant demographics and exome sequencing
A total of 845 (529 pediatric, 316 prenatal) cases and their available biological parents were enrolled in the study primarily at one of five sites in the San Francisco Bay area and Central Valley of California (2 pediatric and 58 prenatal families were referred from outside California). Participants in the cohort had a wide range of clinical indications for ES 7. There were more male (54.8% pediatric, 54.1% prenatal) than female cases in the cohort. Overall, 16.3% of pediatric cases were less than a year old, and 76.6% were 10 years or younger at enrollment. The median maternal and paternal age at proband conception in the pediatric cohort was 28.2 and 32.2 years, respectively. Among prenatal patients, the mean gestational age at enrollment was 23.5 weeks. The median maternal and paternal age at proband conception in the prenatal cohort was 33.1 and 35.0 years, respectively.
All 845 cases received ES. Among pediatric cases, ES was done on both parents in 337 cases (trio, quad ES), on a single parent in 111 cases (duo ES), and on neither parent in 81 cases. Among prenatal cases, ES was done on both parents in 262 cases (trio, quad, quint ES), on one parent in 16 cases (duo ES), and on neither parent in 38 cases, yielding a total of 1325 parents with ES data.
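The parent total follows directly from the per-arm counts. The short sketch below is only an arithmetic check on the figures quoted in the text; the dictionary layout and function names are illustrative, not part of the study's pipeline.

```python
# Case counts quoted in the text, by number of parents sequenced.
PARENTS_SEQUENCED = {
    "pediatric": {"both": 337, "one": 111, "none": 81},
    "prenatal": {"both": 262, "one": 16, "none": 38},
}


def count_cases(arm):
    """Total cases in one study arm."""
    return sum(arm.values())


def count_parents(arm):
    """Sequenced parents: two per 'both' case, one per 'one' case."""
    return 2 * arm["both"] + arm["one"]


cases = {name: count_cases(arm) for name, arm in PARENTS_SEQUENCED.items()}
parents = {name: count_parents(arm) for name, arm in PARENTS_SEQUENCED.items()}
total_cases = sum(cases.values())
total_parents = sum(parents.values())

print(cases, total_cases)
print(parents, total_parents)
```

The per-arm case counts recover 529 and 316, and the parent counts sum to the 1325 reported.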
See ref. 7 for more details on the individuals studied and their demographics.
Race/ethnicity and genetic ancestry of P 3
Results of the principal component (PC) analysis are given in Supplementary Figs. 1-3. The first 6 PCs depict African, European, East Asian, Native American, South Asian, Middle Eastern and Pacific Islander genetic ancestries. The P3EGS cases reflect all these ancestries, with the largest components being European, Native American, and East Asian.
The correspondence between self-identified race/ethnicity and estimated individual genetic ancestry proportions of the 1325 exome sequenced parents is visualized in Fig. 1. This includes those whose race/ethnicity information was missing. As shown previously 8, for those reporting a single race/ethnicity there is a high correspondence between genetic ancestry and self-reported race/ethnicity (Fig. 1a). For example, those reporting East Asian race/ethnicity have near 100% East Asian genetic ancestry; the same is true for those reporting South Asian, white/European, and Middle Eastern race/ethnicity. Those reporting African American or Black race/ethnicity have admixed African and European genetic ancestry, while Latino(a) participants have primarily Native American and European genetic ancestry, with a modest contribution of African and Middle Eastern ancestry. The genetic ancestry of Central Asians appears to be intermediate between South Asian and European/Middle Eastern. The genetic ancestry distribution of those with missing race/ethnicity appears quite comparable to the overall distribution of those with information, reflecting largely European genetic ancestry, mixed European/Native American genetic ancestry, East Asian, South Asian and African genetic ancestry. Parents who reported more than 1 race/ethnicity had a higher level of genetic admixture compared to those who reported only 1, and again there is a high correspondence between the self-reported race/ethnicities and genetic admixture for these participants (Fig. 1b). The single exception is for those reporting Native American and white/European race/ethnicity. The majority of such participants have only European genetic ancestry, while the remainder are admixed European with a modest to moderate amount of Native American genetic ancestry. This observation is comparable to what has been reported previously 8.
The average genetic ancestry proportions in the pediatric cases were: 41.6% European, 28.9% Native American, 7.2% East Asian, 8.6% Middle Eastern, 7.3% African, and 6.2% South Asian. The average genetic ancestry proportions in the prenatal cases were: 56.8% European, 10.6% Native American, 12.9% East Asian, 7.8% Middle Eastern, 4.5% African, and 7.2% South Asian. Combined, the estimated genetic ancestry proportions (mean, standard deviation) were: 47.3% (33.9%) European, 22.1% (27.6%) Native American, 9.3% (25.6%) East Asian, 8.3% (14.5%) Middle Eastern, 6.2% (16.5%) African, and 6.6% (21.3%) South Asian. The average estimated Oceanian genetic ancestry was less than 1% in both pediatric and prenatal cases, so we did not include it in subsequent analyses.
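As a consistency check, the combined means should equal the case-weighted average of the two arm-level means. The sketch below assumes the combined figures pool exactly the 529 pediatric and 316 prenatal cases; because the arm-level means are themselves rounded to one decimal, agreement is only expected to within about 0.1 percentage points.

```python
# Arm sizes and mean ancestry percentages as quoted in the text.
N_CASES = {"pediatric": 529, "prenatal": 316}

# ancestry: (pediatric mean %, prenatal mean %, reported combined mean %)
MEAN_PCT = {
    "European": (41.6, 56.8, 47.3),
    "Native American": (28.9, 10.6, 22.1),
    "East Asian": (7.2, 12.9, 9.3),
    "Middle Eastern": (8.6, 7.8, 8.3),
    "African": (7.3, 4.5, 6.2),
    "South Asian": (6.2, 7.2, 6.6),
}


def pooled_mean(pediatric_mean, prenatal_mean):
    """Case-weighted average of the two arm-level means."""
    n_ped, n_pre = N_CASES["pediatric"], N_CASES["prenatal"]
    return (n_ped * pediatric_mean + n_pre * prenatal_mean) / (n_ped + n_pre)


pooled = {name: pooled_mean(ped, pre) for name, (ped, pre, _) in MEAN_PCT.items()}
for name, (_, _, reported) in MEAN_PCT.items():
    print(f"{name}: pooled {pooled[name]:.2f}%, reported {reported}%")
```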

Genetic ancestry and diagnostic yield
The diagnostic yield was significantly higher in pediatric compared to prenatal cases 7. Overall, out of 529 pediatric
The majority of the positive cases had an AD mode of inheritance: 70% and 65% in the pediatric and prenatal arms of the study, respectively, compared to 18% and 25%, respectively, that had AR inheritance. Compared to the positive cases, the inconclusive cases had a lower percentage that were of AD inheritance (41.1% pediatric, 60% prenatal) and a higher proportion of AR inheritance (45% pediatric, 30% prenatal).
For each of the six genetic ancestries, there was no statistically significant difference in genetic ancestry distributions between positive, negative, and inconclusive outcomes in both pediatric and prenatal cases (Fig. 2): P values from Kolmogorov-Smirnov tests comparing genetic ancestry distributions in positive vs negative and inconclusive vs negative cases within each genetic ancestry group were all greater than 0.1 and not statistically significant.
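For readers unfamiliar with the comparison, a two-sample Kolmogorov-Smirnov test measures the largest gap between two empirical cumulative distributions. The following is a minimal standard-library sketch, not the authors' code: the function name is hypothetical, and the P value uses the common asymptotic approximation, which may differ from the exact method used in the study.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate two-sided P value) for two samples."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    d = 0.0
    # The largest ECDF gap is attained at one of the observed values.
    for x in a + b:
        gap = abs(bisect_right(a, x) / n1 - bisect_right(b, x) / n2)
        d = max(d, gap)
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    n_eff = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:  # the series is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

In this setting, `a` and `b` would hold one ancestry's estimated proportions for, say, positive and negative cases within a study arm.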
The distribution of estimated genetic ancestries of probands was observed to be both continuous and discrete in the different genetic ancestry groups.For Native American and European ancestries, the estimated genetic ancestries were more continuous, and for East Asian, South Asian, African, Middle Eastern ancestries, the distribution of estimated genetic ancestries were more discrete.A clear example can be seen in Fig. 2, in the estimated East Asian ancestry panel, in which the groups of cases are clumped around the 0, 25, 50, 75, 100% estimated East Asian ancestry mark, with few cases in between.These percentages represent the number of grandparents from that ancestral population (0, 1, 2, 3 or 4).For that reason, genetic ancestry bins, containing frequency of cases (and their diagnostic yield) within estimated genetic ancestry ranges were made to best capture variation in diagnostic yield in estimated genetic ancestries (see 'Methods').
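The grandparent-based binning can be sketched as follows. The cut points used here (12.5%, 37.5%, 62.5%, 87.5%, that is, the midpoints between 0, 1, 2, 3 and 4 grandparents) are an assumption made for illustration; the actual bin boundaries are defined in the 'Methods', and the function names are hypothetical.

```python
import math
from collections import Counter


def grandparent_bin(proportion):
    """Map an ancestry proportion in [0, 1] to the nearest grandparent count (0-4)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return min(4, math.floor(proportion * 4 + 0.5))


def yield_by_bin(records):
    """records: iterable of (ancestry_proportion, is_positive) pairs.

    Returns {bin: (positive_count, total_count)} for the bins that occur.
    """
    totals, positives = Counter(), Counter()
    for proportion, is_positive in records:
        b = grandparent_bin(proportion)
        totals[b] += 1
        positives[b] += int(is_positive)
    return {b: (positives[b], totals[b]) for b in sorted(totals)}
```

The per-bin counts produced this way are the natural input for a trend test across ordered ancestry bins.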
By the Cochran-Armitage test, there was no significant association between any genetic ancestry and diagnostic yield (Table 2).However, Middle Eastern genetic ancestry was significantly positively associated with an inconclusive outcome among pediatric probands, largely driven by the 78% (7/9) inconclusive rate in the highest Middle Eastern ancestry bin, compared to 11% (1/9) among negative cases (Table 2).However, this association was not observed in prenatal cases (Table 2).
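The Cochran-Armitage test asks whether the proportion of a given outcome changes linearly across ordered bins. Below is a minimal sketch of the standard statistic using equally spaced scores and a normal approximation; the scores, function name and example counts are illustrative assumptions, not values from Table 2.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Return (Z, two-sided P value) for a linear trend in proportions.

    events[i] is the number of cases with the outcome in bin i,
    totals[i] is the number of cases in bin i.
    """
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores (assumed)
    n = sum(totals)
    p_bar = sum(events) / n
    # Score-weighted deviation of observed from expected counts.
    t = sum(s * (e - m * p_bar) for s, e, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    if variance <= 0.0:
        raise ValueError("trend is undefined: no variation in outcome or scores")
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: outcome frequency rising across three ancestry bins.
z_trend, p_trend = cochran_armitage(events=[1, 5, 9], totals=[10, 10, 10])
```

When the outcome proportion is identical in every bin, the statistic is zero and the P value is 1.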
The results of the logistic regression analyses largely mirrored the Cochran-Armitage test results (Supplementary Table 1).The coefficient of the indicator variable for prenatal vs pediatric ranged from −0.187 to −0.201 reflecting a diagnostic yield ratio of 0.62-0.65 for the prenatal versus pediatric cases.None of the beta coefficients for genetic ancestry was statistically significant.To gauge the power of the ancestry tests, we also calculated a 95% confidence interval for each ancestry beta from its mean and standard error.For each ancestry, using the model parameters, we calculated the probability of a positive outcome at 0% ancestry (P0) and then at 100% ancestry with the lower and upper 95% CI beta values (PL and PU, respectively).We then calculated the ratios Ratio-Lower = PL/P0 and Ratio-Upper = PU/P0 (final two columns of Supplementary Table 1).For each ancestry, the range of Ratio-Lower to Ratio-Upper is broad and includes 1.However, the range is broadest for African and Middle Eastern genetic ancestry.The ranges reflect the standard errors of beta, which are largest for African and Middle Eastern genetic ancestry, smallest for European and Native American genetic ancestry and intermediate for East and South Asian genetic ancestries.These standard errors (and hence power) are also a direct reflection of the observed variance in the genetic ancestries (mentioned above), which are largest for European and Native American, smallest for African and Middle Eastern, and intermediate for East and South Asian genetic ancestries.
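The Ratio-Lower and Ratio-Upper calculation can be sketched as below. This assumes a standard logit link with natural-log coefficients and ancestry coded as a proportion from 0 to 1; the text does not state the link or log base, so treat this as one plausible reading. The intercept, beta and standard error shown are hypothetical, not values from Supplementary Table 1.

```python
import math


def expit(x):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def yield_ratio_range(intercept, beta, se, z=1.96):
    """Return (Ratio-Lower, Ratio-Upper) = (PL / P0, PU / P0).

    P0 is the modelled probability of a positive outcome at 0% ancestry;
    PL and PU are the probabilities at 100% ancestry using the lower and
    upper 95% confidence limits of beta.
    """
    p0 = expit(intercept)
    p_lower = expit(intercept + (beta - z * se))
    p_upper = expit(intercept + (beta + z * se))
    return p_lower / p0, p_upper / p0


# Hypothetical coefficients, for illustration only.
ratio_lower, ratio_upper = yield_ratio_range(intercept=-1.0, beta=0.1, se=0.6)
print(round(ratio_lower, 2), round(ratio_upper, 2))
```

A larger standard error widens the interval on beta and therefore the range between the two ratios, which is why the range is broadest for the ancestries with the least variance in the cohort.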

Genetic ancestry and diagnostic yield stratified by mode of inheritance and inheritance pattern
Similarly, there was no statistically significant reduction in positive cases compared to negative cases associated with any estimated genetic ancestry in pediatric or prenatal cases, when the positive cases were stratified by mode of inheritance (AD, AR, XL) (Table 2).In contrast, there was a significant association of East Asian genetic ancestry with XL inheritance among inconclusive prenatal cases.However, this was due entirely to 2 inconclusive cases of XL inheritance in the highest bin of East Asian ancestry, whereas no such association was observed in pediatric inconclusive cases.There was also a statistically significant association between estimated Middle Eastern ancestry and AR inheritance among pediatric inconclusive cases, and a similar trend in this direction in the prenatal inconclusive cases, although numbers were quite small.
We further broke down the AR cases into homozygotes and compound heterozygotes (Supplementary Table 2, Supplementary Table 3).The association of inconclusive pediatric AR outcomes with Middle Eastern ancestry was observed only among homozygous outcomes, and not compound heterozygotes (Supplementary Table 2).Among prenatal cases, we again saw a positive association of Middle Eastern ancestry with both positive and inconclusive homozygous AR outcomes.A similar pattern was observed with South Asian ancestry.In the pediatric cases, South Asian ancestry was positively associated with both positive and inconclusive homozygous AR outcomes (Supplementary Table 2) but only a modest positive trend in the prenatal cases (Supplementary Table 3).

Consanguinity coefficient and diagnostic yield
Overall, 14.3% of the 845 total cases had an F ≥ 0.0156, the level for offspring of second cousins (Supplementary Fig. 4). Both positive and inconclusive AR (homozygous) cases were associated with an increased consanguinity coefficient (F) among the combined pediatric and prenatal cases (Table 3). There was a statistically significant increase in mean F in AR (homozygous) outcomes among both positive and inconclusive cases compared to negative cases by unpaired t test (P value < 0.0042).
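The 0.0156 threshold corresponds to the expected inbreeding coefficient of a child of full second cousins, 1/64. A minimal sketch of that expectation and of flagging cases at or above it follows; the function names and the example F values are hypothetical, and the expectation assumes full cousins with no other parental relatedness.

```python
def expected_f_offspring_of_cousins(degree):
    """Expected inbreeding coefficient for a child of full nth cousins.

    First cousins give 1/16, second cousins 1/64, third cousins 1/256.
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return 0.25 ** (degree + 1)


SECOND_COUSIN_F = expected_f_offspring_of_cousins(2)  # 1/64 = 0.015625


def fraction_at_or_above(f_values, threshold=SECOND_COUSIN_F):
    """Share of cases whose estimated F reaches the threshold."""
    return sum(f >= threshold for f in f_values) / len(f_values)


# Hypothetical estimated F values; estimates can be slightly negative.
example_f = [0.0, 0.02, 0.1, -0.005]
print(SECOND_COUSIN_F, fraction_at_or_above(example_f))
```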

Consanguinity coefficient and race/ethnicity
We examined the AR homozygous positive and inconclusive cases by self-reported race/ethnicity of the parents and consanguinity coefficient of the proband, as well as the variant frequencies. Among 18 positive cases, 10 had consanguinity coefficients greater than 0.0156 (average of 0.075, minimum 0.022, maximum 0.171). For 3 of these cases the parents were South Asian, in 2 cases the parents were Central Asian, in 2 cases the parents were Middle Eastern, in 2 cases the parents were Latino(a) and in one case the parents were East Asian race/ethnicity. Among 8 cases with consanguinity coefficients less than 0.0156 (average of 0.003), 5 had parents that were Latino(a), and one each were Central Asian, African American or Black, and white/European race/ethnicities. For all cases, the frequencies of P and LP variants estimated from gnomAD (based on the genetic ancestry estimates of the proband) were uniformly low; all were below 0.0001 except for a non-consanguineous Central Asian (0.00014) and non-consanguineous African American or Black (0.00041) case. It is notable that among the 10 cases with consanguinity, 7 were Middle Eastern, Central or South Asian, while among the 8 cases without consanguinity, 1 case was Middle Eastern, Central or South Asian race/ethnicities. Among 27 AR homozygous inconclusive cases (with VUSs), 17 had consanguinity coefficients greater than 0.0156 (average of 0.081, minimum 0.023, maximum 0.21). In this group, for 8 cases the parents were Latino(a), in 5 cases the parents were Middle Eastern, in 2 cases each the parents were Central Asian and South Asian, and in one case the parents were East Asian race/ethnicity. In contrast, among 10 cases with consanguinity coefficients less than 0.0156 (average of −0.005), for 6 the parents were Latino(a), for 2 the parents were South Asian, and in one each the parents were white/European and African American or Black race/ethnicities. Here again it is notable that among the 17 cases with consanguinity, 9 had parents who identified as Middle Eastern, Central Asian, or South Asian, while among the 10 cases without consanguinity, the parents identified with these racial/ethnic groups for only 2. For these inconclusive AR homozygous cases, the P/LP allele frequencies were again all below 0.0001 in frequency except in 2 cases, one East Asian consanguineous case with allele frequency 0.0032, and one Latino(a) non-consanguineous case with allele frequency 0.00023. These variants represent ancestry-specific founder variants.

Recurrent variants in the P3EGS cohort
In searching for possible founder variants, we found four recurrent variants in three different genes among eight different cases (Supplementary Table 4). Of note, the recurrent variants were all de novo, and therefore do not represent founder variants.

DISCUSSION
Among both pediatric and prenatal cases, we observed no reduction in overall diagnostic yield (definitive + probable positive) from ES associated with any of the estimated genetic ancestry groups (African, Native American, East Asian, European, Middle Eastern, South Asian). Similarly, there was no reduction or increase in the rate of inconclusive outcomes associated with any of the genetic ancestries, with the single exception of a positive association with Middle Eastern genetic ancestry. Of 9 pediatric cases with primarily (>87.5%) Middle Eastern genetic ancestry, 7 (78%) had an inconclusive result, including 2 AD, 4 AR and 1 XL, compared to 12% for the rest of the cohort. There were 4 prenatal cases with majority Middle Eastern genetic ancestry; 1 of these had an inconclusive result (AR).
The mode of inheritance distribution also differed significantly between positive and inconclusive outcomes, with a higher proportion of AD de novo results for positive versus inconclusive cases 7, likely a direct reflection of the American College of Medical Genetics and Genomics (ACMG) criteria, for which de novo status of a variant is considered a primary criterion for pathogenicity determination. Most of our cases that were classified as inconclusive were due to variant uncertainty 7, and the majority of these VUSs were inherited variants or inheritance uncertain. We also observed a shift in mode of inheritance by genetic ancestry among our cases. AR homozygous inheritance was positively associated with Middle Eastern and South Asian genetic ancestry among both positive and inconclusive pediatric and prenatal cases. We also showed that these trends were largely due to consanguinity associated with these ancestries. Thus, while the overall diagnostic yield was not diminished in any non-European genetic ancestry, the pattern of inheritance varied. The sole positive association of the inconclusive rate with Middle Eastern genetic ancestry was largely attributable to 5 AR homozygous cases.
Some studies have suggested that diagnostic yield from ES and other genetic tests is lower in non-white race/ethnicity groups, such as African American or Native American 3,9, possibly due to underrepresentation of data from non-white populations in genetic variant databases 2,3,10. However, the clinical context is important in evaluating the association of race/ethnicity or genetic ancestry with diagnostic yield. For example, in genetic testing studies of hearing loss in which children underwent comprehensive genetic testing (CGT) and panel testing, Hispanic/Latino(a) and African American children were less likely to have a definitive genetic diagnosis compared to white or Asian children 5,11. This was because likely causal variants in the African American and Latino(a) children had not yet been documented in prior studies (and therefore also did not appear on variant-specific panels), in contrast to some of the more common causal variants found in white and Asian children. When the authors reduced the ACMG criterion of prior association with disease and solely used in silico functional prediction, there was no difference in diagnostic yield by ancestry. It appears that the requirement for prior evidence regarding a specific variant (as opposed to predicted functional evidence) can have a significant impact on diagnostic yield; an example from newborn screening demonstrated a reduction of diagnostic yield from 88 to 55% when requiring prior curation of a variant as P/LP as opposed to functional prediction with no prior curation, yet without a dramatic effect on false positive rate (increase from 0.6 to 1.6%) 12. It is therefore important to consider the role of prior evidence of pathogenicity or likely pathogenicity for a variant in assessing genetic ancestry influences on diagnostic yield, as lack of inclusion of some ancestral groups in clinical genetic studies may lead to underrepresentation of ancestry-specific pathogenic founder mutations in clinical variant databases.
In our study, cases were selected with a broad range of clinical phenotypes, with no prior assumptions about potential mode of inheritance.The majority of our positively diagnosed pediatric and prenatal cases were due to P/LP variants in AD genes (69%), and the majority of the variants were de novo (74% confirmed but possibly as high as 87% due to inheritance uncertainty).All of our XL cases were also dominant, and the majority arose de novo 7 .By contrast, 9% of the positive cases were due to inherited AD variants, and 20% had AR variants, nearly all of which were inherited.As expected, we observed no genetic ancestry association with de novo variants as these presumably occur independently of an individual's genetic ancestry.However, we also saw no genetic ancestry associations in the inherited AD or AR cases.This was largely a reflection that nearly all variants were quite rare (frequency <0.0001), and, with few exceptions, likely did not reflect founder mutations in any of the conditions or groups studied.The possible exceptions are variants observed in AR homozygous cases with low consanguinity coefficients as well as AR compound heterozygotes.Of note, we found no genetic ancestry associations or even trends for the pediatric or prenatal AR compound heterozygous cases.On the other hand, we did observe an excess of Native American genetic ancestry among 9 AR homozygotes without consanguinity, reflecting that 6 of the 9 had parents who self-reported Latino(a) race/ethnicity, and suggesting the possibility of founder variants in some Native American indigenous populations.
Among the inconclusive cases, the proportion of inherited cases is substantially higher at 58% (53/92).Yet here also, we found no association with any of the genetic ancestries tested for the inherited AD and compound heterozygous AR cases.Again, this suggests that while variant uncertainty may have led to this collection of outcomes, there was no bias towards non-European ancestries, likely because of the lack of elevated frequency of founder variants underlying the disorders identified.In the entire cohort, we identified only one P/LP/VUS variant with increased frequency-the AR VUS c.636 G > C (p.[Gln212His]) (rs201590882) in ARMC9 in an East Asian case (gnomAD frequency of 0.003 in East Asians).Furthermore, the four variants found twice among our cases were all de novo and not inherited.
The increased AR homozygous inheritance cases in high Middle Eastern and South Asian genetic ancestry pediatric cases, corresponding with statistically increased estimated consanguinity coefficients, was expected.It is well documented that certain population groups such as those from the Middle East, and South and Central Asia, have increased F, which increases autozygosity and hence the rate of AR homozygous cases 13,14 .
We also note that our diagnostic yield and results on ancestry are a direct reflection of the clinical setting of rare, undiagnosed diseases and implementation of the ACMG criteria for variant annotation, as well as our inclusion/exclusion criteria.Our inclusion criteria required a prior negative microarray, and all patients with a prior positive genetic test (e.g., from a gene-based panel) were also excluded.Thus, the diagnostic yield of 26.7% for our pediatric cases may be lower than other studies with different diagnostic and inclusion/exclusion criteria but comparable to others with similar criteria.
The ACMG criteria place a special emphasis on de novo inheritance, leading to a higher proportion of de novo AD variants in positive cases compared to inconclusive cases in our study.While there was a lack of founder variants underlying the genetic etiology of the cases in our study, this phenomenon may not be general-for example in the study of known predominantly AR diseases (such as hearing loss or inborn errors of metabolism), where ancestry associations may still be present depending on variant annotation requirements.Thus, our results should not necessarily be considered representative of all clinical testing scenarios.Indeed, our results contrast with other scenarios, such as gene panels and polygenic risk scores, which typically involve more common and genetic-ancestry-specific variants, where the impact of detection biases favoring European as opposed to other genetic ancestries has been well documented 5,15 .
In summary, in this ancestrally diverse cohort of pediatric and prenatal cases with different clinical indications, there was no reduction in diagnostic yield associated with any genetic ancestry group.Consanguinity may increase the relative proportion of cases with AR homozygous inheritance among those with Middle Eastern and South Asian genetic ancestry but did not alter the overall diagnostic yield, although our number of cases with these ancestries was modest.This empirical study improves our understanding and provides support for the equitable use of exome sequencing in diagnosis of previously undiagnosed but potentially Mendelian disorders across all ancestral populations.

Study participants, recruitment, demographics, inclusion, and exclusion criteria
Pediatric (N = 529) and prenatal (N = 316) cases and their available biological parents (at least one required) were primarily enrolled at one of five sites in the San Francisco Bay area and Central Valley of California.The five sites included UCSF Benioff Children's Hospital San Francisco and Benioff Children's Hospital Oakland, Zuckerberg San Francisco General Hospital, the Betty Irene Moore Women's Hospital at Mission Bay, and the Community Medical Center in Fresno.
The study was approved by the UCSF Institutional Review Board (IRB) (protocols 17-22504 and 17-22420), the Fresno Community Medical Center IRB (protocol 2019024), and was registered as two clinical trials ("Clinical Utility of Pediatric Whole Exome Sequencing", NCT03525431 and "Clinical Utility of Prenatal Whole Exome Sequencing", NCT03482141).Written informed consent was provided by adult participants >18 years of age, or by parents or legal guardians on behalf of their children <18 years of age or >18 years of age who were unable to consent independently.Assent was obtained from minors and intellectually disabled adults whenever possible.This study complied with all relevant ethical regulations including the Declaration of Helsinki.The study period was from 8/1/2017 to 5/13/2022.
ES was offered to patients seen in clinic for whom a genetic etiology was suspected based on clinical findings. A minimum of one biological parent was required to be available and willing to provide a biospecimen for ES, with a preference for two available parents. It was required that at least one parent consented to ES of the child. For the prenatal cases, at least the mother had to consent to ES of a fetal sample as well as of herself. Pediatric patients were enrolled with the following indications: multiple congenital anomalies (MCAs), developmental delays (DD)/intellectual disability (ID), metabolic disease, epilepsy, seizures, neurodegenerative disease/cerebral palsy (CP), and encephalopathy. Pediatric patients must have had at least one prior genetics appointment or evaluation. Almost all pediatric patients were resident in California and were likely to have had non-diagnostic newborn screening prior to enrollment. Specific community outreach efforts for patient recruitment were not required for the pediatric patients, as the patient population seen at the Benioff Children's Hospitals in San Francisco and Oakland was diverse.
Pregnant women with fetuses with structural birth defects identified by ultrasound were enrolled.The prenatal eligibility criteria included: one or more fetal structural abnormalities, an unexplained disorder of fetal growth, and one or more fetal effusions or non-immune hydrops.This was based on imaging at the time of enrollment.All prenatal cases had to have undergone prenatal diagnosis with non-diagnostic chromosomal microarray.Pregnant patients late in gestation, in whom ES results were not anticipated until after delivery, were included in the prenatal subgroup if consent occurred prior to delivery.Twin gestations were eligible for inclusion if one or both fetuses were affected.
As an exclusion criterion, patients with a diagnosis that explained their clinical findings after microarray were excluded from the study.Microarrays were ordered for patients with multiple anomalies, DD/ID, and/or autism prior to study entry.Microarrays were also ordered for growth delays, including short stature, failure to thrive or microcephaly, and neurological findings such as hypotonia and seizures.A modification of the guidelines of Manning et al. 16 was used for the microarrays ordered.Pregnancies and patients with a copy number variant not clearly associated with the phenotype were eligible for inclusion, as were patients who had previously undergone targeted or gene panel testing without a diagnosis.Patients were excluded from the study if both biological parents were unavailable or if prior ES was performed for a clinical or research indication.Patient recruitment, inclusion and exclusion has also been described in ref. 7 .
Self-reported race/ethnicity/nationality of parents
Parents of affected probands voluntarily responded to questions about their demographic background on a structured instrument. In terms of race/ethnicity/nationality, the P3EGS parents were asked to respond to all categories that best describe them among: (a) American Indian, Native American or Alaska Native, (b) Asian-Filipino, (c) Asian-Central/South Asian (Indian, Pakistani, Afghani), (d) Asian-Vietnamese, (e) Asian-Hmong, (f) Asian-Korean, (g) Asian-Japanese, (h) Asian-other (specified through free text), (i) Black or African American, (j) Native Hawaiian, (k) Samoan, (l) Other Pacific Islander (specified through free text), (m) white or European American, (n) Middle Eastern or North African/Mediterranean, (o) Hispanic/Latino(a) - Mexican, Mexican American, Chicano/a, (p) Hispanic/Latino(a) - Central American - Guatemala, El Salvador, etc., (q) Hispanic/Latino(a) - South American - Peru, Chile, etc., (r) Hispanic/Latino(a) - Caribbean - Puerto Rico, Cuba, etc., (s) Hispanic/Latino(a) - another Hispanic or Latino origin (specified by free text), (t) Prefer not to answer, (u) Unknown/none of these fully describe them. They also responded to the open-ended questions "What is your ancestry or ethnic origin?" and "What country were you born in?"
Based on the parental responses to the demographic questionnaires, we derived the following categories (based primarily on the selected pre-listed categories above, and further resolved using the open-ended questions): Native American (NAT), based on category (a); Latino(a) (LT), based on categories (o-s), which were rolled up; white/European (EU), based on category (m); African American or Black (AF), based on category (i); East Asian (EA), based on categories (b, d-g), which were rolled up; South Asian (SA) and Central Asian (CA), by separating category (c) into SA and CA based on information from the open-ended questions on ancestry and country of origin; Middle Eastern (ME), based on category (n); and Pacific Islander (PI), based on categories (j-l), which were rolled up. The open-ended questions were also used to resolve category (h) into EA, SA, or CA. Each parent was placed in one or more of the categories, or in "missing" if no information was provided.
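The roll-up described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the function name `derive_categories`, the `resolved` argument (standing in for manual review of the open-ended answers), and the treatment of options (t), (u), and unresolved (c)/(h) as "missing" are all assumptions made for the example.

```python
# Hypothetical roll-up of questionnaire options into the derived categories.
ROLLUP = {
    "a": "NAT",
    "b": "EA", "d": "EA", "e": "EA", "f": "EA", "g": "EA",
    "i": "AF",
    "j": "PI", "k": "PI", "l": "PI",
    "m": "EU",
    "n": "ME",
    "o": "LT", "p": "LT", "q": "LT", "r": "LT", "s": "LT",
}
# Options (c) and (h) can only be resolved with the open-ended answers.
NEEDS_FREE_TEXT = {"c": {"SA", "CA"}, "h": {"EA", "SA", "CA"}}


def derive_categories(selected, resolved=None):
    """Return the sorted derived categories for one parent, or ["missing"].

    selected: iterable of option letters the parent chose.
    resolved: dict mapping "c"/"h" to the category assigned after review
              of the open-ended answers (assumed to be done beforehand).
    """
    resolved = resolved or {}
    out = set()
    for opt in selected:
        if opt in ROLLUP:
            out.add(ROLLUP[opt])
        elif opt in NEEDS_FREE_TEXT:
            cat = resolved.get(opt)
            if cat in NEEDS_FREE_TEXT[opt]:
                out.add(cat)
        # Assumption: (t), (u) and unresolved (c)/(h) add no category.
    return sorted(out) if out else ["missing"]
```

A parent who selected (m) and (o) would be placed in both EU and LT, which mirrors the "one or more of the categories" rule in the text.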
We only included self-reported race/ethnicity categories for parents, as no self-report information is available for children or fetuses, and parents did not assign race/ethnicity categories to their offspring.
Exome sequencing, quality control and selection of markers for genetic ancestry analyses
ES of samples from the probands and available parents was done at UCSF, in a Clinical Laboratory Improvement Amendments (CLIA)-licensed laboratory, the UCSF Clinical Cancer Genomics Laboratory (CLIA number: 05D2034158). Written informed consent was obtained for study participation. Initially, ES was provided to probands and both biological parents if both parents were available. Duo ES was provided in cases where only one biological parent was available. However, in the last year of enrollment, a 'proband first' approach was used, and biological parents only underwent targeted Sanger sequencing if segregation analysis was required 7.
Exon regions were targeted using the xGen Whole Exome Panel kit from Integrated DNA Technologies. The targeted regions were sequenced using the Illumina HiSeq 2500 sequencing system (v3 chemistry) with 100 bp paired-end reads in rapid run mode. The DNA sequences were aligned to the published human reference genome GRCh37 (see ref. 7 for full methods).
Variants in all VCF files with sequencing depth at or below 10 (DP ≤ 10), and genotype quality equal to or less than 20 (GQ ≤ 20) were filtered out using GATK 17. The VCF files were then lifted over from human genome reference version GRCh37 to GRCh38 using the Picard tool in the GATK suite of tools 17. Human Genome Diversity Panel (HGDP) whole genome sequencing samples from the gnomAD v3 call set 18,19 were used as the reference for genetic ancestry and admixture estimation (N = 829 unrelated individuals). The HGDP samples were all mapped to the GRCh38 reference sequence.
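The depth and genotype-quality thresholds can be illustrated with a small Python sketch. It does not reproduce the GATK command used in the study. The function `passes_filter` and the toy records are hypothetical. The sketch also assumes that a call is removed when either threshold is failed, and that calls with missing values are removed.

```python
def passes_filter(dp, gq, dp_cutoff=10, gq_cutoff=20):
    """Return True if a genotype call survives filtering.

    A call is kept only when DP > dp_cutoff and GQ > gq_cutoff, i.e. calls
    with DP <= 10 or GQ <= 20 are removed (assumed reading of the text).
    Missing values (None) are treated as failing; this is an assumption.
    """
    if dp is None or gq is None:
        return False
    return dp > dp_cutoff and gq > gq_cutoff


# Toy records for illustration only; not data from the study.
calls = [
    {"id": "var1", "DP": 35, "GQ": 99},
    {"id": "var2", "DP": 10, "GQ": 60},  # DP at the cutoff: removed
    {"id": "var3", "DP": 22, "GQ": 20},  # GQ at the cutoff: removed
    {"id": "var4", "DP": 11, "GQ": 21},
]
kept = [c["id"] for c in calls if passes_filter(c["DP"], c["GQ"])]
```

Both cutoffs are inclusive on the failing side, so a call sitting exactly at DP = 10 or GQ = 20 is dropped.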
High-performance markers were selected from the HGDP and P3EGS data for downstream genetic ancestry, admixture, relatedness, and consanguinity analysis using the following criteria: 1) Restriction of markers in the HGDP dataset to exome sequenced regions in the P3EGS dataset. This was conducted using bcftools 20.
3) Only biallelic, autosomal SNPs with a call rate >95% in exome regions that were sequenced were selected (this was done in both HGDP and P3EGS cohorts separately). The resulting markers in the HGDP cohort (N = 105,956) were intersected with markers from the P3EGS cohort sample VCFs, resulting in N = 95,173

Fig. 2 Empirical cumulative distribution functions (ECDF) and their corresponding 95% confidence interval (C.I.) bands of estimated genetic ancestries stratified by case outcome (positive, inconclusive, and negative) and by pediatric or prenatal cases. a ECDF for African ancestry. b ECDF for Native American ancestry. c ECDF for East Asian ancestry. d ECDF for European ancestry. e ECDF for Middle Eastern ancestry. f ECDF for South Asian ancestry. There was no statistically significant difference between ECDFs in negative and positive cases, or negative and inconclusive cases, in any of the genetic ancestries. Comparisons were performed using the Kolmogorov-Smirnov (KS) test. The short vertical lines on the x axis represent cases, ordered by their % genetic ancestries, and colored by outcomes as seen in the legend/key. The 95% C.I. bands for the ECDFs were calculated using the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.
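For readers unfamiliar with DKW bands, the following Python sketch shows one common way to obtain them: the ECDF is widened by a constant half-width, the square root of ln(2/α)/(2n). This is an illustrative assumption about the form of the band, not a description of the authors' code, and the function name `ecdf_with_dkw_band` is hypothetical.

```python
import math


def ecdf_with_dkw_band(values, alpha=0.05):
    """Return (points, eps) for the ECDF of `values`.

    points is a list of (x, F(x), lower, upper) at each distinct x, where
    the band F(x) +/- eps is clipped to [0, 1]; eps is the DKW half-width
    sqrt(ln(2 / alpha) / (2 * n)) for a simultaneous (1 - alpha) band.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    points = []
    for i, x in enumerate(xs, start=1):
        if i < n and xs[i] == x:
            continue  # tie: keep only the last (highest) step at this x
        f = i / n
        points.append((x, f, max(f - eps, 0.0), min(f + eps, 1.0)))
    return points, eps
```

Because the half-width depends only on the sample size, smaller strata (for example, prenatal cases with a given outcome) have visibly wider bands than larger ones.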

Table 1. Distribution of self-identified race/ethnicity of parents of pediatric and prenatal cases in the P3EGS cohort.

Correspondence between estimated genetic ancestry proportions and self-reported race/ethnicity of parents of P3EGS cases; and estimated genetic ancestry of pediatric and prenatal cases. a Correspondence between race/ethnicity and estimated global genetic ancestry admixture proportions in parents with 1 reported race/ethnicity. b Correspondence between race/ethnicity and estimated global genetic ancestry admixture proportions in parents with more than 1 reported race/ethnicity. c Estimated global genetic ancestry admixture proportions in pediatric cases. d Estimated global genetic ancestry admixture proportions in prenatal cases. Each horizontal bar in the "Estimated Genetic ancestry" column represents one parent or case, and the "Race/Ethnicity" column corresponds to the self-reported race/ethnicity of the parent. Genetic ancestry proportions/percentages were estimated from exome sequencing data using ADMIXTURE software with unrelated Human Genome Diversity Panel (HGDP) samples from gnomAD as reference samples/populations.

Table 2. Number and diagnostic yield (in parentheses) of pediatric and prenatal cases by genetic ancestry bins, stratified by mode of inheritance, and Cochran-Armitage (C-A) test Z-statistics for trend tests of positive versus negative and inconclusive versus negative cases.
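A C-A trend Z-statistic of the kind reported in Table 2 can be computed from per-bin counts as in the sketch below. It uses the standard textbook form of the statistic with equally spaced bin scores by default. The scores, the function name `cochran_armitage_z`, and the example counts are assumptions for illustration, not values or settings from the study.

```python
import math


def cochran_armitage_z(positives, totals, scores=None):
    """Cochran-Armitage trend Z-statistic across ordered bins.

    positives: number of positive cases in each bin.
    totals:    number of cases (positive + comparison group) in each bin.
    scores:    bin scores; defaults to 0, 1, 2, ... (an assumption).
    """
    k = len(totals)
    if len(positives) != k:
        raise ValueError("positives and totals must have the same length")
    if scores is None:
        scores = list(range(k))
    n_all = sum(totals)
    p = sum(positives) / n_all  # pooled proportion of positives
    t = sum(s * (r - n * p) for s, r, n in zip(scores, positives, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    var = p * (1.0 - p) * (s2 - s1 * s1 / n_all)
    if var <= 0:
        raise ValueError("variance is zero; the trend test is undefined")
    return t / math.sqrt(var)


# Hypothetical counts: yield rising from 20% to 40% over three ancestry bins.
z = cochran_armitage_z([10, 15, 20], [50, 50, 50])
```

A Z near zero indicates no linear trend in yield across bins; the sign shows whether yield rises or falls with increasing ancestry proportion.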


Table 3. Mean and standard error (SE) of consanguinity coefficients (F) by inheritance pattern in 845 P3EGS cases.