Abstract
Polygenic scores (PGSs) provide an individual-level estimate of genetic risk for any given disease. Since most PGSs have been derived from genome-wide association studies (GWASs) conducted in populations of White European ancestry, their validity in other ancestry groups remains unconfirmed. This is especially relevant for cardiometabolic diseases, which are known to disproportionately affect people of non-European ancestry. Thus, we aimed to evaluate the performance of PGSs for glycaemic traits (glycated haemoglobin, and type 1 and type 2 diabetes mellitus), cardiometabolic risk factors (body mass index, hypertension, high- and low-density lipoproteins, and total cholesterol and triglycerides) and cardiovascular diseases (including stroke and coronary artery disease) in people of White European, South Asian, and African Caribbean ethnicity in the UK Biobank. Whilst PGSs incorporated some GWAS data from multi-ethnic populations, the vast majority originated from White Europeans. For most outcomes, PGSs derived mostly from European populations had an overall better performance in White Europeans compared to South Asians and African Caribbeans. Thus, multi-ancestry GWAS data are needed to derive ancestry-stratified PGSs to tackle health inequalities.
Introduction
A polygenic score (PGS) provides a personalised estimate of an individual’s genetic liability to a disease. These are calculated as weighted sums of single nucleotide polymorphisms (SNPs) [1]. Because most existing PGSs have been derived from genome-wide association studies (GWASs) conducted in populations of European ancestry [2, 3], their validity in other ancestry groups remains unconfirmed. Therefore, although PGSs are an exciting prospect for precision medicine, they have the potential to perpetuate or widen existing health inequalities if they lead to invalid or misleading inference of disease risk in non-European populations.
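To make the "weighted sum of SNPs" concrete, the following minimal sketch computes a score from effect-allele dosages and per-SNP weights. The SNP identifiers, weights, and dosages are hypothetical values chosen for illustration and do not come from any score evaluated in this study.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    # SNPs absent from the genotype data contribute nothing to the sum.
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


# Hypothetical per-SNP weights (e.g. GWAS log odds ratios) and one person's dosages.
example_weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}
example_dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
example_score = polygenic_score(example_dosages, example_weights)
```

In practice the weights are estimated in a GWAS, which is why the ancestry of the GWAS sample matters for how well the score transfers.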
In genetic studies, genetic ancestry is commonly used as a proxy for the social construct of ethnicity (and vice versa). However, ethnicity is a complex concept which includes genetic ancestry and a wide range of social constructs (e.g., cultural practices, health beliefs, language, religion, and self-identification) amongst others [4]. In general, genetic ancestry is thought to better reflect genetic relatedness than ethnicity, due to the fact that ethnicity is a broader social concept which incorporates a wide variety of environmental measures such as socioeconomic status and lifestyle [5]. However, there is considerable overlap between genetic ancestry and self-reported ethnicity, although ancestry does not capture the entirety of an individual’s ethnic identity [6]. Thus, self-reported ethnicity is important when examining health disparities related to the wider socio-cultural and environmental determinants of health in addition to biological and genetic factors [5].
PGSs derived in European ancestry populations generally transfer less well to African [7] or South Asian [8] populations. However, studies of transferability to Hispanics have reported conflicting results [9, 10]. Even then, within-ancestry heterogeneity can contribute to different predictive powers in ethnic sub-groups. For example, amongst Hispanics, the PGSs can have different performances based on ancestry clusters [11]. Thus, the transethnic transferability of PGSs remains a matter of debate.
Worldwide, there are 500–700 million individuals with diabetes mellitus (DM), 90% of whom have type 2 DM (T2DM) [12]. The prevalence of T2DM differs by age (more common in older people), sex (more common in men) and ethnicity. In the United Kingdom, South Asians are more likely to suffer from diabetes [13], followed by those from an African Caribbean background [14], both of whom have 2–3-fold higher risk of developing T2DM compared to White groups with onset almost a decade earlier [15]. In addition, South Asians and African Caribbeans are more likely to have higher serum glycated haemoglobin A1c (HbA1c) levels even in the absence of diabetes [16] and poorer glycaemic control in established diabetes [17].
In addition to ethnic differences in diabetes risk, cardiovascular diseases (CVDs) also vary across ethnicities. Compared to White Europeans, South Asians are more likely to develop CVD (i.e., coronary artery disease [CAD] and stroke), whilst those from an African Caribbean background are more likely to suffer from stroke [18]. Cardiovascular risk factors generally map to these ethnic differences in CVD outcomes. African Caribbeans generally have healthier lipid profiles (e.g., higher high-density lipoproteins [HDL] and lower total triglycerides [TTG] [19]) compared to White Europeans and South Asians, in whom lipoprotein profiles are most adverse [20]. In contrast, hypertension is more frequent in African Caribbeans than White Europeans [21]. The picture is more complex for South Asians, who have an equivalent or lower blood pressure (BP) than Europeans at younger ages [22], but subsequently experience a steeper BP trajectory resulting in higher later life BP [23].
Whether PGSs derived mostly from White ancestry GWAS data can capture differences by self-reported ethnicity in cardiometabolic traits remains unclear. Using data from the UK Biobank (UKB), this study aimed to explore the prognostic value of transethnic transferability for a wide range of cardiometabolic PGSs and their respective observed outcomes. Our focus was on participants of South Asian and African Caribbean ethnicity in relation to White Europeans as these are the largest ethnic minority groups in the UK and are therefore well represented in UKB.
Methods
Study population
The UKB is a large UK based prospective cohort study with >500,000 participants recruited between 2006 and 2010 when study participants were aged 40–69 years old, and features demographic, genetic, health outcomes and imaging data for its participants [24]. We used the self-reported ethnicity variable which was defined according to the 2001 UK Census guidelines. The breakdown of self-reported ethnicity in UKB is 94.4% White Europeans, 0.2% South Asians, 0.2% African Caribbeans, and 5.2% other/unknown. Ancestry was previously derived in the UK Biobank using principal component analysis (PCA) and clustering, and it shows a good agreement with the self-reported ethnicity [25] we are using in this study.
Polygenic scores
We used the standard and enhanced PGSs derived by Thompson et al., for which methodology has been previously described in detail [26]. Standard UKB PGSs contain only external GWAS data, whilst the enhanced UKB PGSs additionally contain UKB GWAS data. In December 2022, we selected the available standard and enhanced cardiometabolic UKB PGSs, namely: (1) type 1 DM (T1DM), (2) T2DM, (3) HbA1c, (4) body mass index (BMI), (5) hypertension, (6) CAD, (7) ischaemic stroke, (8) CVD, (9) HDL, (10) low-density lipoproteins (LDL), (11) total cholesterol, and (12) TTG. For total cholesterol and triglycerides, only an enhanced PGS was available.
To derive the standard PGSs, Thompson et al. [26] conducted a literature review to identify GWAS summary statistics from external studies. These included: Atherosclerosis Risk in Communities (ARIC); Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE); Electronic Medical Records and Genomics (eMERGE); BioMe BioBank; Jackson Heart Study (JHS); Multi-Ethnic Cohort (MEC); Multi-Ethnic Study of Atherosclerosis (MESA); Omics in Latinos (OLA); and GWAS for Breast Cancer in the African Diaspora (ROOT study). To derive the enhanced PGSs, Thompson et al. [26] used a custom Axiom genotyping array (able to assay 825,927 genetic variants) followed by genome-wide imputation. Then, UKB GWAS summary statistics for each trait were obtained using logistic regression for binary outcomes, and linear regression for continuous outcomes, adjusting for age, sex, genotyping chip, and ancestry principal components (PCs). GWAS data were then combined using a Bayesian fixed-effects inverse variance meta-analysis model. UKB and external GWAS data were meta-analysed to yield the enhanced PGSs, whilst external GWAS data without UKB data were combined to obtain the standard PGSs. Both the standard and enhanced PGSs were derived in 70% of the dataset and tested in the remaining 30% to avoid overfitting. Genetic ancestry classification was done using the same methodology which showed a good overlap between self-reported ethnicity and genetic ancestry [25]. The proportion of the genotypes associated with White European, South Asian and African Caribbean ancestry was determined using a subset of common SNPs from the 1000 Genomes reference dataset, and genetic PCA was conducted to derive the centroid coordinates for ancestry groups, and to further define the ancestry categories [25]. The PGSs were then centred by subtracting out the PGS value predicted from a linear regression of the PGS against the first 4 PCs fitted in the 1000 Genomes Project individuals [27].
Lastly, the centred PGS was divided by the standard deviation (SD) in the corresponding ancestry group. The focus of our work is the enhanced PGSs, as these have been shown by Thompson et al. [26] to have a higher predictive performance.
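The centring and scaling step can be sketched as follows. This is an illustrative reconstruction under stated assumptions and not the code used by Thompson et al.: the function names are hypothetical, ordinary least squares is solved by hand to keep the example self-contained, and the toy reference panel uses two PCs where the text describes four.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(pcs, pgs):
    """Least-squares coefficients (intercept first) of the PGS on the PCs."""
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, pgs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def centre_and_scale(pgs, pcs, coefs, group_sd):
    """Subtract the PC-predicted PGS, then divide by the ancestry-group SD."""
    predicted = coefs[0] + sum(c * p for c, p in zip(coefs[1:], pcs))
    return (pgs - predicted) / group_sd


# Toy reference panel (hypothetical values): the raw PGS depends exactly on two PCs.
reference_pcs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
reference_pgs = [1 + 2 * a - 3 * b for a, b in reference_pcs]
coefs = fit_ols(reference_pcs, reference_pgs)

# One individual with raw PGS 1.8 in a group whose centred-PGS SD is assumed to be 2.0.
standardised = centre_and_scale(1.8, (0.5, 0.2), coefs, group_sd=2.0)
```

The regression is fitted once in the reference panel and then applied to every individual, so the centring removes the part of the raw score that is predictable from ancestry PCs alone.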
Cardiometabolic outcomes
All outcomes were evaluated using information captured at the baseline assessment between 2006 and 2010 in the 22 recruitment centres across England, Scotland, and Wales. These included the presence of T1DM (yes/no), T2DM (yes/no), HbA1c (mmol/mol), BMI (kg/m²), hypertension (yes/no), CAD (yes/no), stroke (yes/no), CVD (yes/no), HDL (mmol/l), LDL (mmol/l), total cholesterol (mmol/l) and TTG (mmol/l). T1DM and T2DM were defined using an algorithm which was validated against primary care records, taking into account the self-report, doctor diagnosis and the use of diabetes medications [28]. BMI (kg/m²) was calculated as the ratio of weight to height². The presence of hypertension at baseline was defined as either: (1) self-report of anti-hypertensive medication use, (2) systolic BP > 140 mmHg or (3) diastolic BP > 80 mmHg. The presence of CAD, stroke and CVD (i.e., CAD + stroke) were based on the baseline self-report, nursing interview and linked inpatient hospital data as previously described [29]. We did not use incident data as there are known healthcare access disparities among ethnic groups in the UK which could introduce bias [30]. HbA1c, HDL, LDL, total cholesterol and TTG were quantified from the baseline blood samples [25].
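The BMI calculation and the baseline hypertension definition can be written as two small functions. This is a minimal sketch with hypothetical function and argument names; the blood pressure thresholds are taken as stated in the text above.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def has_hypertension(on_antihypertensives: bool, systolic_mmhg: float,
                     diastolic_mmhg: float) -> bool:
    """Baseline hypertension: medication use, or BP above either threshold."""
    # Thresholds as given in the text: systolic > 140 mmHg or diastolic > 80 mmHg.
    return on_antihypertensives or systolic_mmhg > 140 or diastolic_mmhg > 80
```

Because the three criteria are combined with a logical OR, a participant whose blood pressure is controlled by medication is still classified as hypertensive.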
Covariates
Sex was self-reported as male or female, and age (years) was recorded at the time of recruitment. Area based Townsend deprivation scores were used to capture socio-economic position (SEP) [31]. The primary care survey provided data on the prescribed medications of each study participant.
Statistical analysis
All analyses were performed in R 4.2.1 [32]. Data distributions were assessed using histograms. Continuous variables were expressed as mean ± 1 SD or median (interquartile range) as appropriate; categorical variables were expressed as counts and percentages.
Participants were categorised based on self-reported ethnicity as White European, South Asian, and African Caribbean. Individuals of mixed, other, and unknown ethnicity were not included due to small sample sizes. All analyses were conducted within each ethnic group. We used the PGSs as the independent variables to test their association with their corresponding cardiometabolic outcomes. For binary outcomes, generalised linear models (glms) with binomial distribution (i.e., logistic regression) were employed. The continuous outcomes were either slightly or heavily skewed. For example, BMI had a skewness greater than 1, while HbA1c and TTG had a skewness exceeding 2. As the gamma distribution can flexibly accommodate positively skewed data due to its shape and scale parameters, we used glms with gamma distribution and identity link for our continuous outcomes.
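For the binary outcomes, the quantity of interest is the odds ratio per unit (1 SD) increase in the PGS. The sketch below fits an unadjusted logistic regression with a single predictor by Newton-Raphson using only the Python standard library. It illustrates the technique and is not the R glm code used in the analysis; the data are hypothetical, and the gamma models with identity link used for continuous outcomes are not covered.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add inverse(information) times score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical standardised PGS values and disease status (1 = case).
example_x = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 2]
example_y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(example_x, example_y)
odds_ratio = math.exp(b1)  # odds ratio per 1-unit (1 SD) increase in the PGS
```

Exponentiating the slope gives the odds ratio, which is how the binary-outcome associations are reported in the Results.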
Two regression models were compared. Model 1 was unadjusted to obtain raw estimates. For all outcomes, model 2 was adjusted for age, sex, and SEP in order to obtain more accurate and precise regression estimates. As adjustment for genetic PCs was previously used to control for ancestry during the PGS derivation process, further adjustment for PCs was not pursued. Since this study did not attempt to explore mechanistic pathways downstream of the genotype but upstream of the phenotypes, further models with adjustment for mediators were not pursued. Model assumptions were verified with regression diagnostics and found to be satisfied. Results were then corrected for multiple testing using a false discovery rate of 0.05 [33].
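The multiple-testing step can be illustrated with the Benjamini-Hochberg step-up procedure. The text specifies only a false discovery rate of 0.05, so the choice of this particular procedure is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the null is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p)
```

With these eight hypothetical p-values only the two smallest survive correction, even though five are below 0.05 before adjustment.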
Using a 30% testing dataset, the classification performance (i.e., predicting the binary outcome) of both logistic regression models was evaluated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC) and its associated 95% confidence interval (CI) were derived for each ethnicity for each binary outcome. The ROC AUCs were compared between ethnicities using DeLong’s test. For continuous outcomes, we compared the effect sizes derived from the glms which capture the increase in outcome per unit increase in the PGSs. Since the PGSs underwent a PC-based ancestry centring and had a normal distribution with a similar SD of approximately 1 (Table 1), the effect sizes were not further standardised.
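The ROC AUC has a simple interpretation: it is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal sketch of that pairwise definition follows, with hypothetical scores and labels; confidence intervals and DeLong's test are not implemented here.

```python
def roc_auc(scores, labels):
    """AUC as the share of case/non-case pairs ranked correctly (ties count half)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 therefore corresponds to a score that ranks cases and non-cases no better than chance.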
Sensitivity analyses
As a sensitivity analysis, model 2 was additionally adjusted for diabetes medications when HbA1c was the outcome and for lipid-lowering drugs when exploring HDL, LDL, total cholesterol and TTG as outcomes. PGSs are upstream of the cardiometabolic outcomes which are upstream of the medications (i.e., a causal chain). In instances where the medications can in turn affect the cardiometabolic outcome (e.g., diabetes medications lowering HbA1c), adjusting for them allows the estimation of the direct association between the PGS and the cardiometabolic outcome. This adjustment essentially controls for unmeasured confounders downstream of the medication (e.g., access to healthcare, healthcare seeking behaviour etc.).
In addition, we also calculated the area under the precision-recall curve (PR-AUC) as the ROC AUC can be misleading when the outcomes are rare.
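One common estimator of the area under the precision-recall curve is average precision: the mean of the precision values at the rank of each true case. The sketch below uses that estimator as an assumption, since the text does not say how the PR-AUC was computed; the scores and labels are hypothetical and tied scores are not given special treatment.

```python
def average_precision(scores, labels):
    """Mean precision at the rank of each case, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / hits


example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_ap = average_precision(example_scores, example_labels)
```

Unlike the ROC AUC, the chance-level value of this metric equals the outcome prevalence, so values should be compared across groups with their prevalences in mind.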
Results
In this study we included 472,036 participants; their characteristics and standard PGSs stratified by ethnicity are presented in Table 1, while their enhanced PGSs are presented in Supplementary Table S1. On average, both South Asians (53.4 years) and African Caribbeans (51.9 years) were younger than White Europeans (56.8 years) at the time of outcome assessment. Men comprised 45.5% of White Europeans, 54.5% of South Asians and 42.3% of African Caribbeans. There were 45.7% South Asians, 70.5% African Caribbeans, and 23.2% White Europeans in the lowest quartile of the Townsend deprivation index. South Asians had the highest prevalence of T2DM (16.7%), CVD (10.1%), and CAD (7.4%), whilst African Caribbeans had the highest average BMI (29.5 kg/m²) and the highest prevalence of hypertension (72.6%). Despite the PC-based ancestry centring, the PGS means showed small residual deviations from zero in South Asians and African Caribbeans (Table 1). Model 1 and model 2 results for the standard and enhanced PGSs are presented in Table 2. Results from model 2 for the enhanced PGSs are presented below.
Type 1 diabetes
The association between the enhanced PGSs and T1DM was strongest for White Europeans (odds ratio [OR] 3.09 95% CI [2.72, 3.40]) followed by South Asians (OR 1.52 95% CI [1.11, 2.07]) and African Caribbeans (OR 1.40 95% CI [0.99, 1.95]) (Fig. 1A). The PGS’ predictive performance was highest in White Europeans (AUC 0.84 95% CI [0.80, 0.89]) followed by South Asians (AUC 0.63 95% CI [0.49, 0.77]) and African Caribbeans (AUC 0.50 95% CI [0.32, 0.68]) (Table 3).
Type 2 diabetes
According to the OR, the performance was highest in White Europeans (OR 2.48 95% CI [2.39, 2.58]) followed by South Asians (OR 2.05 95% CI [1.91, 2.20]) and African Caribbeans (OR 1.51 95% CI [1.30, 1.48]) (Fig. 1B). According to the AUC, the enhanced PGS’ predictive performance was higher in White Europeans (AUC 0.80 95% CI [0.79, 0.82]) compared to South Asians (AUC 0.76 95% CI [0.73, 0.78]) and African Caribbeans (AUC 0.73 95% CI [0.69, 0.76]) (Table 3).
HbA1c
The regression coefficient (β) was higher in White Europeans and South Asians compared to African Caribbeans. One unit (or 1 SD) increase in the enhanced PGS was associated with a 1.69 mmol/mol 95% CI (1.65, 1.73) higher HbA1c in White Europeans, 1.79 95% CI (1.57, 2.00) in South Asians and 1.03 95% CI (0.81, 1.26) in African Caribbeans after adjusting for sex, age, and SEP (Table 2). The difference between White Europeans and South Asians was not statistically significant (p = 0.370). Results are visually depicted in Fig. 1C.
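The text does not state which test produced the p-value for the difference between the White European and South Asian coefficients. As an illustration only, the sketch below assumes a two-sided z-test on two independent estimates, with standard errors recovered from the reported 95% CIs under a normal approximation. Applied to the figures above, it gives a p-value of approximately 0.37.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal-theory confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def difference_p_value(beta_a, ci_a, beta_b, ci_b):
    """Two-sided p-value for beta_a - beta_b, assuming independent estimates."""
    se = math.hypot(se_from_ci(*ci_a), se_from_ci(*ci_b))
    z = (beta_a - beta_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# South Asians versus White Europeans, HbA1c coefficients from the text.
p_difference = difference_p_value(1.79, (1.57, 2.00), 1.69, (1.65, 1.73))
```

The wide interval in South Asians dominates the combined standard error, which is why a visibly larger point estimate is still compatible with no difference.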
BMI
A unit increase in the enhanced PGS was associated with a 1.71 kg/m² 95% CI (1.68, 1.74) increase in BMI in White Europeans, 1.31 95% CI (1.22, 1.40) in South Asians and 0.90 95% CI (0.80, 1.00) in African Caribbeans (Table 2 and Fig. 1D).
Hypertension
There was no difference in performance by ethnicity according to the ROC curve analysis (Table 3). The ORs were similar across all ethnicities using both standard (≈1.50) and enhanced PGSs (≈1.70) (Table 2 and Fig. 2C).
CVD and CAD
For CVD, the performance of the enhanced PGS was higher in White Europeans (AUC 0.77 95% CI [0.76, 0.78]) and South Asians (AUC 0.74 95% CI [0.70, 0.77]) vs African Caribbeans (AUC 0.66 95% CI [0.61, 0.72]), all p < 0.025 (Table 3). Similarly, the ORs were higher in White Europeans (OR 1.61 95% CI [1.56, 1.66]) and South Asians (OR 1.58 95% CI [1.45, 1.71]) compared to African Caribbeans (OR 1.20 95% CI [1.09, 1.32]) (Fig. 2A).
For CAD, the results were similar to those reported above for CVD, with a higher predictive performance according to the ROC curve analysis (Table 3) and higher ORs (Table 2) in White Europeans and South Asians compared to African Caribbeans.
Stroke
The performance of the enhanced PGS according to the ROC curve analysis (AUC ≈ 0.70) and the ORs (1.20–1.40) were similar across all ethnicities (Fig. 2D, Table 3).
HDL and LDL
One SD increase in the enhanced HDL PGS was associated with a 0.135 mmol/l 95% CI (0.133, 0.137) greater HDL in White Europeans, 0.107 95% CI (0.101, 0.113) in South Asians and 0.089 95% CI (0.082, 0.097) in African Caribbeans (Table 2 and Fig. 3A, B) after adjusting for sex, age, and SEP. A unit increase in the enhanced LDL PGS was associated with a higher LDL in White Europeans (0.267 mmol/l 95% CI [0.261, 0.272]) followed by African Caribbeans (0.216 95% CI [0.199, 0.233]) and South Asians (0.169 95% CI [0.149, 0.188]).
Total cholesterol and triglycerides
A unit (or 1 SD) increase in the enhanced PGS was associated with a greater total cholesterol in White Europeans (0.278 mmol/l 95% CI [0.269, 0.286]) compared to African Caribbeans (0.202 95% CI [0.179, 0.226]) and South Asians (0.183 95% CI [0.157, 0.208]) (Fig. 3C). On the other hand, a unit (or 1 SD) increase in the TTG enhanced PGS was associated with a higher TTG in South Asians (0.278 mmol/l 95% CI [0.257, 0.299]) compared to White Europeans (0.228 95% CI [0.222, 0.234]) and African Caribbeans (0.086 95% CI [0.071, 0.101]) (Fig. 3D).
Sensitivity analyses
The regression results stratified per ethnicity for HbA1c, HDL, LDL, total cholesterol, and triglycerides with further adjustments for diabetes medications or lipid-lowering drugs as appropriate are presented in Supplementary Table S2. In general, the findings were replicated, but the effect sizes were slightly smaller for HbA1c and slightly larger for the lipid outcomes.
The PR-AUC results stratified by ethnicity are presented in Supplementary Table S3. The PR-AUC was larger in White Europeans followed by South Asians and African Caribbeans for T1DM, CVD and CAD. However, the estimates were similar for hypertension and stroke. The PR-AUC was greater in South Asians compared to White Europeans for T2DM.
Discussion
In this study we evaluated the performance of standard and enhanced UKB cardiometabolic PGSs derived mostly in White European populations in association with their respective observed phenotype by ethnicity. Whilst the UKB PGSs included some data from multi-ethnic GWAS studies, the performance of both the standard and enhanced PGSs was better in White Europeans compared to South Asians and African Caribbeans for most cardiometabolic outcomes. This can be explained by the predominance of White European GWAS data when deriving the PGSs.
Factors driving poorer PGS performance in ethnically diverse populations
According to the National Human Genome Research Institute and European Bioinformatics Institute GWAS catalogue, almost 80% of the GWAS studies were performed in White Europeans, who represent roughly 10% of the global population [34]. In contrast, 25% of the global population is of South Asian and 15% of African Caribbean ancestry. Thus, GWAS data are scarce in non-White ancestries. This has multiple downstream implications and might partly explain the worse performance of the PGSs in multi-ethnic populations [35]. Firstly, linkage disequilibrium (LD) varies across ancestries which may drive differences in effect size estimates in GWASs [36]. Secondly, imputation reference panels which are widely used to address bias in GWASs are less efficient in non-White ancestries due to data scarcity. Thirdly, within-ethnicity ancestry subcategories in non-White populations are less studied. This is important because within-ethnicity heterogeneity leading to differential predictive power of PGSs in the same ethnicity has been reported [11]. Fourthly, the normal reference ranges for quantitative biomarkers may vary between ethnicities [37]. Without ethnicity-specific cut-offs, there is an inherent bias in any GWAS which categorises/binarizes quantitative traits. Lastly, studies may be reporting common benign variants as pathologic in other ethnicities just because they are rare in White Europeans [2]. Thus, large ethnically diverse datasets and improved treatment of LD and variant frequencies are increasingly needed to create equitable PGSs before widespread clinical use [35].
Ethnic inclusivity for equitable implementation of polygenic scores
In CVD research, the vast majority of cohort studies enrolled mostly people of White European ancestry. There are only a few studies which include genetic data in ethnic minorities. These either focus on a single ethnic group (e.g., East London Genes & Health [ELGH], China Kadoorie Biobank [CKB], Mexico City Prospective Study [MCPS], New Delhi Birth Cohort Study, OLA etc.) or multiple ethnic groups (e.g., Age, Gene/Environment Susceptibility-Reykjavik Study [AGES-Reykjavik], ARIC, Born in Bradford [BiB], Cardiovascular Health Study [CHS], Dallas Heart Study [DHS], Framingham Heart Study [FHS] OMNI cohorts, JHS, MEC, MESA, Rotterdam Study [RS], Southall and Brent Revisited [SABRE] etc.). Importantly, there is a tendency to aggregate individual cohorts into consortia (e.g., genetic data from AGES, ARIC, CHS, FHS and RS cohorts are available through the Cohorts for Heart and Aging Research in Genomic Epidemiology [CHARGE] consortium). Despite these collections, the percentage of non-White European ancestry participants in GWASs has not increased in recent years [34]. This suggests that the reduced performance of PGSs in ethnic minorities is unlikely to improve in the near future.
In developed nations, the low participation of ethnic minorities in biomedical research is multi-factorial but mainly related to reduced trust given past research misconduct and feelings of racial discrimination [38]. However, movements such as the All of Us Research Program from the National Institutes of Health are working towards having a culturally aware approach to engage under-represented ethnic minorities in research [39].
Ancestry inclusivity for equitable implementation of polygenic scores
Race and ethnicity are socio-cultural constructs, whilst ancestry refers to the genetic origin of a population. Engaging under-represented ethnic and ancestry minorities in genomics research should be a global research priority. Indeed, there are movements aiming to address these disparities such as the Human Heredity and Health in Africa initiative [40]. However, lack of funding remains the main limitation of such international movements [41].
Polygenic scores and health inequalities during translation to practice
The advent of genetic data in large cohort datasets such as the UK Biobank has led to the discovery of multiple SNPs which are associated with a variety of cardiometabolic diseases using GWASs. Whilst the added value of PGSs on top of already validated clinical tools is yet to be fully elucidated, current studies suggest that PGSs could: (1) increase disease prediction in early life, (2) help guide population-wide screening and preventative targeted interventions (e.g., lipid lowering drugs in those with a high PGS for total cholesterol and LDL), (3) help promote favourable health behaviours in those with an enhanced risk, (4) improve diagnostic accuracy (e.g., differentiating T1DM vs T2DM in overweight antibody-negative young individuals), and (5) predict response to treatments [27]. Given the worse performance of PGSs in ethnic minorities, they may miss out on benefiting from improved health outcomes. The deployment of PGSs would benefit the population group which is already privileged in terms of health outcomes, further deepening existing healthcare inequalities. Thus, large-scale multi-ancestry GWAS data are urgently needed to generate ethnicity-stratified PGSs to tackle health inequalities.
Limitations
Limitations of the UK Biobank PGSs have been previously discussed [26]. With regards to PGS evaluation, the main limitation of our study relates to the lack of widely accepted performance metrics [42]. Whilst phenotypic variance explained (R²) and association p-values have been previously proposed [43], we used effect-size metrics for the outcome as these are widely used for established traditional risk factors. However, these do not accurately capture disease prevalence in the general population.
In addition, the use of the self-reported ethnicity is not in line with the latest recommendations on the use and reporting of race and ancestry in genetic research, which instead recommends the incorporation of ancestry informative markers for a more precise characterisation of an individual’s identity [5, 44]. Importantly, whilst there is a good overlap between genetic ancestry and self-reported ethnicity, these are not identical in the UK Biobank [25]. Nonetheless, self-reported ethnicity is also able to capture social constructs (e.g., health beliefs) which could drive the observed differences in PGSs’ performance. In addition, detailed ethnic groupings (defined according to the 16 categories of the 2001 census) had to be collapsed into the 5 high level categories of the census to increase the statistical power of the analyses conducted in multi-ethnic populations.
As underlying differences in allele frequencies and LD likely make a major contribution to the ancestry performance differences in non-ancestry matched PGSs, the scores could be improved through the use of the appropriate ancestral reference LD. However, we did not attempt to improve the PGSs in this study as our aim was solely to evaluate the ones derived by Thompson et al. [26]. While our finding that PGSs perform better in White Europeans may not be surprising, there is a need to provide empirical evidence, as without this, the proposed PGSs will be used without consideration of the ethnic background. While Thompson et al. [26] provided the effect sizes for the associations between PGSs and their corresponding outcomes stratified by ethnicity, we were able to build on this work by using more sophisticated statistical approaches (e.g., ROC and precision recall curves) to better evaluate the performance of the PGSs in multi-ethnic populations. Our work is important because it highlights that these PGSs released by UK Biobank need to be used with consideration in multi-ethnic populations and underscores the need for improving them. In addition, we discuss the factors driving poorer PGS performance in ethnically diverse populations, and the need for ethnic inclusivity for the equitable implementation of PGSs to reduce health inequalities as they transition to clinical practice.
Although the ethnic minorities were 3–4 years younger at recruitment, incident cardiometabolic diseases have a younger age of onset in these groups (e.g., diabetes in South Asians [45] and hypertension in those of African ancestry [46] occur ≈10 years earlier compared to White Europeans). In addition, SABRE data showed that a lower proportion of UK White individuals were diagnosed with diabetes in the National Health Service compared to South Asians and African Caribbeans when comparing the study blood test results with the healthcare records [47]. As a higher proportion in the ethnic minority groups should have already been diagnosed with the cardiometabolic diseases, it would be expected that PGSs would actually perform better in the ethnic minorities. Thus, another important limitation is that we might actually underestimate the performance difference of the PGSs between White Europeans and the multi-ethnic populations.
Conclusion
In general, UK Biobank standard and enhanced PGSs had markedly better performance in White Europeans compared to South Asians and African Caribbeans when evaluating cardiometabolic phenotypes. More GWAS data in ethnic minorities are required to improve the performance of the PGSs and to avoid perpetuating health inequalities, especially since cardiometabolic diseases are more prevalent in South Asians and African Caribbeans.
Data availability
The UK Biobank data is available via an application from https://www.ukbiobank.ac.uk/. This work was conducted under application ID 7661.
References
Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12:44.
Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51:584–91.
Clarke SL, Assimes TL, Tcheandjieu C. The propagation of racial disparities in cardiovascular genomics research. Circ Genom Precis Med. 2021;14:e003178.
Mathur R, Rentsch CT, Venkataraman K, Fatumo S, Jobe M, Angkurawaranon C, et al. How do we collect good-quality data on race and ethnicity and address the trust gap? Lancet. 2022;400:2028–30.
Mersha TB, Abebe T. Self-reported race/ethnicity in the age of genomic research: its potential impact on understanding health disparities. Hum Genom. 2015;9:1.
Frank R. The molecular reinscription of race: a comment on “genetic bio-ancestry and social construction of racial classification in social surveys in the contemporary United States”. Demography. 2014;51:2333–6.
Kamiza AB, Toure SM, Vujkovic M, Machipisa T, Soremekun OS, Kintu C, et al. Transferability of genetic risk scores in African populations. Nat Med. 2022;28:1163–6.
Hodgson S, Huang QQ, Sallah N, Griffiths CJ, Newman WG, Trembath RC, et al. Integrating polygenic risk scores in the prediction of type 2 diabetes risk and subtypes in British Pakistanis and Bangladeshis: a population-based cohort study. PLOS Med. 2022;19:e1003981.
Dikilitas O, Schaid DJ, Kosel ML, Carroll RJ, Chute CG, Denny JC, et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am J Hum Genet. 2020;106:707–16.
Grinde KE, Qi Q, Thornton TA, Liu S, Shadyab AH, Chan KHK, et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet Epidemiol. 2019;43:50–62.
Clarke SL, Huang RDL, Hilliard AT, Tcheandjieu C, Lynch J, Damrauer SM, et al. Race and ethnicity stratification for polygenic risk score analyses may mask disparities in Hispanics. Circulation. 2022;146:265–7.
American Diabetes Association. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2012;35:S64–71.
Narayan KMV, Kanaya AM. Why are South Asians prone to type 2 diabetes? A hypothesis based on underexplored pathways. Diabetologia. 2020;63:1103–9.
Pham TM, Carpenter JR, Morris TP, Sharma M, Petersen I. Ethnic differences in the prevalence of type 2 diabetes diagnoses in the UK: cross-sectional analysis of the health improvement network primary care database. Clin Epidemiol. 2019;11:1081–8.
Mathur R, Palla L, Farmer RE, Chaturvedi N, Smeeth L. Ethnic differences in the severity and clinical management of type 2 diabetes at time of diagnosis: a cohort study in the UK Clinical Practice Research Datalink. Diabetes Res Clin Pract. 2020;160:108006.
Khanolkar AR, Amin R, Taylor-Robinson D, Viner RM, Warner J, Gevers EF, et al. Ethnic differences in early glycemic control in childhood-onset type 1 diabetes. BMJ Open Diabetes Res Care. 2017;5:e000423.
Wolffenbuttel BHR, Herman WH, Gross JL, Dharmalingam M, Jiang HH, Hardin DS. Ethnic differences in glycemic markers in patients with type 2 diabetes. Diabetes Care. 2013;36:2931–6.
Roth GA, Abajobir A, Abera SF, Aksut B, Alam T, Alam K, et al. Global, regional, and national burden of cardiovascular diseases for 10 causes, 1990 to 2015. J Am Coll Cardiol. 2017;70:1–25.
Bentley AR, Rotimi CN. Interethnic differences in serum lipids and implications for cardiometabolic disease risk in African ancestry populations. Glob Heart. 2017;12:141.
Bilen O, Kamal A, Virani SS. Lipoprotein abnormalities in South Asians and its association with cardiovascular disease: current state and future directions. World J Cardiol. 2016;8:247–57.
Modesti PA, Reboldi G, Cappuccio FP, Agyemang C, Remuzzi G, Rapi S, et al. Panethnic differences in blood pressure in Europe: a systematic review and meta-analysis. PLOS ONE. 2016;11:e0147601.
Sproston K, Mindell J. Health Survey for England 2004: The Health of Minority Ethnic Groups – headline tables. 2006. https://digital.nhs.uk/data-and-information/publications/statistical/health-survey-for-england/health-survey-for-england-2004-health-of-ethnic-minorities-headline-results.
Agyemang C, Bhopal RS. Is the blood pressure of South Asian adults in the UK higher or lower than that in European white adults? A review of cross-sectional data. J Hum Hypertens. 2002;16:739–51.
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Med. 2015;12:e1001779.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–9.
Thompson DJ, Wells D, Selzam S, Peneva I, Moore R, Sharp K, et al. UK Biobank release and systematic evaluation of optimised polygenic risk scores for 53 diseases and quantitative traits. (2022).
Adeyemo A, Balaconis MK, Darnes DR, Fatumo S, Granados Moreno P, Hodonsky CJ, et al. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat Med. 2021;27:1876–84.
Eastwood SV, Mathur R, Atkinson M, Brophy S, Sudlow C, Flaig R, et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLOS ONE. 2016;11:e0162388.
Garfield V, Farmaki AE, Eastwood SV, Mathur R, Rentsch CT, Bhaskaran K, et al. HbA1c and brain health across the entire glycaemic spectrum. Diabetes Obes Metab. 2021;23:1140–9.
Bleich SN, Jarlenski MP, Bell CN, LaVeist TA. Health inequalities: trends, progress, and policy. Annu Rev Public Health. 2012;33:7–40.
Townsend P, Beattie A, Phillimore P. Health and deprivation: inequality and the North. London: Routledge; 1989.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. 2022. https://www.R-project.org/.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc: Ser B. 1995;57:289–300.
MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:D896–901.
Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun. 2019;10:3328.
Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLOS Biol. 2013;11:e1001661.
Rappoport N, Paik H, Oskotsky B, Tor R, Ziv E, Zaitlen N, et al. Comparing ethnicity-specific reference intervals for clinical laboratory tests from EHR data. J Appl Lab Med. 2018;3:366–77.
Suther S, Kiros G-E. Barriers to the use of genetic testing: a study of racial and ethnic disparities. Genet Med. 2009;11:655–62.
Mapes BM, Foster CS, Kusnoor SV, Epelbaum MI, Auyoung M, Jenkins G, et al. Diversity and inclusion for the All of Us Research Program: a scoping review. PLOS ONE. 2020;15:e0234962.
Rotimi C. Enabling the genomic revolution in Africa. Science. 2014;344:1346–8.
Yeh C, Meng C, Wang S, Driscoll A, Rozi E, Liu P, et al. SustainBench: benchmarks for monitoring the sustainable development goals with machine learning. 2021. https://arxiv.org/abs/2111.04724.
Baker SG. Metrics for evaluating polygenic risk scores. JNCI Cancer Spectr. 2021;5:pkaa106.
Choi SW, Mak TS-H, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15:2759–72.
Flanagin A, Frey T, Christiansen SL. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA. 2021;326:621.
Paul SK, Owusu Adjah ES, Samanta M, Patel K, Bellary S, Hanif W, et al. Comparison of body mass index at diagnosis of diabetes in a multi-ethnic population: a case-control study with matched non-diabetic controls. Diabetes Obes Metab. 2017;19:1014–23.
Lackland DT. Racial differences in hypertension: implications for high blood pressure management. Am J Med Sci. 2014;348:135–8.
Tillin T, Forouhi NG, McKeigue PM, Chaturvedi N. Southall and Brent REvisited: cohort profile of SABRE, a UK population-based comparison of cardiovascular disease and diabetes in people of European, Indian Asian and African Caribbean origins. Int J Epidemiol. 2012;41:33–42.
Acknowledgements
The authors would like to thank all the UK Biobank members for their participation and continuous engagement with follow-up and all UK Biobank scientific and data collection teams.
Author contributions
All authors were involved in study design and implementation, data analysis and interpretation, and critical review and revision of the manuscript. In addition, all authors approved the final version as submitted and agree to be accountable for all aspects of the work.
Funding
NC received support from the UK Medical Research Council, Diabetes UK, Wellcome Trust, British Heart Foundation and National Institute for Health Research University College London Hospitals Biomedical Research Centre. RM is supported by Barts Charity (MGU0504). VG is funded by the Professor David Matthews Non-Clinical Fellowship (ref: SCA/01/NCF/22). VG is also supported by a joint Diabetes UK and British Heart Foundation grant (ref: 15/0005250). Role of the funding source: none of the funders was involved in the study design, the collection, analysis, or interpretation of the data, or the decision to submit the article for publication. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any author accepted manuscript version arising.
Ethics declarations
Competing interests
The views expressed in this article are those of the authors, who declare no conflicts of interest, except for NC, who serves on a Data Safety and Monitoring Board for a clinical trial of a glucose-lowering agent funded by AstraZeneca. RM is part of the Genes & Health programme, which is part-funded (including salary contributions) by a Life Sciences Consortium comprising AstraZeneca PLC, Bristol-Myers Squibb Company, GlaxoSmithKline Research and Development Limited, Maze Therapeutics Inc, Merck Sharp & Dohme LLC, Novo Nordisk A/S, Pfizer Inc, and Takeda Development Centre Americas Inc.
Ethical approval
UK Biobank’s ethical approval (11/NW/0382) was granted by the North West Multi-centre Research Ethics Committee (MREC) in 2011 and renewed in 2016 and 2021. All procedures performed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Topriceanu, CC., Chaturvedi, N., Mathur, R. et al. Validity of European-centric cardiometabolic polygenic scores in multi-ancestry populations. Eur J Hum Genet (2024). https://doi.org/10.1038/s41431-023-01517-3