Validity of European-centric cardiometabolic polygenic scores in multi-ancestry populations

Topriceanu, Constantin-Cristian; Chaturvedi, Nish; Mathur, Rohini; Garfield, Victoria

doi:10.1038/s41431-023-01517-3

Article
Open access
Published: 05 January 2024

European Journal of Human Genetics (2024)

Abstract

Polygenic scores (PGSs) provide an individual-level estimate of genetic risk for any given disease. Since most PGSs have been derived from genome-wide association studies (GWASs) conducted in populations of White European ancestry, their validity in other ancestry groups remains unconfirmed. This is especially relevant for cardiometabolic diseases, which are known to disproportionately affect people of non-European ancestry. Thus, we aimed to evaluate the performance of PGSs for glycaemic traits (glycated haemoglobin, and type 1 and type 2 diabetes mellitus), cardiometabolic risk factors (body mass index, hypertension, high- and low-density lipoproteins, and total cholesterol and triglycerides) and cardiovascular diseases (including stroke and coronary artery disease) in people of White European, South Asian, and African Caribbean ethnicity in the UK Biobank. Whilst PGSs incorporated some GWAS data from multi-ethnic populations, the vast majority originated from White Europeans. For most outcomes, PGSs derived mostly from European populations had an overall better performance in White Europeans compared to South Asians and African Caribbeans. Thus, multi-ancestry GWAS data are needed to derive ancestry-stratified PGSs to tackle health inequalities.

Introduction

A polygenic score (PGS) provides a personalised estimate of an individual’s genetic liability to a disease. PGSs are calculated as weighted sums of single nucleotide polymorphisms (SNPs) [1]. Because most existing PGSs have been derived from genome-wide association studies (GWASs) conducted in populations of European ancestry [2, 3], their validity in other ancestry groups remains unconfirmed. Therefore, although PGSs are an exciting prospect for precision medicine, they have the potential to perpetuate or widen existing health inequalities if they lead to invalid or misleading inference of disease risk in non-European populations.
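To make the "weighted sum" definition concrete, the following is a minimal sketch rather than the scoring pipeline used for the UKB PGSs. The SNP identifiers, weights, and dosages are hypothetical, and the function name `polygenic_score` is introduced here purely for illustration.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    # Only SNPs present in both the weight table and the genotype contribute.
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)


# Hypothetical per-SNP weights (e.g., log odds ratios from a GWAS).
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
# Hypothetical effect-allele counts for one individual.
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(dosages, weights)
print(round(score, 2))
```

Because the weights come from the discovery GWAS, any ancestry-specific differences in effect sizes or allele frequencies carry straight through to the score.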

In genetic studies, genetic ancestry is commonly used as a proxy for the social construct of ethnicity (and vice versa). However, ethnicity is a complex concept which includes genetic ancestry and a wide range of social constructs (e.g., cultural practices, health beliefs, language, religion, and self-identification) amongst others [4]. In general, genetic ancestry is thought to better reflect genetic relatedness than ethnicity, due to the fact that ethnicity is a broader social concept which incorporates a wide variety of environmental measures such as socioeconomic status and lifestyle [5]. However, there is considerable overlap between genetic ancestry and self-reported ethnicity, although ancestry does not capture the entirety of an individual’s ethnic identity [6]. Thus, self-reported ethnicity is important when examining health disparities related to the wider socio-cultural and environmental determinants of health in addition to biological and genetic factors [5].

PGSs derived in European ancestry populations generally transfer less well to African [7] or South Asian [8] populations. However, studies of transferability to Hispanics have reported conflicting results [9, 10]. Even then, within-ancestry heterogeneity can contribute to different predictive powers in ethnic sub-groups. For example, amongst Hispanics, PGS performance can differ between ancestry clusters [11]. Thus, the transethnic transferability of PGSs remains a matter of debate.

Worldwide, there are 500–700 million individuals with diabetes mellitus (DM), 90% of whom have type 2 DM (T2DM) [12]. The prevalence of T2DM differs by age (more common in older people), sex (more common in men) and ethnicity. In the United Kingdom, South Asians are more likely to suffer from diabetes [13], followed by those from an African Caribbean background [14], both of whom have 2–3-fold higher risk of developing T2DM compared to White groups with onset almost a decade earlier [15]. In addition, South Asians and African Caribbeans are more likely to have higher serum glycated haemoglobin A1c (HbA1c) levels even in the absence of diabetes [16] and poorer glycaemic control in established diabetes [17].

In addition to ethnic differences in diabetes risk, cardiovascular diseases (CVDs) also vary across ethnicities. Compared to White Europeans, South Asians are more likely to develop CVD (i.e., coronary artery disease [CAD] and stroke), whilst those from an African Caribbean background are more likely to suffer from stroke [18]. Cardiovascular risk factors generally map to these ethnic differences in CVD outcomes. African Caribbeans generally have healthier lipid profiles (e.g., higher high-density lipoproteins [HDL] and lower total triglycerides [TTG] [19]) compared to White Europeans and South Asians, in whom lipoprotein profiles are most adverse [20]. In contrast, hypertension is more frequent in African Caribbeans than White Europeans [21]. The picture is more complex for South Asians, who have an equivalent or lower blood pressure (BP) than Europeans at younger ages [22], but subsequently experience a steeper BP trajectory resulting in higher later life BP [23].

Whether PGSs derived mostly from White ancestry GWAS data can capture differences by self-reported ethnicity in cardiometabolic traits remains unclear. Using data from the UK Biobank (UKB), this study aimed to explore the transethnic transferability of a wide range of cardiometabolic PGSs by testing how well each is associated with its respective observed outcome. Our focus was on participants of South Asian and African Caribbean ethnicity in relation to White Europeans as these are the largest ethnic minority groups in the UK and are therefore well represented in UKB.

Methods

Study population

The UKB is a large UK based prospective cohort study with >500,000 participants recruited between 2006 and 2010 when study participants were aged 40–69 years old, and features demographic, genetic, health outcomes and imaging data for its participants [24]. We used the self-reported ethnicity variable which was defined according to the 2001 UK Census guidelines. The breakdown of self-reported ethnicity in UKB is 94.4% White Europeans, 0.2% South Asians, 0.2% African Caribbeans, and 5.2% other/unknown. Ancestry was previously derived in the UK Biobank using principal component analysis (PCA) and clustering, and it shows a good agreement with the self-reported ethnicity [25] we are using in this study.

Polygenic scores

We used the standard and enhanced PGSs derived by Thompson et al., for which methodology has been previously described in detail [26]. Standard UKB PGSs contain only external GWAS data, whilst the enhanced UKB PGSs additionally contain UKB GWAS data. In December 2022, we selected the available standard and enhanced cardiometabolic UKB PGSs, namely: (1) type 1 DM (T1DM), (2) T2DM, (3) HbA1c, (4) body mass index (BMI), (5) hypertension, (6) CAD, (7) ischaemic stroke, (8) CVD, (9) HDL, (10) low-density lipoproteins (LDL), (11) total cholesterol, and (12) TTG. For total cholesterol and triglycerides, only an enhanced PGS was available.

To derive the standard PGSs, Thompson et al. [26] conducted a literature review to identify GWAS summary statistics from external studies. These included: Atherosclerosis Risk in Communities (ARIC); Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE); Electronic Medical Records and Genomics (eMERGE); BioMe BioBank; Jackson Heart Study (JHS); Multi-Ethnic Cohort (MEC); Multi-Ethnic Study of Atherosclerosis (MESA); Omics in Latinos (OLA); and GWAS for Breast Cancer in the African Diaspora (ROOT study). To derive the enhanced PGSs, Thompson et al. [26] used a custom Axiom genotyping array (able to assay 825,927 genetic variants) followed by genome-wide imputation. Then, UKB GWAS summary statistics for each trait were obtained using logistic regression for binary outcomes, and linear regression for continuous outcomes, adjusting for age, sex, genotyping chip, and ancestry principal components (PCs). GWAS data were then combined using a Bayesian fixed-effects inverse variance meta-analysis model. UKB and external GWAS data were meta-analysed to yield the enhanced PGSs, whilst external GWAS data without UKB data were combined to obtain the standard PGSs. Both the standard and enhanced PGSs were derived in 70% of the dataset and tested in the remaining 30% to avoid overfitting. Genetic ancestry classification was done using the same methodology which showed a good overlap between self-reported ethnicity and genetic ancestry [25]. The proportion of the genotypes associated with White European, South Asian and African Caribbean ancestry was determined using a subset of common SNPs from the 1000 Genomes reference dataset, and genetic PCA was conducted to derive the centroid coordinates for ancestry groups, and to further define the ancestry categories [25]. The PGSs were then centred by subtracting out the PGS value predicted from a linear regression of the PGS against the first 4 PCs fitted in the 1000 Genomes Project individuals [27]. Lastly, the centred PGS was divided by the standard deviation (SD) in the corresponding ancestry group. The focus of our work is the enhanced PGSs, as these have been shown by Thompson et al. [26] to have a higher predictive performance.
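The centring and scaling steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by Thompson et al.: the intercept and PC coefficients are hypothetical stand-ins for a regression already fitted in a reference panel, the four individuals are invented, and the use of the sample SD is a choice made here.

```python
from statistics import stdev


def centre_pgs(raw_pgs, pcs, intercept, pc_coefs):
    """Subtract the PGS value predicted from the ancestry PCs."""
    predicted = intercept + sum(c * pc for c, pc in zip(pc_coefs, pcs))
    return raw_pgs - predicted


def scale_within_group(centred_scores):
    """Divide centred scores by the SD observed within one ancestry group."""
    sd = stdev(centred_scores)
    return [s / sd for s in centred_scores]


# Hypothetical regression of PGS on the first 4 PCs (assumed pre-fitted).
intercept, pc_coefs = 0.10, [0.50, -0.20, 0.05, 0.00]

# Hypothetical individuals from one ancestry group: (raw PGS, first 4 PCs).
group = [
    (1.2, [1.0, 0.0, 0.0, 0.0]),
    (0.4, [0.0, 1.0, 0.0, 0.0]),
    (0.1, [0.0, 0.0, 0.0, 0.0]),
    (0.8, [1.0, 1.0, 0.0, 0.0]),
]

centred = [centre_pgs(raw, pcs, intercept, pc_coefs) for raw, pcs in group]
scaled = scale_within_group(centred)
print([round(s, 2) for s in scaled])
```

After this step the scores in each ancestry group have an SD of 1, so a regression coefficient can be read as the change in outcome per SD of the PGS within that group.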

Cardiometabolic outcomes

All outcomes were evaluated using information captured at the baseline assessment between 2006 and 2010 in the 22 recruitment centres across England, Scotland, and Wales. These included the presence of T1DM (yes/no), T2DM (yes/no), HbA1c (mmol/mol), BMI (kg/m²), hypertension (yes/no), CAD (yes/no), stroke (yes/no), CVD (yes/no), HDL (mmol/l), LDL (mmol/l), total cholesterol (mmol/l) and TTG (mmol/l). T1DM and T2DM were defined using an algorithm which was validated against primary care records, taking into account the self-report, doctor diagnosis and the use of diabetes medications [28]. BMI (kg/m²) was calculated as the ratio of weight to height². The presence of hypertension at baseline was defined as either: (1) self-report of anti-hypertensive medication use, (2) systolic BP > 140 mmHg or (3) diastolic BP > 80 mmHg. The presence of CAD, stroke and CVD (i.e., CAD + stroke) were based on the baseline self-report, nursing interview and linked inpatient hospital data as previously described [29]. We did not use incident data as there are known healthcare access disparities among ethnic groups in the UK which could introduce bias [30]. HbA1c, HDL, LDL, total cholesterol and TTG were quantified from the baseline blood samples [25].
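As a small illustration of two of the outcome definitions above, the sketch below computes BMI and applies the three-part hypertension rule with the thresholds as stated in the text. The function and parameter names are hypothetical and the example values are invented.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m²: weight divided by height squared."""
    return weight_kg / height_m ** 2


def has_hypertension(on_bp_medication, systolic, diastolic,
                     sbp_cutoff=140, dbp_cutoff=80):
    """Baseline hypertension: medication use OR raised systolic OR raised diastolic BP."""
    return bool(on_bp_medication or systolic > sbp_cutoff or diastolic > dbp_cutoff)


print(round(bmi(81.0, 1.80), 1))
print(has_hypertension(False, 138, 85))   # diastolic criterion met
print(has_hypertension(False, 120, 75))   # no criterion met
```

Exposing the cut-offs as parameters makes it easy to check how sensitive prevalence estimates are to the chosen thresholds.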

Covariates

Sex was self-reported as male or female, and age (years) was recorded at the time of recruitment. Area based Townsend deprivation scores were used to capture socio-economic position (SEP) [31]. The primary care survey provided data on the prescribed medications of each study participant.

Statistical analysis

All analyses were performed in R 4.2.1 [32]. Data distributions were assessed using histograms. Continuous variables were expressed as mean ± 1 SD or median (interquartile range) as appropriate; categorical variables were expressed as counts and percentages.

Participants were categorised based on self-reported ethnicity as White European, South Asian, and African Caribbean. Individuals of mixed, other, and unknown ethnicity were not included due to small sample sizes. All analyses were conducted within each ethnic group. We used the PGSs as the independent variables to test their association with their corresponding cardiometabolic outcomes. For binary outcomes, generalised linear models (glms) with binomial distribution (i.e., logistic regression) were employed. The continuous outcomes were either slightly or heavily skewed. For example, BMI had a skewness greater than 1, while HbA1c and TTG had a skewness exceeding 2. As the gamma distribution can flexibly accommodate positively skewed data due to its shape and scale parameters, we used glms with gamma distribution and identity link for our continuous outcomes.
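For readers less familiar with how an odds ratio per SD of a PGS arises from logistic regression, here is a minimal sketch of an unadjusted, single-predictor fit using Newton-Raphson. It is not the authors' R code: the data are simulated, the true coefficients are arbitrary, and `fit_logistic` is a hypothetical helper written for this example.

```python
import math
import random


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to the intercept
            g1 += (yi - p) * xi     # gradient with respect to the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (not UK Biobank): a standardised PGS and a binary outcome.
random.seed(1)
true_b0, true_b1 = -2.0, 0.9
pgs = [random.gauss(0.0, 1.0) for _ in range(4000)]
outcome = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * s))) else 0
    for s in pgs
]

b0, b1 = fit_logistic(pgs, outcome)
odds_ratio = math.exp(b1)  # odds ratio per 1 SD increase in the PGS
print(round(odds_ratio, 2))
```

Exponentiating the slope gives the odds ratio per unit of the predictor, which is per SD here because the simulated PGS has unit variance.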

Two regression models were compared. Model 1 was unadjusted to obtain raw estimates. For all outcomes, model 2 was adjusted for age, sex, and SEP in order to obtain more accurate and precise regression estimates. As adjustment for genetic PCs was previously used to control for ancestry during the PGS derivation process, further adjustment for PCs was not pursued. Since this study did not attempt to explore mechanistic pathways downstream of the genotype but upstream of the phenotypes, further models with adjustment for mediators were not pursued. Model assumptions were verified with regression diagnostics and found to be satisfied. Results were then corrected for multiple testing using a false discovery rate of 0.05 [33].
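The text does not spell out the multiple-testing procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method at a false discovery rate of 0.05. The p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values from five PGS-outcome associations.
p_values = [0.004, 0.019, 0.035, 0.038, 0.60]
print(benjamini_hochberg(p_values))
```

Note the step-up behaviour: the third p-value (0.035) exceeds its own threshold (0.03) but is still declared significant because the fourth (0.038) falls below its threshold (0.04).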

Using a 30% testing dataset, the classification performance (i.e., predicting the binary outcome) of both logistic regression models was evaluated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC) and its associated 95% confidence interval (CI) were derived for each ethnicity for each binary outcome. The ROC AUCs were compared between ethnicities using DeLong’s test. For continuous outcomes, we compared the effect sizes derived from the glms, which capture the increase in outcome per unit increase in the PGSs. Since the PGSs underwent a PC-based ancestry centring and had a normal distribution with a similar SD of approximately 1 (Table 1), the effect sizes were not further standardised.
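The ROC AUC has a simple interpretation: the probability that a randomly chosen case has a higher predicted score than a randomly chosen control. The sketch below computes it directly from that definition on invented data. It does not implement DeLong's test or the confidence intervals, and `roc_auc` is a hypothetical helper.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > control score), counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and observed outcomes (1 = case).
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in sample size, which is fine for a demonstration; a rank-based formula gives the same value more efficiently on large test sets.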

Table 1 Participant characteristics per ethnic group.

Sensitivity analyses

As a sensitivity analysis, model 2 was additionally adjusted for diabetes medications when HbA1c was the outcome and for lipid-lowering drugs when exploring HDL, LDL, total cholesterol and TTG as outcomes. PGSs are upstream of the cardiometabolic outcomes which are upstream of the medications (i.e., a causal chain). In instances where the medications can in turn affect the cardiometabolic outcome (e.g., diabetes medications lowering HbA1c), adjusting for them allows the estimation of the direct association between the PGS and the cardiometabolic outcome. This adjustment essentially controls for unmeasured confounders downstream of the medication (e.g., access to healthcare, healthcare seeking behaviour etc.).

In addition, we also calculated the area under the precision-recall curve (PR-AUC) as the ROC AUC can be misleading when the outcomes are rare.
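One common way to summarise the precision-recall curve is average precision, sketched below on invented data with a rare outcome. This is an assumption about the estimator, since the text does not state how the PR-AUC was computed, and the helper ignores tied scores for simplicity.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_cases = sum(labels)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_cases


# Hypothetical rare outcome: 1 case among 10 people, ranked third by score.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ap = average_precision(scores, labels)
prevalence = sum(labels) / len(labels)
print(round(ap, 3), prevalence)
```

Unlike the ROC AUC, whose chance level is 0.5 regardless of prevalence, the chance level for the PR-AUC is the outcome prevalence, so it should be read against that baseline in each ethnic group.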

Results

We included 472,036 participants. Their characteristics and standard PGSs, stratified by ethnicity, are presented in Table 1, while their enhanced PGSs are presented in Supplementary Table S1. On average, both South Asians (53.4 years) and African Caribbeans (51.9 years) were younger than White Europeans (56.8 years) at the time of outcome assessment. Men comprised 45.5% of White Europeans, 54.5% of South Asians and 42.3% of African Caribbeans. There were 45.7% South Asians, 70.5% African Caribbeans, and 23.2% White Europeans in the lowest quartile of the Townsend deprivation index. South Asians had the highest prevalence of T2DM (16.7%), CVD (10.1%), and CAD (7.4%), whilst African Caribbeans had the highest average BMI (29.5) and the highest prevalence of hypertension (72.6%). Despite the PC-based ancestry centring, the mean PGSs showed small residual deviations from zero in South Asians and African Caribbeans (Table 1). Model 1 and model 2 results for the standard and enhanced PGSs are presented in Table 2. Results from model 2 for the enhanced PGSs are presented below.

Table 2 Regression results stratified per ethnicity.

Type 1 diabetes

The association between the enhanced PGSs and T1DM was strongest for White Europeans (odds ratio [OR] 3.09 95% CI [2.72, 3.40]) followed by South Asians (OR 1.52 95% CI [1.11, 2.07]) and African Caribbeans (OR 1.40 95% CI [0.99, 1.95]) (Fig. 1A). The PGS’ predictive performance was highest in White Europeans (AUC 0.84 95% CI [0.80, 0.89]) followed by South Asians (AUC 0.63 95% CI [0.49, 0.77]) and African Caribbeans (AUC 0.50 95% CI [0.32, 0.68]) (Table 3).

**Fig. 1: Violin plots highlighting the effect sizes per standard deviation increase in the enhanced PGSs for BMI and diabetes-related traits stratified by ethnicity.**

Table 3 Predictive power of PGSs for diabetes-related binary outcomes stratified by ancestry.

Type 2 diabetes

According to the OR, the performance was highest in White Europeans (OR 2.48 95% CI [2.39, 2.58]) followed by South Asians (OR 2.05 95% CI [1.91, 2.20]) and African Caribbeans (OR 1.51 95% CI [1.30, 1.48]) (Fig. 1B). According to the AUC, the enhanced PGS’ predictive performance was higher in White Europeans (AUC 0.80 95% CI [0.79, 0.82]) compared to South Asians (AUC 0.76 95% CI [0.73, 0.78]) and African Caribbeans (AUC 0.73 95% CI [0.69, 0.76]) (Table 3).

HbA1c

The regression coefficient (β) was higher in White Europeans and South Asians compared to African Caribbeans. One unit (or 1 SD) increase in the enhanced PGS was associated with a 1.69 mmol/mol 95% CI (1.65, 1.73) higher HbA1c in White Europeans, 1.79 95% CI (1.57, 2.00) in South Asians and 1.03 95% CI (0.81, 1.26) in African Caribbeans after adjusting for sex, age, and SEP (Table 2). The difference between White Europeans and South Asians was not statistically significant (p = 0.370). Results are visually depicted in Fig. 1C.
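The text does not say how the between-group p-value was obtained. One plausible approach, sketched here as an assumption, is a z-test on the difference between two independent coefficients, with standard errors recovered from the reported 95% confidence intervals. The helper names are hypothetical.

```python
from math import erf, sqrt


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def two_sided_p(beta_a, se_a, beta_b, se_b):
    """Two-sided p-value for the difference between two independent estimates."""
    z = (beta_a - beta_b) / sqrt(se_a ** 2 + se_b ** 2)
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - normal_cdf)


# HbA1c effect sizes reported above: White Europeans vs South Asians.
p = two_sided_p(1.69, se_from_ci(1.65, 1.73), 1.79, se_from_ci(1.57, 2.00))
print(round(p, 2))
```

Applied to the HbA1c estimates above, this approximation gives a p-value of about 0.37, in line with the reported value, although the agreement does not confirm that this was the method used.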

BMI

A unit increase in the enhanced PGS resulted in a 1.71 kg/m² 95% CI (1.68, 1.74) increase in BMI in White Europeans, 1.31 95% CI (1.22, 1.40) in South Asians and 0.90 95% CI (0.80, 1.00) in African Caribbeans (Table 2 and Fig. 1D).

Hypertension

There was no difference in performance by ethnicity according to the ROC curve analysis (Table 3). The ORs were similar across all ethnicities using both standard (≈1.50) and enhanced PGSs (≈1.70) (Table 2 and Fig. 2C).

**Fig. 2: Violin plots highlighting the effect sizes per standard deviation increase in the enhanced PGSs for vascular traits stratified by ethnicity.**

CVD and CAD

For CVD, the performance of the enhanced PGS was higher in White Europeans (AUC 0.77 95% CI [0.76, 0.78]) and South Asians (AUC 0.74 95% CI [0.70, 0.77]) vs African Caribbeans (AUC 0.66 95% CI [0.61, 0.72]), all p < 0.025 (Table 3). Similarly, the ORs were higher in White Europeans (OR 1.61 95% CI [1.56, 1.66]) and South Asians (OR 1.58 95% CI [1.45, 1.71]) compared to African Caribbeans (OR 1.20 95% CI [1.09, 1.32]) (Fig. 2A).

For CAD, the results were similar to those reported above for CVD, with a higher predictive performance according to the ROC curve analysis (Table 3) and higher ORs (Table 2) in White Europeans and South Asians compared to African Caribbeans.

Stroke

The performance of the enhanced PGS according to the ROC curve analysis (AUC ≈ 0.70) and the ORs (1.20–1.40) were similar across all ethnicities (Fig. 2D, Table 3).

HDL and LDL

One SD increase in the enhanced HDL PGS resulted in a 0.135 mmol/l 95% CI (0.133, 0.137) greater HDL in White Europeans, 0.107 95% CI (0.101, 0.113) in South Asians and 0.089 95% CI (0.082, 0.097) in African Caribbeans (Table 2 and Fig. 3A, B) after adjusting for sex, age, and SEP. A unit increase in the enhanced LDL PGS was associated with a higher LDL in White Europeans (0.267 mmol/l 95% CI [0.261, 0.272]) followed by African Caribbeans (0.216 95% CI [0.199, 0.233]) and South Asians (0.169 95% CI [0.149, 0.188]).

**Fig. 3: Violin plots highlighting the effect sizes per standard deviation increase in the enhanced PGSs for lipid traits stratified by ethnicity.**

Total cholesterol and triglycerides

A unit (or 1 SD) increase in the enhanced PGS resulted in a greater total cholesterol in White Europeans (0.278 mmol/l 95% CI [0.269, 0.286]) compared to African Caribbeans (0.202 95% CI [0.179, 0.226]) and South Asians (0.183 95% CI [0.157, 0.208]) (Fig. 3C). On the other hand, a unit (or 1 SD) increase in the TTG enhanced PGS was associated with a higher TTG in South Asians (0.278 mmol/l 95% CI [0.257, 0.299]) compared to White Europeans (0.228 95% CI [0.222, 0.234]) and African Caribbeans (0.086 95% CI [0.071, 0.101]) (Fig. 3D).

Sensitivity analyses

The regression results stratified per ethnicity for HbA1c, HDL, LDL, total cholesterol, and triglycerides with further adjustments for diabetes medications or lipid-lowering drugs as appropriate are presented in Supplementary Table S2. In general, the findings were replicated, but the effect sizes were slightly smaller for HbA1c and slightly larger for the lipid outcomes.

The PR-AUC results stratified by ethnicity are presented in Supplementary Table S3. The PR-AUC was larger in White Europeans followed by South Asians and African Caribbeans for T1DM, CVD and CAD. However, the estimates were similar for hypertension and stroke. The PR-AUC was greater in South Asians compared to White Europeans for T2DM.

Discussion

In this study we evaluated the performance of standard and enhanced UKB cardiometabolic PGSs derived mostly in White European populations in association with their respective observed phenotype by ethnicity. Whilst the UKB PGSs included some data from multi-ethnic GWAS studies, the performance of both the standard and enhanced PGSs was better in White Europeans compared to South Asians and African Caribbeans for most cardiometabolic outcomes. This can be explained by the predominance of White European GWAS data when deriving the PGSs.

Factors driving poorer PGS performance in ethnically diverse populations

According to the National Human Genome Research Institute and European Bioinformatics Institute GWAS catalogue, almost 80% of the GWAS studies were performed in White Europeans, who represent roughly 10% of the global population [34]. In contrast, 25% of the global population is of South Asian and 15% of African Caribbean ancestry. Thus, GWAS data are scarce in non-White ancestries. This has multiple downstream implications and might partly explain the worse performance of the PGSs in multi-ethnic populations [35]. Firstly, linkage disequilibrium (LD) varies across ancestries which may drive differences in effect size estimates in GWASs [36]. Secondly, imputation reference panels which are widely used to address bias in GWASs are less efficient in non-White ancestries due to data scarcity. Thirdly, within-ethnicity ancestry subcategories in non-White populations are less studied. This is important because within-ethnicity heterogeneity leading to differential predictive power of PGSs in the same ethnicity has been reported [11]. Fourthly, the normal reference ranges for quantitative biomarkers may vary between ethnicities [37]. Without ethnicity-specific cut-offs, there is an inherent bias in any GWAS which categorises/binarises quantitative traits. Lastly, studies may be reporting common benign variants as pathologic in other ethnicities just because they are rare in White Europeans [2]. Thus, large ethnically diverse datasets and improved treatment of LD and variant frequencies are increasingly needed to create equitable PGSs before widespread clinical use [35].

Ethnic inclusivity for equitable implementation of polygenic scores

In CVD research, the vast majority of cohort studies enrolled mostly people of White European ancestry. There are only a few studies which include genetic data in ethnic minorities. These either focus on a single ethnic group (e.g., East London Genes & Health [ELGH], China Kadoorie Biobank [CKB], Mexico City Prospective Study [MCPS], New Delhi Birth Cohort Study, OLA etc.) or multiple ethnic groups (e.g., Age, Gene/Environment Susceptibility-Reykjavik Study [AGES-Reykjavik], ARIC, Born in Bradford [BiB], Cardiovascular Health Study [CHS], Dallas Heart Study [DHS], Framingham Heart Study [FHS] OMNI cohorts, JHS, MEC, MESA, Rotterdam Study [RS], Southall and Brent Revisited [SABRE] etc.). Importantly, there is a tendency to aggregate individual cohorts into consortia (e.g., genetic data from AGES, ARIC, CHS, FHS and RS cohorts are available through the Cohorts for Heart and Aging Research in Genomic Epidemiology [CHARGE] consortium). Despite these collections, the percentage of non-White European ancestry participants in GWASs has not increased in recent years [34]. This suggests that the reduced performance of PGSs in ethnic minorities is unlikely to improve in the near future.

In developed nations, the low participation of ethnic minorities in biomedical research is multi-factorial but mainly related to reduced trust given past research misconduct and feelings of racial discrimination [38]. However, movements such as the All of Us Research Program from the National Institutes of Health are working towards having a culturally aware approach to engage under-represented ethnic minorities in research [39].

Ancestry inclusivity for equitable implementation of polygenic scores

Race and ethnicity are socio-cultural constructs, whilst ancestry refers to the genetic origin of a population. Engaging under-represented ethnic and ancestry minorities in genomics research should be a global research priority. Indeed, there are movements aiming to address these disparities such as the Human Heredity and Health in Africa initiative [40]. However, lack of funding remains the main limitation of such international movements [41].

Polygenic scores and health inequalities during translation to practice

The advent of genetic data in large cohort datasets such as the UK Biobank has led to the discovery of multiple SNPs which are associated with a variety of cardiometabolic diseases using GWASs. Whilst the added value of PGSs on top of already validated clinical tools is yet to be fully elucidated, current studies suggest that PGSs could: (1) increase disease prediction in early life, (2) help guide population-wide screening and preventative targeted interventions (e.g., lipid lowering drugs in those with a high PGS for total cholesterol and LDL), (3) help promote favourable health behaviours in those with an enhanced risk, (4) improve diagnostic accuracy (e.g., differentiating T1DM vs T2DM in overweight antibody-negative young individuals), and (5) predict response to treatments [27]. Given the worse performance of PGSs in ethnic minorities, these groups may miss out on improved health outcomes. The deployment of PGSs would benefit the population group which is already privileged in terms of health outcomes, further deepening existing healthcare inequalities. Thus, large-scale multi-ancestry GWAS data are urgently needed to generate ethnicity-stratified PGSs to tackle health inequalities.

Limitations

Limitations of the UK Biobank PGSs have been previously discussed [26]. With regards to PGS evaluation, the main limitation of our study relates to the lack of widely accepted performance metrics [42]. Whilst phenotypic variance explained (R²) and association p-values have been previously proposed [43], we used effect-size metrics for the outcome as these are widely used for established traditional risk factors. However, these do not accurately capture disease prevalence in the general population.

In addition, the use of self-reported ethnicity is not in line with the latest recommendations on the use and reporting of race and ancestry in genetic research, which instead recommend the incorporation of ancestry informative markers for a more precise characterisation of an individual’s identity [5, 44]. Importantly, whilst there is a good overlap between genetic ancestry and self-reported ethnicity, these are not identical in the UK Biobank [25]. Nonetheless, self-reported ethnicity is also able to capture social constructs (e.g., health beliefs) which could drive the observed differences in PGSs’ performance. In addition, detailed ethnic groupings (defined according to the 16 categories of the 2001 census) had to be collapsed into the 5 high level categories of the census to increase the statistical power of the analyses conducted in multi-ethnic populations.

As underlying differences in allele frequencies and LD likely make a major contribution to the ancestry performance differences in non-ancestry matched PGSs, the scores could be improved through the use of the appropriate ancestral reference LD. However, we did not attempt to improve the PGSs in this study as our aim was solely to evaluate the ones derived by Thompson et al. [26]. While our finding that PGSs perform better in White Europeans may not be surprising, there is a need to provide empirical evidence, as without this, the proposed PGSs will be used without consideration of the ethnic background. While Thompson et al. [26] provided the effect sizes for the associations between PGSs and their corresponding outcomes stratified by ethnicity, we were able to build on this work by using more sophisticated statistical approaches (e.g., ROC and precision recall curves) to better evaluate the performance of the PGSs in multi-ethnic populations. Our work is important because it highlights that these PGSs released by UK Biobank need to be used with consideration in multi-ethnic populations and underscores the need for improving them. In addition, we discuss the factors driving poorer PGS performance in ethnically diverse populations, and the need for ethnic inclusivity for the equitable implementation of PGSs to reduce health inequalities as they transition to clinical practice.

Although the ethnic minorities were 3–4 years younger at recruitment, incident cardiometabolic diseases have a younger age of onset in these groups (e.g., diabetes in South Asians [45] and hypertension in those of African ancestry [46] occur ≈10 years earlier compared to White Europeans). In addition, SABRE data showed that a lower proportion of UK White individuals were diagnosed with diabetes in the National Health Service compared to South Asians and African Caribbeans when comparing the study blood test results with the healthcare records [47]. As a higher proportion in the ethnic minority groups should have already been diagnosed with the cardiometabolic diseases, it would be expected that PGSs would actually perform better in the ethnic minorities. Thus, another important limitation is that we might actually underestimate the performance difference of the PGSs between White Europeans and the multi-ethnic populations.

Conclusion

In general, UK Biobank standard and enhanced PGSs had markedly better performance in White Europeans compared to South Asians and African Caribbeans when evaluating cardiometabolic phenotypes. More GWAS data in ethnic minorities are required to improve the performance of the PGSs and to avoid perpetuating health inequalities, especially since cardiometabolic diseases are more prevalent in South Asians and African Caribbeans.

Data availability

The UK Biobank data is available via an application from https://www.ukbiobank.ac.uk/. This work was conducted under application ID 7661.

References

1. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12:44.
2. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51:584–91.
3. Clarke SL, Assimes TL, Tcheandjieu C. The propagation of racial disparities in cardiovascular genomics research. Circ Genom Precis Med. 2021;14:e003178.
4. Mathur R, Rentsch CT, Venkataraman K, Fatumo S, Jobe M, Angkurawaranon C, et al. How do we collect good-quality data on race and ethnicity and address the trust gap? Lancet. 2022;400:2028–30.
5. Mersha TB, Abebe T. Self-reported race/ethnicity in the age of genomic research: its potential impact on understanding health disparities. Hum Genom. 2015;9:1.
6. Frank R. The molecular reinscription of race: a comment on “genetic bio-ancestry and social construction of racial classification in social surveys in the contemporary United States”. Demography. 2014;51:2333–6.
7. Kamiza AB, Toure SM, Vujkovic M, Machipisa T, Soremekun OS, Kintu C, et al. Transferability of genetic risk scores in African populations. Nat Med. 2022;28:1163–6.
8. Hodgson S, Huang QQ, Sallah N, Griffiths CJ, Newman WG, Trembath RC, et al. Integrating polygenic risk scores in the prediction of type 2 diabetes risk and subtypes in British Pakistanis and Bangladeshis: a population-based cohort study. PLOS Med. 2022;19:e1003981.
9. Dikilitas O, Schaid DJ, Kosel ML, Carroll RJ, Chute CG, Denny JC, et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am J Hum Genet. 2020;106:707–16.
10. Grinde KE, Qi Q, Thornton TA, Liu S, Shadyab AH, Chan KHK, et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet Epidemiol. 2019;43:50–62.
11. Clarke SL, Huang RDL, Hilliard AT, Tcheandjieu C, Lynch J, Damrauer SM, et al. Race and ethnicity stratification for polygenic risk score analyses may mask disparities in Hispanics. Circulation. 2022;146:265–7.
12. American Diabetes Association. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2012;35:S64–S71.
13. Narayan KMV, Kanaya AM. Why are South Asians prone to type 2 diabetes? A hypothesis based on underexplored pathways. Diabetologia. 2020;63:1103–9.
14. Pham TM, Carpenter JR, Morris TP, Sharma M, Petersen I. Ethnic differences in the prevalence of type 2 diabetes diagnoses in the UK: cross-sectional analysis of the health improvement network primary care database. Clin Epidemiol. 2019;11:1081–8.
15. Mathur R, Palla L, Farmer RE, Chaturvedi N, Smeeth L. Ethnic differences in the severity and clinical management of type 2 diabetes at time of diagnosis: a cohort study in the UK Clinical Practice Research Datalink. Diabetes Res Clin Pract. 2020;160:108006.
16. Khanolkar AR, Amin R, Taylor-Robinson D, Viner RM, Warner J, Gevers EF, et al. Ethnic differences in early glycemic control in childhood-onset type 1 diabetes. BMJ Open Diabetes Res Care. 2017;5:e000423.
17. Wolffenbuttel BHR, Herman WH, Gross JL, Dharmalingam M, Jiang HH, Hardin DS. Ethnic differences in glycemic markers in patients with type 2 diabetes. Diabetes Care. 2013;36:2931–6.
18. Roth GA, Abajobir A, Abera SF, Aksut B, Alam T, Alam K, et al. Global, regional, and national burden of cardiovascular diseases for 10 causes, 1990 to 2015. J Am Coll Cardiol. 2017;70:1–25.
19. Bentley AR, Rotimi CN. Interethnic differences in serum lipids and implications for cardiometabolic disease risk in African ancestry populations. Glob Heart. 2017;12:141.
20. Bilen O, Kamal A, Virani SS. Lipoprotein abnormalities in South Asians and its association with cardiovascular disease: current state and future directions. World J Cardiol. 2016;8:247–57.
21. Modesti PA, Reboldi G, Cappuccio FP, Agyemang C, Remuzzi G, Rapi S, et al. Panethnic differences in blood pressure in Europe: a systematic review and meta-analysis. PLOS ONE. 2016;11:e0147601.
22. Sproston K, Mindell J. Health Survey for England 2004: The Health of Minority Ethnic Groups – headline tables. 2006. https://digital.nhs.uk/data-and-information/publications/statistical/health-survey-for-england/health-survey-for-england-2004-health-of-ethnic-minorities-headline-results.
23. Agyemang C, Bhopal RS. Is the blood pressure of South Asian adults in the UK higher or lower than that in European white adults? A review of cross-sectional data. J Hum Hypertens. 2002;16:739–51.
24. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779.
25. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–9.
26. Thompson DJ, Wells D, Selzam S, Peneva I, Moore R, Sharp K, et al. UK Biobank release and systematic evaluation of optimised polygenic risk scores for 53 diseases and quantitative traits. 2022.
27. Adeyemo A, Balaconis MK, Darnes DR, Fatumo S, Granados Moreno P, Hodonsky CJ, et al. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat Med. 2021;27:1876–84.
28. Eastwood SV, Mathur R, Atkinson M, Brophy S, Sudlow C, Flaig R, et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLOS ONE. 2016;11:e0162388.
29. Garfield V, Farmaki AE, Eastwood SV, Mathur R, Rentsch CT, Bhaskaran K, et al. HbA1c and brain health across the entire glycaemic spectrum. Diabetes Obes Metab. 2021;23:1140–9.
30. Bleich SN, Jarlenski MP, Bell CN, LaVeist TA. Health inequalities: trends, progress, and policy. Annu Rev Public Health. 2012;33:7–40.
31. Townsend P, Beattie A, Phillimore P. Health and deprivation: inequality and the North. London: Routledge; 1989.
32. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. 2022. https://www.R-project.org/.
33. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc: Ser B. 1995;57:289–300.
34. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:D896–901.
35. Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun. 2019;10:3328.
36. Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 2013;11:e1001661.
37. Rappoport N, Paik H, Oskotsky B, Tor R, Ziv E, Zaitlen N, et al. Comparing ethnicity-specific reference intervals for clinical laboratory tests from EHR data. J Appl Lab Med. 2018;3:366–77.
38. Suther S, Kiros G-E. Barriers to the use of genetic testing: a study of racial and ethnic disparities. Genet Med. 2009;11:655–62.
39. Mapes BM, Foster CS, Kusnoor SV, Epelbaum MI, Auyoung M, Jenkins G, et al. Diversity and inclusion for the All of Us Research Program: a scoping review. PLOS ONE. 2020;15:e0234962.
40. Rotimi C. Enabling the genomic revolution in Africa. Science. 2014;344:1346–8.
41. Yeh C, Meng C, Wang S, Driscoll A, Rozi E, Liu P, et al. SustainBench: benchmarks for monitoring the sustainable development goals with machine learning. 2021. https://arxiv.org/abs/2111.04724.
42. Baker SG. Metrics for evaluating polygenic risk scores. JNCI Cancer Spectr. 2021;5:pkaa106.
43. Choi SW, Mak TS-H, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15:2759–72.
44. Flanagin A, Frey T, Christiansen SL. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA. 2021;326:621.
45. Paul SK, Owusu Adjah ES, Samanta M, Patel K, Bellary S, Hanif W, et al. Comparison of body mass index at diagnosis of diabetes in a multi‐ethnic population: a case‐control study with matched non‐diabetic controls. Diabetes Obes Metab. 2017;19:1014–23.
46. Lackland DT. Racial differences in hypertension: implications for high blood pressure management. Am J Med Sci. 2014;348:135–8.
47. Tillin T, Forouhi NG, McKeigue PM, Chaturvedi N. Southall and Brent REvisited: cohort profile of SABRE, a UK population-based comparison of cardiovascular disease and diabetes in people of European, Indian Asian and African Caribbean origins. Int J Epidemiol. 2012;41:33–42.

Acknowledgements

The authors would like to thank all the UK Biobank members for their participation and continuous engagement with follow-up and all UK Biobank scientific and data collection teams.

Author contributions

All authors were involved in study design and implementation, data analysis and interpretation, and critical review and revision of the manuscript. In addition, all authors approved the final version as submitted and agree to be accountable for all aspects of the work.

Funding

NC received support from the UK Medical Research Council, Diabetes UK, Wellcome Trust, British Heart Foundation and National Institute for Health Research University College London Hospitals Biomedical Research Centre. RM is supported by Barts Charity (MGU0504). VG is funded by the Professor David Matthews Non-Clinical Fellowship (ref: SCA/01/NCF/22). VG is also supported by a joint Diabetes UK and British Heart Foundation grant (ref: 15/0005250). Role of the funding source: none of the funders was involved in the study design, in the collection, analysis or interpretation of the data, or in the decision to submit the article for publication. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any author accepted manuscript version arising.

Author information

These authors jointly supervised this work: Rohini Mathur, Victoria Garfield.

Authors and Affiliations

Department of Population Science and Experimental Medicine, Institute of Cardiovascular Science, University College London, Gower Street, London, WC1E 6BT, UK
Constantin-Cristian Topriceanu, Nish Chaturvedi & Victoria Garfield
MRC Unit for Lifelong Health and Ageing, University College London, 1-19 Torrington Place, London, WC1E 7HB, UK
Constantin-Cristian Topriceanu, Nish Chaturvedi & Victoria Garfield
Centre for Primary Care, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
Rohini Mathur

Authors

Constantin-Cristian Topriceanu
Nish Chaturvedi
Rohini Mathur
Victoria Garfield

Corresponding author

Correspondence to Constantin-Cristian Topriceanu.

Ethics declarations

Competing interests

The views expressed in this article are those of the authors, who declare that they have no conflict of interest, except for NC, who serves on a Data Safety and Monitoring Board for a clinical trial of a glucose-lowering agent funded by AstraZeneca. RM is part of the Genes & Health programme, which is part-funded (including salary contributions) by a Life Sciences Consortium comprising Astra Zeneca PLC, Bristol-Myers Squibb Company, GlaxoSmithKline Research and Development Limited, Maze Therapeutics Inc, Merck Sharp & Dohme LLC, Novo Nordisk A/S, Pfizer Inc and Takeda Development Centre Americas Inc.

Ethical approval

UK Biobank’s ethical approval (11/NW/0382) was granted by the North West Multi-centre Research Ethics Committee (MREC) in 2011 and was renewed in 2016 and again in 2021. All procedures performed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Topriceanu, CC., Chaturvedi, N., Mathur, R. et al. Validity of European-centric cardiometabolic polygenic scores in multi-ancestry populations. Eur J Hum Genet (2024). https://doi.org/10.1038/s41431-023-01517-3

Received: 10 June 2023
Revised: 29 October 2023
Accepted: 28 November 2023
Published: 05 January 2024
DOI: https://doi.org/10.1038/s41431-023-01517-3
