Polygenic prediction of lipid traits in sub-Saharan Africans


Polygenic risk scores (PRS) can enhance risk stratification and are useful for precision medicine interventions. Here we show that PRS derived from African American genome-wide association studies (GWAS) enhance prediction of lipid traits in sub-Saharan Africans. PRS prediction varied greatly between South African Zulus (LDL-C, R² = 8.49%) and Ugandans (LDL-C, R² = 0.043%), potentially attributable to environmental factors. Moreover, the PRS shown here had a higher discriminatory ability (AUC = 74.6%) than conventional risk factors (AUC = 67.8%) for identifying extreme phenotypes. This work highlights the utility of PRS derived from relevant ethnic groups for identifying high-risk cases missed by conventional clinical factors.


Background
Genome-wide association studies (GWAS) have successfully identified and characterised genetic variants associated with lipid traits [1][2][3]. To date, roughly 700 single nucleotide polymorphisms (SNPs) have been associated with various lipid traits [3][4][5][6][7][8][9]. These discoveries are now beginning to unravel the biology of dyslipidaemia and aid prediction for precision medicine. Polygenic risk scores (PRS) can be generated to predict the risk of a disease in an independent population [10][11]. However, most lipid trait discoveries have been made in European or Asian ancestries [4][5][6][7][8][9]. PRS derived from European ancestry tend to perform poorly in genetically diverse populations, including Africans [10], partly due to differences in linkage disequilibrium (LD) patterns, allele frequencies and environmental exposures [12] between populations. The lack of precise PRS in Africans hinders the risk stratification and targeted treatment essential for precision medicine and may exacerbate health disparities.
Recent studies have indicated that using multivariate approaches and multi-ancestry summary statistics enhances PRS performance [13][14]. Moreover, previous studies suggested that using summary statistics from African Americans may improve PRS performance in sub-Saharan Africans [15]. We therefore aimed to determine the optimal approach for lipid traits in sub-Saharan Africans using publicly available GWAS summary statistics.
We computed PRS using PRSice-2 [7]. Of the many PRS computed at P-value thresholds (P_T) ranging from 1 to 5 × 10^-8, the PRS that explained the highest variance (R²) of the trait was selected as the best performing for the African (univariate and multivariate), European and multi-ancestry GWAS (Methods). In the South African Zulu target dataset (Table S1), the best-performing PRS for low-density lipoprotein cholesterol (LDL-C) came from the multivariate approach, derived using high-density lipoprotein cholesterol (HDL-C), LDL-C and triglyceride (TG) African American summary statistics (R² = 8.48%, P_T < 5 × 10^-8). This was followed by the African American univariate approach (R² = 8.14%, P_T < 5 × 10^-8), then the multi-ancestry approach, derived from individuals of African, European and Hispanic American ancestry (R² = 6.32%, P_T < 5 × 10^-8), and finally the approach derived from individuals of European ancestry (R² = 1.61%, P_T < 5 × 10^-8; Fig. 1A and Table S1). Moreover, the African American derived PRS (coefficients ranged from 0.100 to 0.286) were better correlated with all serum lipid levels than the European PRS (coefficients ranged from 0.091 to 0.123) in the South African Zulus (Fig. S2).
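The threshold selection described above can be illustrated with a minimal Python sketch. It is not the PRSice-2 implementation: the SNP weights, P-values, genotypes and phenotype values are invented toy data, the function names are hypothetical, and the squared Pearson correlation between score and phenotype stands in for the covariate-adjusted incremental R² used in the study.

```python
# Toy illustration of P-value thresholding: score individuals at several
# thresholds and keep the threshold whose score explains the most variance.
# All data and names below are hypothetical.

def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # constant score (e.g. no SNP passed the threshold)
        return 0.0
    return (sxy * sxy) / (sxx * syy)

def score(dosages, snps, p_threshold):
    """Weighted risk-allele count using only SNPs with P <= p_threshold."""
    kept = [j for j, (_, p) in enumerate(snps) if p <= p_threshold]
    return [sum(snps[j][0] * row[j] for j in kept) for row in dosages]

def best_threshold(dosages, snps, phenotype, thresholds):
    """Return the threshold with the highest R2 and the R2 at every threshold."""
    r2 = {pt: pearson_r2(score(dosages, snps, pt), phenotype) for pt in thresholds}
    return max(r2, key=r2.get), r2

# (effect size, GWAS P-value) for three clumped SNPs
snps = [(0.5, 1e-9), (0.2, 1e-3), (-0.4, 0.6)]
# risk-allele counts (0/1/2) for six individuals, one column per SNP
dosages = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [0, 2, 1], [1, 1, 2], [2, 0, 0]]
phenotype = [3.0, 3.5, 4.1, 3.0, 3.4, 4.0]  # e.g. LDL-C in mmol/L

best_pt, r2_by_threshold = best_threshold(dosages, snps, phenotype, [5e-8, 0.05, 1.0])
print("best P_T:", best_pt)
for pt, value in r2_by_threshold.items():
    print(f"P_T = {pt:g}: R2 = {value:.3f}")
```

In this toy example only the first SNP carries signal, so adding SNPs at more lenient thresholds dilutes the score and the strictest threshold is selected.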
We proceeded to evaluate risk stratification based on the deciles of the PRS for the lipid traits presented (Methods). Analysis of variance (ANOVA) was used to compare the means of serum lipid levels. We observed that individuals in the top 10% of the PRS had higher serum lipid levels than individuals in the bottom 10% of the PRS (Fig. 1). Notably, the multi-ancestry derived PRS was the best-performing approach for HDL-C (Fig. 1E) and TG (Fig. 1G). Individuals in the top 10% of the PRS had mean HDL-C and TG levels that were 0.16 mmol/L and 0.45 mmol/L higher, respectively, than those of individuals in the bottom 10% of the PRS. For LDL-C and total cholesterol (TC), the best-performing approach was the African American multivariate approach, with mean differences of 0.70 mmol/L for LDL-C (Fig. 1F) and 1.09 mmol/L for TC (Fig. 1H) among those in the top 10% of the PRS.
We then sought to compare polygenic predictions in Ugandans (East Africa) and South African Zulus (Southern Africa) using similar discovery datasets. The predictive performance of the PRS was low in the Ugandan population (Table S2), as were its correlations with lipid traits (Fig. S2C). The multivariate approach of the African American GWAS gave the best-performing PRS in the Ugandan dataset (R² = 0.113%, P_T = 0.0012) (Table S2). We proceeded to evaluate the transferability to the South African Zulus of a PRS derived in Uganda from the multivariate African American discovery data. Using TC to explore this, we noted that the same African American PRS of 286 SNPs predicted poorly in Uganda (R² = 0.048%) but much better in the South African Zulus (R² = 7.061%) (Fig. S3). Environmental factors might be responsible for these differences.
We then assessed the ability of the PRS to identify people with extreme lipid levels, compared with conventional risk factors. We computed residuals of the linear model of total cholesterol adjusted for age and sex in the South African Zulus. We then selected individuals in the top 10% of the residual distribution as "cases" and those at the bottom deciles as "controls" (Fig. 2A). The average TC level was 6.51 mmol/L in cases compared with 2.76 mmol/L in controls, a difference of 3.75 mmol/L. Using logistic models, we evaluated the prediction of the African American PRS derived from Uganda in the South African Zulus. The areas under the curve (AUC) were 67.6% for clinical factors including T2D, age, sex and five PCs, and 74.7% for the PRS only (Fig. 2B). This indicates that the PRS was better at identifying individuals with hypercholesterolemia than the conventional risk factors.
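The extreme-phenotype comparison above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code of the study: the residuals and PRS values are invented, both groups are taken as one decile each, and the AUC is computed directly from ranks rather than from a fitted logistic model. For a single predictor with a positive coefficient, ranking by the raw score gives the same AUC as ranking by the fitted probabilities.

```python
# Hypothetical sketch: label the extremes of a residual distribution as cases
# and controls, then measure how well a score separates them (AUC).

def extremes(residuals):
    """Indices of the top decile (cases) and bottom decile (controls)."""
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    k = max(1, len(residuals) // 10)  # individuals per decile
    return order[-k:], order[:k]

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy data: 20 individuals, residuals of TC after adjusting for age and sex
residuals = [float(i) for i in range(20)]
prs = [0.1, 0.4] + [0.5] * 16 + [0.3, 0.9]

cases, controls = extremes(residuals)
prs_auc = auc([prs[i] for i in cases], [prs[i] for i in controls])
print(f"cases: {sorted(cases)}, controls: {sorted(controls)}, AUC = {prs_auc:.2f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.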
Consistent with previous reports, in this study the PRS derived from individuals of African ancestry performed significantly better in sub-Saharan Africans than PRS derived from individuals of European ancestry [10][16][17][18]. The PRS performance of the African American multivariate approach for LDL-C (R² = 8.49%) was much higher than the performances reported by Johnson et al. (2015), which ranged from 1.99% to 4.48% in African Americans, Asian Americans, Caucasians and Hispanics for LDL-C [18]. This supports the view that PRS computed using African ancestry discovery data from a multivariate GWAS might lead to better polygenic predictions of lipids in Africa. However, the genetic diversity of African Americans differs from that of people residing in Africa. Future studies are required to assess the performance of PRS derived from African individuals within Africa.
Another limiting aspect is the poor transferability of PRS within Africa. This might be due to differences in environmental exposure between the South African Zulus and Ugandans [19][20]. Poor transferability hinders the implementation of PRS in preventative healthcare, because a PRS may give inaccurate results when applied to different ethnic groups within sub-Saharan Africa. This further suggests the need for more efforts to optimise polygenic prediction in African individuals.
In conclusion, using PRS derived from the African American multivariate approach improved lipid PRS performance in sub-Saharan Africans compared with the other methods considered. This approach should be prioritised in studies evaluating PRS application in sub-Saharan Africans, alongside increased representation of African ancestry individuals in GWAS. Furthermore, the lipid PRS may be clinically useful for identifying individuals of African ancestry at high risk of dyslipidaemia who are missed by conventional risk factors.
DDS is a population-based cross-sectional study of individuals aged >18 years residing in the urban black communities in Durban, KwaZulu-Natal, South Africa. DCC is a case-control study of individuals aged >40 years with diabetes recruited from tertiary hospitals in Durban. Data collection was conducted from 2009 to 2013 for the DCC and from 2013 to 2014 for the DDS. The survey questionnaire included socioeconomic factors, health information, lifestyle factors, anthropometric measurements (including height, weight, systolic blood pressure, diastolic blood pressure, and hip and waist circumferences), biomarkers for communicable and non-communicable diseases, and genetic data. Of the 2,804 individuals surveyed, 1,204 were from the DDS and 1,600 were from the DCC; more detailed information on the study design and quality controls has been published previously [1][2]. The DDS was approved by the University of KwaZulu-Natal Biomedical Research Ethics Committee (UKZN BREC) (BF030/12) and the UK National Research Ethics Service (14/WM/); the DCC was approved by UKZN BREC (BF078/08) and the UK National Research Ethics Service (11/H0305/6).
The comparative cohort was taken from the Uganda Genome Resource (UGR). The UGR is the genomic and phenotypic resource generated from the Uganda General Population Cohort (GPC). The GPC is a population-based cohort study founded in 1989, with over 22,000 participants from 25 neighbouring villages in Kyamulibwa in rural Uganda. This open cohort study was established to investigate trends in HIV infection in Uganda. However, the cohort's focus now is to examine the role of host genetic variants associated with communicable and non-communicable diseases in rural Ugandans. Details of the UGR cohort have been published previously [2].

Measurement of lipid traits
Non-fasting serum lipid levels were measured using the Cobas Integra 400 Plus Chemistry analyser (Roche Diagnostics), an automated analyser that employs four different technologies: absorption photometry, fluorescence polarization immunoassay, immuno-turbidimetry, and potentiometry for accurate analysis. HDL-C and LDL-C were measured using the homogeneous enzymatic colorimetric
For PRS construction, SNPs from MVP serum lipid summary statistics were clumped based on their linkage disequilibrium. We clumped SNPs at different r² thresholds; a 500 kb clumping window with an r² of 0.5 gave the best-fitting and best-performing model for all lipid traits (Table S3). We also tested P-value thresholds from 1 to 5 × 10^-8 to select which clumped SNPs to include in the final PRS. The P-value threshold that accounted for the highest variance of the trait (R²) was selected as giving the best PRS for TC (Fig. S4). The PRS was calculated by multiplying the weight of each SNP by the number of risk alleles (0, 1 or 2) carried by each individual and summing across SNPs, using the algorithm implemented in the PRSice-2 software [7]. The resulting PRS was entered into a generalised linear model (GLM) of serum lipid levels, adjusting for age, sex, type 2 diabetes and five principal components. An incremental R² was computed for each model by the PRSice algorithm and plotted against the P-value threshold (P_T). The incremental R² is the difference between the R² of the fully adjusted model and the R² of the null model; the best PRS achieved the highest incremental R² (Fig. 1).
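In symbols, the score and the incremental R² described above can be written as follows. The notation is introduced here for illustration and is an assumption, not taken from the study: S(P_T) is the set of clumped SNPs passing threshold P_T, \hat\beta_j is the GWAS effect size of SNP j, and G_{ij} is the number of risk alleles carried by individual i. Software implementations may additionally rescale the sum, for example by the number of alleles included.

```latex
% Weighted risk-allele count for individual i at threshold P_T
\mathrm{PRS}_i(P_T) = \sum_{j \in S(P_T)} \hat{\beta}_j \, G_{ij},
\qquad G_{ij} \in \{0, 1, 2\}

% Full and null models for a lipid trait y
\text{full: } y_i \sim \mathrm{PRS}_i + \text{age} + \text{sex} + \text{T2D} + \mathrm{PC}_1 + \dots + \mathrm{PC}_5
\qquad
\text{null: } y_i \sim \text{age} + \text{sex} + \text{T2D} + \mathrm{PC}_1 + \dots + \mathrm{PC}_5

% Variance explained by the PRS beyond the covariates
\Delta R^2(P_T) = R^2_{\text{full}}(P_T) - R^2_{\text{null}}
```

The best PRS is then the one at the threshold P_T that maximises \Delta R^2(P_T).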
The best-performing PRS was then categorised into deciles. The bottom decile was used as the reference and compared with the other deciles. The difference in mean lipid levels across PRS categories was tested using ANOVA. The performance of the PRS for each lipid trait was compared among individuals of African ancestry (AFR), European ancestry (EUR), the multivariate African American approach (MAA) and the multiethnic ancestry (MEA) population using the ggplot2 R statistical package [8][9].
The multivariate approach of African Americans for HDL-C was derived from a combination of the summary statistics for HDL-C, LDL-C, TG and TC. For LDL-C, the combination of summary statistics was derived from HDL-C, LDL-C and TG. For the TG multivariate approach, we derived our summary statistics from TG, HDL-C, TC and LDL-C. For TC, we combined summary statistics from HDL-C, TG and TC, as described in a previous study [10].
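A minimal Python sketch of the decile comparison described above is given below. The data are invented, the function names are hypothetical, and only the one-way ANOVA F statistic is computed; obtaining a P-value would additionally require the F distribution, which a statistics library would normally supply.

```python
# Hypothetical sketch: rank-based PRS deciles, mean lipid level per decile,
# and a one-way ANOVA F statistic across deciles.

def assign_deciles(scores):
    """Decile 1..10 for each individual (1 = lowest scores), based on rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

def group_values(values, groups):
    """Collect values by group label."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return by_group

def anova_f(values, groups):
    """F = (between-group mean square) / (within-group mean square)."""
    by_group = group_values(values, groups)
    grand = sum(values) / len(values)
    between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_group.values())
    within = sum((x - sum(v) / len(v)) ** 2 for v in by_group.values() for x in v)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (between / df_between) / (within / df_within)

# Toy data: 20 individuals whose lipid level rises with the PRS, plus noise
prs = [float(i) for i in range(20)]
lipid = [4.0 + 0.05 * i + (0.1 if i % 2 == 0 else -0.1) for i in range(20)]

deciles = assign_deciles(prs)
by_decile = group_values(lipid, deciles)
decile_means = {d: sum(v) / len(v) for d, v in by_decile.items()}
top_vs_bottom = decile_means[10] - decile_means[1]  # bottom decile is the reference
f_stat = anova_f(lipid, deciles)
print(f"top minus bottom decile: {top_vs_bottom:.2f} mmol/L, F = {f_stat:.2f}")
```

Using rank-based deciles keeps the groups equal in size even when several individuals share the same score.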