The transferability of lipid loci across African, Asian and European cohorts

Kuchenbaecker, Karoline; Telkar, Nikita; Reiker, Theresa; Walters, Robin G.; Lin, Kuang; Eriksson, Anders; Gurdasani, Deepti; Gilly, Arthur; Southam, Lorraine; Tsafantakis, Emmanouil; Karaleftheri, Maria; Seeley, Janet; Kamali, Anatoli; Asiki, Gershim; Millwood, Iona Y.; Holmes, Michael; Du, Huaidong; Guo, Yu; Kumari, Meena; Dedoussis, George; Li, Liming; Chen, Zhengming; Sandhu, Manjinder S.; Zeggini, Eleftheria

doi:10.1038/s41467-019-12026-7

Article
Open access
Published: 24 September 2019

Karoline Kuchenbaecker ORCID: orcid.org/0000-0001-9726-603X^1,2,3,
Nikita Telkar⁴,
Theresa Reiker^3,5,6,7,
Robin G. Walters ORCID: orcid.org/0000-0002-9179-0321^8,9,
Kuang Lin⁹,
Anders Eriksson¹⁰,
Deepti Gurdasani ORCID: orcid.org/0000-0001-9996-6929³,
Arthur Gilly^3,11,
Lorraine Southam ORCID: orcid.org/0000-0002-7546-9650^3,11,12,
Emmanouil Tsafantakis¹³,
Maria Karaleftheri¹⁴,
Janet Seeley ORCID: orcid.org/0000-0002-0583-5272^15,16,17,
Anatoli Kamali¹⁷,
Gershim Asiki^17,18,19,
Iona Y. Millwood^8,9,
Michael Holmes^8,9,
Huaidong Du^8,9,
Yu Guo²⁰,
Meena Kumari²¹,
George Dedoussis²²,
Liming Li²³,
Zhengming Chen⁹,
Manjinder S. Sandhu ORCID: orcid.org/0000-0002-2725-142X²⁴,
Eleftheria Zeggini^3,11 &
Understanding Society Scientific Group

Nature Communications volume 10, Article number: 4330 (2019)

Abstract

Most genome-wide association studies are based on samples of European descent. We assess whether the genetic determinants of blood lipids, a major cardiovascular risk factor, are shared across populations. Genetic correlations for lipids between European-ancestry and Asian cohorts are not significantly different from 1. A genetic risk score based on LDL-cholesterol-associated loci has consistent effects on serum levels in samples from the UK, Uganda and Greece (r = 0.23–0.28, p < 1.9 × 10⁻¹⁴). Overall, there is evidence of reproducibility for ~75% of the major lipid loci from European discovery studies, except triglyceride loci in the Ugandan samples (10% of loci). Individual transferable loci are identified using trans-ethnic colocalization. Ten of fourteen loci not transferable to the Ugandan population have pleiotropic associations with BMI in Europeans; none of the transferable loci do. The non-transferable loci might affect lipids by modifying food intake in environments rich in certain nutrients, which suggests a potential role for gene-environment interactions.

Introduction

Genome-wide association studies (GWAS) have been very successful in identifying genetic variants linked to cardiovascular disease (CVD) and to cardiometabolic traits¹. Due to the improving predictive accuracy of these variants, genetic risk prediction could soon be implemented in clinical settings^2,3. However, the majority of samples included in these genome “white” association studies were British or US-Americans with European ancestry^4,5 which does not accurately represent the ethnically and ancestrally diverse populations of these nations. Moreover, three quarters of CVD-associated deaths occur in low- and middle-income countries where incidences are rising⁶. Consequently, it is important to determine whether cardiometabolic loci are transferable to other populations.

Previous research assessed the effects of different allele frequencies and linkage disequilibrium (LD) on genetic associations across ancestry groups⁷. Here we ask the fundamental question of whether causal variants for blood lipids, a major cardiovascular risk factor, are shared across populations. Heterogeneity in effects of variants could result from epistasis or gene-environment interactions. However, the causal variants are usually unknown. The differences in LD structure between populations make it difficult to compare the observed associations between ancestry groups because the effect of a variant depends on its correlation with the causal variant(s)⁷. Differences in allele frequency also impact the power to detect associations in other ancestry groups.

We employ several strategies that account for these effects and do not require knowledge of the specific causal variants to quantify the extent to which genetic variants affecting lipid biomarkers are shared between individuals from Europe/North America, Asia, and Africa. We assess the transferability of individual signals and compare association patterns across the genome using data from the African Partnership for Chronic Disease Research – Uganda (APCDR-Uganda, N = 6407)⁸, China Kadoorie Biobank (CKB, N = 21,295)⁹, the Hellenic Isolated Cohorts (HELIC-MANOLIS, N = 1641 and HELIC-Pomak, N = 1945)^10,11, and the UK Household Longitudinal Study (UKHLS, N = 9961)¹². We also use summary statistics from Biobank Japan (BBJ, N = 162,255)¹³ and the Global Lipid Genetics Consortium (European ancestry, GLGC2013 N = 188,577, GLGC2017 N = 237,050)^14,15. We find evidence for extensive sharing of genetic variants that affect levels of HDL- and LDL-cholesterol and triglycerides between individuals with European ancestry and samples from China, Japan and Greek population isolates. We estimate that about three quarters of major lipid loci are reproducible. However, using trans-ethnic colocalization, we show that many established loci for triglycerides do not affect levels of this biomarker in Ugandan samples. Ten out of fourteen of the lipid loci that were not transferable to the Ugandan samples had pleiotropic associations with BMI in European ancestry samples. None of the transferable loci were linked to BMI. This could point to a role of environmental factors in modifying which genetic variants affect lipid levels.

Results

Reproducibility of established lipid loci

We assessed rates at which established lipid-associated variants were reproducible in other populations. We selected major lipid loci, i.e., those with lipid associations at p < 10⁻¹⁰⁰ based on a score test in the largest European ancestry GWAS. In this context, reproducibility was operationalised as at least one variant from the credible set being associated at p < 10⁻³ based on a score test with serum lipid levels in the target study. We defined the credible set as variants correlated at r² > 0.6 with the lead SNP from the European discovery study. Correlation was estimated from the 1000 Genomes Project samples with European ancestry. As a benchmark, we also assessed replication in a European ancestry study, UKHLS. We found evidence of transferability for 76.5% of major HDL loci in this study (Table 1). For the non-European groups rates ranged from 70.6 to 82.4%. Similar reproducibility rates were observed for LDL loci (61.5–76.9%). For major triglycerides (TG) loci, rates ranged from 78.9 to 94.7%, except in APCDR-Uganda. Only 10.5% of the TG loci showed evidence of reproducibility in that sample. Rates for known loci with p≥10⁻¹⁰⁰ in the discovery set were generally below 10%. However, Biobank Japan, the largest study, exhibited markedly higher reproducibility rates for these loci than the other studies with 24.6–32.7%.
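The reproducibility criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name, the input structures (`credible_sets`, mapping each lead SNP to its proxies at r² > 0.6, and `target_pvalues`, holding association p-values from the target study) and all variant identifiers are hypothetical.

```python
def reproducibility_rate(credible_sets, target_pvalues, threshold=1e-3):
    """Fraction of loci with at least one credible-set variant at p < threshold.

    credible_sets: dict mapping a lead SNP to a list of proxy variants
                   (hypothetical input; proxies assumed to have r^2 > 0.6).
    target_pvalues: dict mapping variant ID to its p-value in the target study.
    """
    if not credible_sets:
        raise ValueError("no loci supplied")
    reproduced = 0
    for lead, proxies in credible_sets.items():
        variants = {lead, *proxies}
        # Variants absent from the target study cannot contribute evidence.
        pvals = [target_pvalues[v] for v in variants if v in target_pvalues]
        if pvals and min(pvals) < threshold:
            reproduced += 1
    return reproduced / len(credible_sets)


# Toy data with invented identifiers and p-values.
credible_sets = {"rsA": ["rsA1", "rsA2"], "rsB": [], "rsC": ["rsC1"], "rsD": ["rsD1"]}
target_pvalues = {"rsA": 0.2, "rsA1": 4e-5, "rsB": 0.03,
                  "rsC": 0.5, "rsC1": 2e-4, "rsD1": 9e-4}
rate = reproducibility_rate(credible_sets, target_pvalues)
```

In this toy example three of the four loci meet the criterion, so `rate` is 0.75.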

Table 1 Percentage of established lipid-associated loci with evidence of reproducibility in target studies

Trans-ethnic genetic correlations

Trans-ethnic genetic correlations were estimated between the three largest studies, China Kadoorie Biobank, Biobank Japan and GLGC2013 (Fig. 1). For GLGC2013 and CKB, correlations were 0.999, 0.778, 0.999 for HDL, LDL, and TG, respectively. For GLGC2013 and BBJ, correlations were 0.999, 0.959, 0.961 for HDL, LDL, and TG, respectively. None of the estimates were significantly different from 1 (Supplementary Table 1). We also compared associations across lipid biomarkers. This consistently showed negative genetic correlations between TG associations and HDL associations, with estimates ranging from r_gen = −0.48 to r_gen = −0.86.

Genetic risk scores

In order to assess patterns of sharing of risk alleles for the smaller studies, we constructed genetic risk scores (GRS) based on the established lipid loci from discovery studies with European-ancestry and assessed the score associations with serum levels of HDL, LDL and TG in HELIC, APCDR-Uganda, CKB and also UKHLS as a benchmark (Fig. 2). All genetic scores were significantly associated with their respective target lipid in the three European samples with largely consistent correlation coefficients and mutually overlapping 95% confidence intervals (CIs) (Table 2). For HDL, LDL and TG, the estimated correlation coefficients ranged from 0.27 to 0.28, 0.23 to 0.28 and 0.20 to 0.24, respectively. In APCDR-Uganda, the strongest association was observed for LDL (r = 0.28, SE = 0.01, p = 1.9 × 10⁻¹⁰⁷ based on a mixed model score test). The HDL association was attenuated compared to the European ancestry samples (r = 0.12, SE = 0.01, p = 6.1 × 10⁻²²). The effect of the TG score was markedly weaker (r = 0.06, SE = 0.01, p = 4.5 × 10⁻⁷). For CKB, the HDL GRS had a correlation of r = 0.18 (SE = 0.02, p = 1.4 × 10⁻²²) and the LDL GRS of r = 0.20 (SE = 0.02, p = 3.2 × 10⁻²⁶) while the triglyceride GRS showed a stronger attenuation relative to UKHLS with r = 0.14 (SE = 0.02, p = 3.8 × 10⁻¹²). We also assessed associations between a given score and levels of each of the other lipid biomarkers (Supplementary Table 2). In line with the trans-ethnic genetic correlation results, we observed inverse associations between the HDL score and TG levels and vice versa in all studies, except APCDR-Uganda.
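As a rough illustration of how a genetic risk score relates to a measured lipid level, the sketch below builds a weighted allele score and correlates it with a phenotype. It is a simplification under stated assumptions: the reported associations come from a linear mixed model, whereas this example uses a plain Pearson correlation, and the SNP identifiers, weights, dosages and LDL values are all invented.

```python
import math


def genetic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    return sum(weights[snp] * dosages[snp] for snp in weights)


def pearson_r(x, y):
    """Plain Pearson correlation; stands in for the mixed-model estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-allele effect sizes from a discovery study.
weights = {"rs1": 0.10, "rs2": 0.25}
# Hypothetical genotype dosages and LDL levels for four individuals.
cohort = [
    {"rs1": 0, "rs2": 0},
    {"rs1": 1, "rs2": 0},
    {"rs1": 2, "rs2": 1},
    {"rs1": 1, "rs2": 2},
]
ldl = [2.9, 3.0, 3.4, 3.6]

scores = [genetic_risk_score(d, weights) for d in cohort]
r = pearson_r(scores, ldl)
```

A real analysis would additionally account for relatedness and population structure, which is why the studies here rely on mixed models.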

Table 2 Associations of genetic risk scores based on established lipid-associated loci and respective serum lipid levels in UKHLS, HELIC-MANOLIS, -Pomak, APCDR-Uganda, and CKB using a linear mixed model analysis

Trans-ethnic colocalization

Differences in LD structure, MAF and sample size make it difficult to assess the transferability of individual loci. Therefore, we propose a new strategy to assess evidence for shared causal variants between two populations: trans-ethnic colocalization. For this we re-purposed a method that was originally developed for colocalization of GWAS and eQTL results: Joint Likelihood Mapping (JLIM)¹⁶. In order to assess its performance for GWAS results from samples with different ancestry, we carried out a simulation study. UK Biobank (UKB) was used as a reference with European ancestry and compared to CKB and APCDR-Uganda. In order to derive an upper boundary for the power, we compared UKB to the ancestry-matched UKHLS set. Phenotypes were simulated. Effect size estimates were varied between 0.10 and 0.25 in order to represent a range similar to that observed for major lipid loci¹⁵. In the simulations of distinct causal variants in the non-European and the reference group, the frequencies of false positives were as expected close to 0.05 (Supplementary Table 3, Supplementary Fig. 1). The power to detect shared associations for betas of 0.25 was 73.1% for APCDR-Uganda, 93.1% for CKB and 89% for UKHLS (Fig. 3). To investigate whether the lower power for APCDR-Uganda could be due to its smaller sample size, we reran the analyses for CKB using a random subset of samples matching the sample size of APCDR-Uganda. For effect sizes <0.2, the results from this analysis revealed decreased detection power relative to the full CKB set but still consistently higher than APCDR-Uganda. This suggests that the power of this trans-ethnic colocalization method decreases somewhat with greater genetic distance between the populations that are compared.

We applied trans-ethnic colocalization for established lipid loci to each study with UKHLS as the reference. There was evidence for significant (p_jlim < 0.05 based on a permutation test) colocalization with at least one of the target studies for about half of the major lipid loci (Supplementary Table 4). For several of the major TG loci, such as 8q24.13, strong evidence of transferability to the Asian studies was observed whilst there was no evidence of association in APCDR-Uganda. Figure 4 shows the regional association plots of this locus for each data set as an example to demonstrate that differences in LD and frequencies lead to different association patterns. As colocalization can account for such differences, the result from the analysis comparing the European and Asian studies was nevertheless statistically significant (p < 0.001).

We compared major lipid loci that showed evidence of transferability to APCDR-Uganda with those that did not. The proximal genes of transferable loci were enriched for lipid pathways including lipoprotein metabolism, lipid digestion mobilisation and transport, chylomicron-mediated lipid transport and metabolism of lipids and lipoproteins. The proximal genes of the non-transferable loci were enriched for several other pathways in addition to lipid metabolism, including SHP2 signalling, AVB3 integrin pathway, cytokine signalling in immune system, cytokine-cytokine receptor interaction and transmembrane transport of small molecules (Supplementary Figs. 2 and 3). We also assessed the associations of these loci with BMI in samples with European ancestry using publicly available summary statistics from the GIANT consortium¹⁷ (N ≥ 484,680) (Table 3). Ten of the fourteen non-transferable lipid loci had pleiotropic associations with BMI at a Bonferroni-adjusted threshold of p < 0.0024. None of the seven transferable lipid loci were associated with BMI.
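The reported threshold is consistent with a standard Bonferroni correction over the 21 loci compared (14 non-transferable plus 7 transferable). The short sketch below reproduces that arithmetic; treating 21 as the number of tests is our reading of the text rather than something stated explicitly, and the function name is illustrative.

```python
# Illustrative only: assumes the threshold is 0.05 divided by the number of loci tested.
ALPHA = 0.05
N_NON_TRANSFERABLE = 14
N_TRANSFERABLE = 7


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_NON_TRANSFERABLE + N_TRANSFERABLE)
print(f"{threshold:.4f}")  # prints 0.0024
```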

Table 3 Association of established lipid-associated loci with body mass index by whether the locus was transferable to APCDR-Uganda. BMI association results are based on N≥484,680 samples from the meta-analysis between GIANT and UK Biobank¹⁷

Discussion

Recent efforts to increase global diversity in genetics studies have been vital, enabling this comprehensive cross-population comparison of genetic associations with blood lipids. We provide evidence for extensive sharing of genetic variants that affect levels of HDL- and LDL-cholesterol and triglycerides between individuals with European ancestry and samples from China, Japan and Greek population isolates. We estimated that at least about three quarters of major lipid loci are reproducible. This was highly consistent across all studies except for triglyceride loci in APCDR-Uganda. None of the estimates of trans-ethnic genetic correlations between European, Chinese and Japanese samples were significantly different from 1. All GRS associations in the two Greek isolated populations were highly consistent with those in the UK samples (correlations ranged from 0.27 to 0.28, 0.23 to 0.28, and 0.20 to 0.24, for HDL, LDL and TG, respectively, in these studies). Associations of genetic risk scores for LDL were not attenuated in the Ugandan population compared to the UK samples (r = 0.28, SE = 0.01, p = 1.9 × 10⁻¹⁰⁷ based on a score test).

Previous studies that compared the direction of effect of established loci or assessed associations of genetic risk scores reported differing degrees of consistency^18,19,20,21,22,23,24,25,26,27,28,29. However, most of them were conducted in American samples with diverse ancestry, had smaller sample sizes and applied a single-variant look-up or GRS for a limited number of genetic variants. The high degree of consistency for cholesterol biomarkers we observed also contrasts with previously reported trans-ethnic genetic correlations for other traits, such as major depression, rheumatoid arthritis, or type 2 diabetes, which were substantially different from 1^30,31. In a recent application using data from individuals with European and Asian ancestry from the UK and USA, the average genetic correlation across multiple traits was 0.55 (SE = 0.14) for GERA and 0.54 (SE = 0.18) for UK Biobank³².

As a limitation of our study, we did not adjust for use of lipid-lowering medication. This could in principle cause a small downward bias for the genetic effect estimates. However, few of the participants of the Ugandan and Chinese studies used lipid-lowering drugs. So this is unlikely to have an effect on the main conclusions of this work.

Differences in LD structure, MAF and sample size make it difficult to assess the transferability of individual loci. We therefore propose a new approach: trans-ethnic colocalization. Simulations showed consistent control of type I error rates, as well as power greater than 80% to detect shared associations between samples with European and Chinese ancestry for SNP effects greater than or equal to 0.15. However, power was decreased for comparisons between samples from APCDR-Uganda and UK Biobank (51.5–73.1%). Hence, for the current implementation, non-significant colocalization should not be considered as definitive evidence for the absence of shared causal variants when comparing African and European samples. Future work should address this through better modelling of the LD structure. Moreover, for many of the major lipid loci, more than one independent association signal was identified in discovery GWASs¹⁵. When these are located in close proximity to each other, they can interfere with the trans-ethnic colocalization analysis because JLIM assumes a single causal variant. Therefore, future work should extend this approach to accommodate loci harbouring multiple causal variants.

Using trans-ethnic colocalization, we showed that many established loci for triglycerides did not affect levels of this biomarker in Ugandan samples. This included loci associated at genome-wide significance in all the other studies, such as GCKR at 2p23.3 or LPL at 8p21.3. The genetic risk score for triglycerides had a weak effect on measured levels in APCDR-Uganda. This is unlikely to be an artefact of unreliable measurement: triglyceride levels had a heritable component in this sample (SNP heritability of 0.25, SE = 0.05⁸) and there were genome-wide significant associations. It is also unlikely that this can be explained purely by differences in LD and MAF because they would affect the analyses of the other two lipid biomarkers as well. Instead these discrepancies could be caused by gene-environment interactions. Ten out of fourteen of the lipid loci that were not transferable to the Ugandan samples had pleiotropic associations with BMI in European ancestry samples while none of the transferable loci were linked to BMI. It is possible that the non-transferable variants affect the amount of food intake with downstream consequences for lipid levels. This might require an environment offering diets that are rich in certain nutrients. While the proximal genes for transferable loci were almost exclusively linked to pathways of lipid metabolism, the ones for non-transferable loci were involved in diverse pathways which is in line with this hypothesis. An alternative explanation could be that the non-transferable loci are involved in metabolising nutrients given a particular diet that is not common in Uganda with downstream consequences for weight.

Overall, this could suggest an important role of environmental factors in modifying which genetic variants affect lipid levels. Studying the causes for discordant loci between groups has promise to further elucidate the biological mechanisms of lipid regulation and other complex traits. Applying genetic risk prediction within clinical settings is receiving increasing attention. Our findings demonstrate that the transferability of genetic associations across different ancestry groups and environmental settings should be assessed comprehensively for medically relevant traits. This is important in order to ensure that health benefits of precision medicine are widely shared within and across populations. Ongoing programs in underrepresented countries³³, such as the Human Heredity and Health in Africa Initiative³⁴, and programs focussing on underrepresented groups, such as PAGE³⁵, All of Us³⁶, or East London Genes and Health³⁷, could provide the basis for this.

Methods

Data resources

We included data from the Global Lipid Genetics Consortium (European ancestry samples only, GLGC), the UK Household Longitudinal Study (UKHLS), two isolated populations from the Greek Hellenic Isolated Cohorts (HELIC), a rural West Ugandan population from the African Partnership for Chronic Disease Research (APCDR-Uganda) study, China Kadoorie Biobank (CKB), and Biobank Japan (BBJ). Raw genotype and phenotype data were available for UKHLS, APCDR-Uganda, CKB, HELIC-MANOLIS, and HELIC-Pomak. All participants provided written informed consent and each study obtained approval from ethical review boards. The APCDR-Uganda study was approved by the Uganda Virus Research Institute, Science and Ethics Committee (Ref. GC/127/10/10/25), the Uganda National Council for Science and Technology (Ref. HS 870), and the U.K. National Research Ethics Service, Research Ethics Committee (Ref. 11/H0305/5). The HELIC study was approved by the Harokopio University Bioethics Committee. The UKHLS study has been approved by the University of Essex Ethics Committee and the nurse data collection by the National Research Ethics Service (10/H0604/2). For CKB, central ethics approvals were obtained from Oxford University, and the China National CDC. In addition, approvals were also obtained from institutional research boards at the local CDCs in the 10 regions. BBJ was approved by the ethics committees of RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences, the University of Tokyo. Our analyses were based on summary statistics for BBJ and GLGC. The details of genotyping, QC and imputation for all studies are summarised in Supplementary Table 5. Descriptive information about the sample sets is provided in Supplementary Table 6. Details of the quality control, imputation, genome-wide association analyses and ethical approval have also been previously described for GLGC¹⁴, BBJ¹³, HELIC¹⁰, APCDR-Uganda⁸ and UKHLS¹². Each study confirmed sample ethnicity through PCA, which rules out sample overlap between studies.

For CKB, 102,783 participants were genotyped using two custom-designed Affymetrix Axiom® arrays including up to 803 K variants, optimised for genome-wide coverage in Chinese populations. Stringent quality control included SNP call rate > 0.98, plate effect P > 10⁻⁶, batch effect P > 10⁻⁶, HWE P > 10⁻⁶ (combined 10df χ² test from 10 regions), biallelic, MAF difference from 1KGP EAS < 0.2, sample call rate > 0.95, heterozygosity < mean + 3 SD, no chrXY aneuploidy, genetically-determined sex concordant with database, resulting in genotypes for 532,415 variants present on both array versions. Imputation into the 1000 Genomes Phase 3 reference (EAS MAF > 0) using SHAPEIT version 3 and IMPUTE version 4 yielded genotypes for 10,276,633 variants with MAF > 0.005 and info > 0.3.
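The variant-level part of these quality-control criteria can be expressed as a simple filter. The sketch below is illustrative only: the field names are hypothetical, the MAF criterion is assumed to mean an absolute difference from the 1KGP EAS frequency, and sample-level checks (sample call rate, heterozygosity, sex concordance, aneuploidy) are not covered.

```python
def passes_variant_qc(v):
    """Apply the variant-level thresholds listed above to one variant record.

    `v` is a dict with hypothetical keys; thresholds follow the text.
    """
    return (
        v["call_rate"] > 0.98
        and v["plate_effect_p"] > 1e-6
        and v["batch_effect_p"] > 1e-6
        and v["hwe_p"] > 1e-6
        and v["n_alleles"] == 2  # biallelic variants only
        # Assumption: "MAF difference" is taken as an absolute difference.
        and abs(v["maf"] - v["maf_1kgp_eas"]) < 0.2
    )


# Invented example records.
good = {"call_rate": 0.995, "plate_effect_p": 0.3, "batch_effect_p": 0.5,
        "hwe_p": 0.2, "n_alleles": 2, "maf": 0.21, "maf_1kgp_eas": 0.25}
bad = dict(good, hwe_p=1e-9)  # fails the Hardy-Weinberg criterion

kept = [v for v in (good, bad) if passes_variant_qc(v)]
```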

In CKB, lipid levels were regressed against eight principal components, region, age, age², sex, and, for LDL and TG, fasting time² for the single SNP association analysis. For CKB, PCs were included in both single SNP and PRS association analyses to reduce inflation. Recruitment for CKB occurred at 10 different rural and urban locations across China leading to somewhat increased population structure. The resulting inflation estimates lambda after PC adjustment were 1.063, 1.050, and 1.053 for HDL, LDL, and TG, respectively. LDL levels were derived using the Friedewald formula. After rank-based inverse normal transformation, the residuals were used as the outcomes in the genetic association analyses using linear regression. Associations were carried out within a mixed model framework using BOLT-LMM³⁸.
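Two of the phenotype-processing steps mentioned above, the Friedewald formula and the rank-based inverse normal transformation, can be sketched as follows. The units and the rank offset are assumptions: the Friedewald divisor is 2.2 for mmol/L and 5 for mg/dL, and the Blom offset (c = 3/8) is one common choice that the text does not specify. Ties are not handled, and the function names are illustrative.

```python
from statistics import NormalDist


def friedewald_ldl(total_chol, hdl, tg, units="mmol/L"):
    """Friedewald estimate: LDL = TC - HDL - TG / k (k = 2.2 mmol/L, 5 mg/dL)."""
    if units == "mmol/L":
        divisor = 2.2
    elif units == "mg/dL":
        divisor = 5.0
    else:
        raise ValueError(f"unsupported units: {units}")
    return total_chol - hdl - tg / divisor


def inverse_normal_transform(values, c=3 / 8):
    """Map values to normal quantiles by rank (Blom offset assumed, no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    std_normal = NormalDist()
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = std_normal.inv_cdf((rank - c) / (n - 2 * c + 1))
    return transformed


ldl = friedewald_ldl(5.0, 1.2, 1.1)                # values in mmol/L (invented)
z = inverse_normal_transform([3.1, 0.5, 2.0])      # e.g. regression residuals
```

With three values, the middle-ranked observation maps to 0 and the other two map to symmetric quantiles around it.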

The single SNP association analysis for APCDR-Uganda was carried out within a mixed model framework using GEMMA³⁹. Rank-based inverse normal transformation was applied to the lipid biomarkers after adjusting for age and gender. For Uganda, the inflation estimates lambda were 1.000, 1.004, and 1.005 for HDL, LDL, and TG, respectively.
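The inflation estimates quoted for each study are genomic inflation factors. A minimal sketch of the conventional definition, assuming 1-degree-of-freedom test statistics, is given below: lambda is the median observed χ² statistic divided by the median of the χ² distribution with one degree of freedom (about 0.455). How each study computed its value is not stated here, so this is illustrative, and the p-values are synthetic.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution, derived from the normal quantile.
_EXPECTED_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Lambda = median observed chi-square (1 df) / expected median under the null.

    Assumes two-sided p-values strictly between 0 and 1 (p = 1 is also accepted).
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / _EXPECTED_MEDIAN


# Synthetic, evenly spaced p-values behave like a null distribution.
null_pvalues = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_pvalues)
```

For these evenly spaced p-values the estimate is very close to 1, the value expected in the absence of inflation.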

Established lipid loci

A list of established lipid-associated loci was extracted from the latest Global Lipid Genetics Consortium (GLGC2017) publication¹⁵ reporting 444 independent variants in 250 loci associated at genome-wide significance with HDL, LDL, and triglyceride levels. We excluded three LDL variants where the association was not primarily driven by the samples with European ancestry. We assessed evidence for transferability of the loci, applied trans-ethnic colocalization and used them to construct genetic risk scores.

Reproducibility of established lipid loci

We assessed evidence that these established lipid signals are reproducible in other populations. For loci harbouring multiple signals, we only kept the most strongly associated variant. Out of the 444 variants, this left 170 HDL, 135 LDL and 136 TG variants. We distinguished major loci, i.e. those with p < 10⁻¹⁰⁰ based on a score test in GLGC2017. For each lead SNP we identified all variants in LD (r² > 0.6) based on the European ancestry 1000 Genomes data. We assessed whether the lead or any of the correlated variants, henceforth called credible set, displayed evidence of association in the target study. If this was not the case, we tested whether there was any other variant with evidence of association within a 50 Kb window. We used a p-value threshold of p < 10⁻³ based on a score test. This threshold was derived by computing the minimum p-value in 1000 random windows of 50 Kb for each study. Fewer than 5% of random windows had a minimum p < 10⁻³ for the non-European ancestry studies. While this p-value threshold might not be appropriate to provide conclusive evidence of reproducibility for individual loci, we used this to test evidence of reproducibility across sets of loci. These analyses excluded the HELIC studies because the smaller sample size makes it difficult to differentiate between lack of power and lack of reproducibility.
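The calibration of the p < 10⁻³ threshold from random windows can be illustrated with a small sketch. It assumes a single chromosome with sorted variant positions and uses synthetic, uniformly distributed p-values; the function name, parameters and toy data are hypothetical and do not reproduce the study's genome-wide procedure.

```python
import random
from bisect import bisect_left, bisect_right


def fraction_of_windows_below(positions, pvalues, threshold=1e-3,
                              window=50_000, n_windows=1000, seed=0):
    """Fraction of random windows whose minimum p-value is below `threshold`.

    positions: variant positions sorted in ascending order (one chromosome).
    pvalues:   association p-values in the same order as `positions`.
    """
    rng = random.Random(seed)
    first, last_start = positions[0], positions[-1] - window
    hits = evaluated = 0
    for _ in range(n_windows):
        start = rng.randint(first, last_start)
        i = bisect_left(positions, start)
        j = bisect_right(positions, start + window)
        if i == j:
            continue  # window contains no variants
        evaluated += 1
        if min(pvalues[i:j]) < threshold:
            hits += 1
    return hits / evaluated if evaluated else float("nan")


# Synthetic null data: one variant per kb across 10 Mb, uniform p-values.
data_rng = random.Random(42)
positions = list(range(0, 10_000_000, 1_000))
pvalues = [data_rng.random() for _ in positions]
frac = fraction_of_windows_below(positions, pvalues)
```

A threshold is adequately calibrated in this sense when `frac` stays small for data with no true associations.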