Introduction

Cardiovascular atherosclerotic diseases are the leading cause of death not only in developed countries but also in developing countries. Among the several risk factors for atherosclerotic diseases, dyslipidemia has been regarded as the most crucial one, and genetic as well as environmental factors contribute to the variance of lipid traits.1 For each component of dyslipidemia, monogenic mutations associated with Mendelian traits have been extensively pursued, and many mutations in genes that cause overt dyslipidemia have been found; for example, the deficiency of low-density lipoprotein (LDL) receptor in familial hypercholesterolemia2 and the deficiency of cholesteryl ester transfer protein (CETP) in familial hyperalphalipoproteinemia.3

However, the causes of most dyslipidemic cases are polygenic rather than monogenic. Therefore, association studies including genome-wide association studies (GWASs) have been performed and have identified many loci associated with lipid traits.4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 Subsequent experimental approaches demonstrated the functional roles of some candidate genes, such as GalNac transferase (GALNT2), Sort1,12 and Trib1.16

Although many candidate genes for dyslipidemia have been proposed as a result of the above-mentioned studies, most of the GWASs were performed in Western populations, and only a few studies14, 17, 18 have comprehensively examined genetic factors associated with lipid traits in East Asian populations. It is important to perform comprehensive studies to investigate the genetic factors for dyslipidemia in different populations because genetic backgrounds, lifestyle and lipid profiles are likely to differ between Western and East Asian populations. Indeed, Kim et al.14 identified novel loci that had not been found in GWASs performed in Western populations. Moreover, the Japanese population has been genetically isolated, and the lipid traits of Japanese individuals are distinctive in that they are characterized by high concentrations of high-density lipoprotein cholesterol (HDL-C). Therefore, in order to enrich our knowledge about lipid-associated genes and to identify new drug targets to treat dyslipidemia, it is important to replicate the previously reported genetic associations and to identify novel genetic factors utilizing GWAS in the Japanese population.

In the present study, we analyzed genome-wide data obtained from 3041 healthy Japanese volunteers in the Japan Pharmacogenomics Data Science Consortium (JPDSC) project19 to examine whether we could replicate previously described genetic associations with the four lipid traits, that is, LDL cholesterol (LDL-C), HDL-C, triglyceride and phospholipid. Afterwards, any associated single-nucleotide polymorphisms (SNPs) in previously unrecognized loci were also tested for associations in a Korean population.

Materials and methods

Database

All the data used in the present study were obtained from the JPDSC data set composed of 3041 healthy Japanese subjects. Details of the data set are described elsewhere.19 The same database has been used for pharmacogenomic studies,20, 21, 22 a genomic study of human leukocyte antigen23 and a study of genetic factors affecting electrocardiographic parameters.24

The database contains the genotypes of 2.5 million SNPs from 2994 Japanese healthy volunteers. DNA and clinical information were obtained in two phases. The first set of samples was collected from 2000 to 2003 and the second set of samples was collected from 10 geographic regions in Japan. All the samples were collected from the Japanese population as validated by principal component analysis. Written informed consent was obtained from all the subjects, and this study was approved by the ethical committee of JPDSC.

Genome-wide association study

The final number of subjects included in the GWAS was 2994. Genomic DNAs were genotyped on an Illumina Human Omni 2.5.8 BeadChip (San Diego, CA, USA), and SNPs used in the GWAS were selected according to the following criteria: SNP call rates (≥95%), the Hardy–Weinberg equilibrium test (P≥0.01) and minor allele frequency (≥1%). The total number of SNPs on the autosomes after quality control was 1 294 241. Lipid measures were adjusted for independent variables within each phase by the linear regression model. The clinical test data were log-transformed and standardized. The standardized residuals in each phase were combined and used as quantitative phenotypes for the association analysis, performed with PLINK (version 1. 1. 3).25 The significance of the association between a trait and genotype was evaluated using the Wald test. Genome-wide significance was inferred at P<5 × 10−8.
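The phenotype preparation described above (log transformation, covariate adjustment within each phase, standardization of residuals and pooling across phases) can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the simulated data and the use of ordinary least squares from NumPy are assumptions made only for illustration.

```python
import numpy as np

def standardized_residuals(trait, covariates):
    """Log-transform a trait, regress it on covariates by ordinary least
    squares, and return the residuals scaled to mean 0 and SD 1."""
    y = np.log(np.asarray(trait, dtype=float))
    X = np.asarray(covariates, dtype=float)
    # Add an intercept column so the fit is not forced through the origin.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return (resid - resid.mean()) / resid.std(ddof=1)

def combine_phases(phases):
    """phases: iterable of (trait, covariates) pairs, one per collection
    phase. Each phase is adjusted separately, then the results are pooled."""
    return np.concatenate([standardized_residuals(t, c) for t, c in phases])

# Simulated stand-in data (hypothetical; not JPDSC values).
rng = np.random.default_rng(0)

def simulate_phase(n):
    cov = rng.normal(size=(n, 2))  # e.g. two already log-transformed covariates
    trait = np.exp(4.5 + 0.3 * cov[:, 0] + rng.normal(scale=0.2, size=n))
    return trait, cov

phenotype = combine_phases([simulate_phase(200), simulate_phase(300)])
```

Adjusting within each phase before pooling removes phase-specific shifts in the measurements, so that the combined vector can be treated as a single quantitative phenotype in the association test.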

Replications of previously reported SNPs

Among the SNPs reported to be associated with LDL-C, HDL-C, triglyceride and phospholipid, the associations with P-values <5 × 10−8 in previous reports were examined. As the GWAS platforms differed between our study and the previous reports, some SNPs previously reported to be associated with lipid traits were not present in our data. In those cases, we searched for proxy SNPs with high r2 values (not lower than 0.8) with the reported SNPs. When we attempted to replicate n different associations for a trait, the P-value threshold of 0.05/n was used according to the Bonferroni correction. The direction of each association was carefully examined to see whether it was the same in both the previous report and the present study. When a proxy SNP rather than the reported SNP itself was used in the replication, we likewise examined whether the directions of the effects were concordant between the previous study and our study.
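As a worked illustration of the replication criteria above, the following sketch computes the Bonferroni threshold 0.05/n and classifies a single association by its P-value and by the concordance of effect directions. The function names and status labels are hypothetical, and the effect estimates are assumed to be already aligned to the same allele.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests

def replication_status(p_value, beta_reported, beta_observed, n_tests):
    """Classify one association; both betas must refer to the same allele."""
    if beta_reported * beta_observed <= 0:
        return "discordant direction"
    if p_value < bonferroni_threshold(n_tests):
        return "replicated (Bonferroni)"
    if p_value < 0.05:
        return "nominal only"
    return "not replicated"

# Numbers of SNPs tested per trait, as given in the Results.
for trait, n in [("LDL-C", 57), ("HDL-C", 71), ("triglyceride", 60), ("phospholipid", 36)]:
    print(f"{trait}: {bonferroni_threshold(n):.2e}")
```

The printed thresholds agree, up to rounding, with the values quoted for each trait in the Results.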

Replications of novel SNPs

SNPs with P-values under the prespecified threshold of 1 × 10−5 were tested for replication using the Korea Association Resource data set (n=6805), which is a part of the Korean Genome and Epidemiology Study cohort.14 Genomic DNAs were genotyped on an Affymetrix 5.0 SNP array (Santa Clara, CA, USA). The same covariates were used for adjustment in our data and in the Korean data.

As we attempted to replicate 15 different SNPs, excluding those within PCSK7, the P-value threshold of 0.05/15=0.0033 was used according to the Bonferroni correction. For the 6 SNPs within PCSK7, 0.05/6=0.0083 was used. The direction of each association was carefully examined to see whether the direction of the association in the previous report was the same as that in the present study. Meta-analysis was performed using the inverse variance method.
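The inverse variance method weights each study's effect estimate by the reciprocal of its squared standard error. A minimal sketch is given below; it assumes a fixed-effect model, which the text does not specify, and the function name and input values are hypothetical rather than taken from the Japanese or Korean data.

```python
import math

def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis (assumed model).

    Returns the pooled effect, its standard error, the z statistic and
    the two-sided P-value under a normal approximation."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("betas and standard_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical per-study estimates for one SNP (not values from Table 2).
beta, se, z, p = inverse_variance_meta([0.10, 0.10], [0.02, 0.02])
```

Because the pooled standard error shrinks as studies are added, two associations that are individually below genome-wide significance can jointly pass the 5 × 10−8 threshold when their effects point in the same direction.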

Statistical analysis

Association tests and quality control were performed with PLINK (version 1. 1. 3), and all the other statistical analyses were performed using the R environment (version 2.15.0; Vienna, Austria).

Results

Lipid phenotypes in JPDSC data

Metabolic trait data and other data were obtained from 3041 healthy subjects.19 The data were obtained in two different phases. Phase 1 data were obtained from 1032 subjects and phase 2 data were obtained from 2009 subjects. Supplementary Tables 1 and 2 show the summarized data separately for males and females. As shown in these tables, LDL-C and HDL-C data were obtained only from phase 2 (1001 males and 1006 females), whereas the other data including total cholesterol, triglyceride and phospholipid levels were obtained from both phase 1 and phase 2. Examination of the patterns in histograms for age, body mass index (BMI), LDL-C, HDL-C, total cholesterol, triglyceride, phospholipid, systolic blood pressure and diastolic blood pressure indicated that log-transformed rather than raw values generally fit closer to normal distributions (Supplementary Figures 1A–K). In addition, two-dimensional plots for each of the traits against BMI indicated that many of the pairs of variables are significantly correlated with each other, and log-transformed rather than raw values generally showed linear relationships (Supplementary Figures 2A–H and Supplementary Table 3). We analyzed extensively the relationships between two different variables and noticed that, in general, each pair of variables correlated with each other and log-transformed rather than raw values showed linear relationships (Supplementary Figures 3A–D indicate representative results). Therefore, log-transformed values were used for the analyses in most of the traits in this study.

Characterization of the phenotype data

We extensively analyzed the data to determine appropriate conditions for the association study. Multivariate regression analysis using age and BMI as independent variables and one of the clinical laboratory data, or systolic or diastolic blood pressure as a dependent variable, showed that BMI had positive effects on LDL-C, total cholesterol, triglyceride, phospholipid, urate, HbA1c, systolic pressure and diastolic pressure in both males and females (Supplementary Table 4). Note that all the variables and values were log-transformed before use. For HDL-C, BMI had significant negative effects in both males and females, whereas it had a positive effect on urate only in males (Supplementary Table 4). Age had positive effects on all the traits described in Supplementary Table 4 except for HDL-C and urate in males, and phospholipid in females.

Replication of previous associations

Clinical test data were log-transformed and standardized as described in the Materials and Methods. The covariates used for the adjustment included alcohol ingestion, smoking, log-transformed BMI, gender, log-transformed albumin, log-transformed HbA1c, log-transformed alanine transaminase, log-transformed aspartate transaminase, log-transformed γ-glutamyl transpeptidase, log-transformed creatinine and eigenvalues of the top three components from the principal component analysis using Eigenstrat.26 Using the results of the analysis, we examined whether previously described SNPs associated with LDL-C, HDL-C, triglyceride or phospholipid are also associated with these traits in individuals of Japanese ancestry (Table 1).

Table 1 Replication analyses of previously reported genes associated with lipid traits in GWAS catalog utilizing JPDSC data

Low-density lipoprotein cholesterol

Manhattan plots and QQ plots for LDL-C are shown in Supplementary Figures 4A and B. We attempted to examine 95 SNPs reported to be associated with LDL-C selected from the GWAS catalog. When the SNPs reported in the previous studies were included in our platform, we used those SNPs; otherwise, SNPs in tight linkage disequilibrium with the reported SNPs were used as proxy SNPs. Among the 95 SNPs examined, 57 SNPs in our platform were found to be either the same SNPs or the proxy SNPs. Applying the Bonferroni correction method for multiple comparisons, the corrected threshold of 0.05/57=8.7 × 10−4 was applied. Among the 57 SNPs, 14 showed nominal P-values under 0.05 (Table 1) and were mapped to 6 regions (CELSR2-PSRC1-SORT1, APOB, HMGCR, TIMD4-HAVCR1, HP-HPR-DHX38 and APOE-APOC1-APOC2). For all 14 SNPs, the directions of the associations were concordant between the previous reports and the present report. Of the 14 SNPs, 12 SNPs were significant even after applying the Bonferroni correction (Table 1). Although rs72669744, located in the 3′ direction of PCSK9, was previously reported to be associated with LDL-C,27 it was not significantly associated with LDL-C in the present study, whereas rs17111474, located in the 5′ direction of the same gene, showed a rather low P-value (4.24 × 10−4).

High-density lipoprotein cholesterol

Manhattan plots and QQ plots for HDL-C are shown in Supplementary Figures 4C and D. We attempted to examine 119 SNPs selected from the GWAS catalog reported to be associated with HDL-C. Among the 119 SNPs examined, 71 SNPs in our platform were found to be either the same SNPs or the proxy SNPs. Applying the Bonferroni correction method for multiple comparisons, the corrected threshold of 0.05/71=7.0 × 10−4 was applied. Among the 71 SNPs, 24 showed nominal P-values under 0.05 (Table 1) that were mapped to 13 regions (MACF1-PABPC4, ZNF648, GALNT2, LPL, ABCA1, FADS1-FADS2-FADS3, MMAB-MVK, SCARB1, LIPC, CETP, CT-CHOLF-PRMT8, LCAT and HNF4A). In 23 SNPs out of 24, the directions of the associations were concordant between the previous reports and the present report. Of the 23 SNPs, 8 SNPs were significant even after applying the Bonferroni correction (Table 1).

Triglyceride

Manhattan plots and QQ plots for triglyceride are shown in Supplementary Figures 5A and B. We attempted to examine 99 SNPs from the GWAS catalog reported to be associated with triglyceride. Among the 99 SNPs examined, 60 SNPs in our platform were found to be the same SNPs or the proxy SNPs. Applying the Bonferroni correction method for multiple comparisons, the corrected threshold of 0.05/60=8.3 × 10−4 was applied. Among the 60 SNPs, 19 showed nominal P-values under 0.05 (Table 1) that were mapped to 8 regions (ANGPTL3, GCKR, RPL26P19-HAVCR1, TBL2-MLXIPL, LPL, XKR6-AMAC1L2, TRIB1 and APOA5-APOA4-APOC3-APOA1). In 16 SNPs out of 19, the directions of the associations were concordant between the previous reports and the present report. Of the 16 SNPs, 12 SNPs were significant even after applying the Bonferroni correction (Table 1).

Phospholipid

Manhattan plots and QQ plots for phospholipid are shown in Supplementary Figures 5C and D. We attempted to examine 63 SNPs from the GWAS catalog reported to be associated with phospholipid. Among the 63 SNPs examined, 36 SNPs in our platform were found to be either the same SNPs or the proxy SNPs. Applying the Bonferroni correction method for multiple comparisons, the corrected threshold of 0.05/36=1.38 × 10−3 was applied. Among the 36 SNPs, 5 showed nominal P-values under 0.05 (Table 1) and were mapped to the same 11q12.2 region (MYRF-TMEM258-FEN1-FADS1). For all five SNPs, the directions of the associations were concordant between the previous reports and the present report. Of the five SNPs, none remained significant after applying the Bonferroni correction (Table 1).

Possible novel loci associated with the lipid traits

Using a prespecified P-value threshold of 1 × 10−5, we searched for novel genetic loci harboring SNPs associated with LDL-C, HDL-C and triglyceride, and tested them for replication using the Korean data. Except for the association between the PCSK7 locus and triglyceride, we selected one SNP per locus; that is, four SNPs for LDL-C, seven SNPs for HDL-C and four SNPs for triglyceride. For the replication of the associations between PCSK7 and triglyceride, six SNPs were submitted as this gene was of interest. Although a rare variant within the PCSK7 gene (rs142953140) has been reported to be associated with HDL-C, LDL-C and triglyceride on the basis of exome sequencing data,28 the frequency of that variant was only 0.2%, and no common SNP at this locus has been reported to be associated with lipid traits. All the SNPs except for those within PCSK7 failed to be replicated (Supplementary Table 5). However, the associations between PCSK7 and triglyceride were successfully replicated in the Korean data (Table 2); three out of the six SNPs showed P-values below the multiple-comparison threshold (<0.0083) described in the Materials and Methods. Furthermore, for two of the SNPs (rs508487 and rs236911), the P-values of the meta-analysis were lower than the threshold of 5 × 10−8 used as the criterion for genome-wide significance (Table 2).

Table 2 Associations of SNPs within PCSK7 gene in Japanese and Korean samples

Discussion

A genome-wide association approach is an established methodology that enables the identification of genetic loci associated with complex traits and diseases unconstrained by prior knowledge. In the field of lipidology, a series of elegant studies utilizing GWAS have successfully identified many lipid-associated genetic variants,4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 although these genetic variants account for only approximately one-third of the genetic variance of lipid traits.29 Furthermore, most of the previous GWASs were performed in Western populations, and it is important to conduct GWAS in non-Western populations to examine whether the identified associations are shared by different ethnic groups and to find novel loci associated with the lipid traits.

In the present study, we examined the associations of 2.5 million SNPs with serum lipid concentrations using the data from 2994 Japanese healthy volunteers obtained from the JPDSC database. We first examined whether the associations of SNPs with LDL-C, HDL-C, triglyceride and phospholipid observed among individuals of European, African and Asian ancestries are replicated in our data (Table 1). The results showed that some of the associations were replicated in our data. Furthermore, for all the replicated associations that were significant even after Bonferroni correction, the directions were concordant between the GWAS catalog and our data. These results indicate that the associations in the GWAS catalog are reliable, and that each of the SNPs that remained significant after Bonferroni correction with a concordant direction is likely to affect lipid levels across populations. Many of the corresponding genes, including APOB, CETP and LPL, have attracted the attention of lipid researchers.

Among possible novel genetic loci associated with LDL-C, HDL-C or triglyceride, the associations between the SNPs within PCSK7 and triglyceride were replicated in the Korean data. The present study provides the first evidence that common SNPs within PCSK7 are associated with a lipid phenotype. In a previous study in which exome sequencing rather than SNP arrays was used for association studies, a rare allele within PCSK7 with a minor allele frequency of 0.2% had a large effect on serum HDL-C.28 Although several genes including APOA1/C3/A4/A5 that are known to be involved in the metabolism of HDL-C and triglyceride are located near the PCSK7 gene, rs142953140 in PCSK7 was shown to be associated with HDL-C by conditional analysis adjusting for six known variants in the APOA1/C3/A4/A5 region associated with the lipid.28 In our data, rs522645 remained associated with triglyceride (P=0.0438) even after adjustment for rs1942478 (within ZPR1), rs651821 (within APOA5) and rs734104 (within APOC3), which are located near PCSK7 and showed low P-values in our data.

PCSK7 (PC7) is a member of the subtilisin-like proprotein convertase family that processes multiple protein precursors. PCSK9, a member of the same family, is known to cause autosomal-dominant hypercholesterolemia30, 31 and hypocholesterolemia.32 Although PCSK7 is expressed in the colon and lymphoid-associated tissues, it is also expressed to some degree in the liver and the intestine, organs that are largely involved in lipid metabolism.33 Scamuffa et al.34 reported that PCSK7 knockout mice had no apparent abnormal phenotype, whereas Wetsel et al.35 reported that they had impaired cognitive performance. Adeno-associated virus-mediated gene expression of PCSK7 in mouse liver was reported to affect plasma lipid levels.28 Despite these elegant studies, lipid metabolism in mice is quite different from that in humans. For example, unlike humans, mice have APOBEC that edits apoB100 mRNA and makes apoB48 mRNA in both liver and intestine. They also lack CETP, which greatly affects HDL-C levels. Further basic and experimental approaches are necessary to evaluate the importance of PCSK7 for the modulation of lipid metabolism in humans. In humans, PCSK7 has been reported to be associated with soluble transferrin receptor level.36 The authors of that study analyzed whether soluble transferrin receptor level was associated with lipid parameters but found no associations. The mechanism of the association between PCSK7 and triglyceride in humans remains to be clarified.

In summary, utilizing the JPDSC data set, we replicated the previously reported associations of 14 SNPs in 5 regions for LDL-C, 23 SNPs in 12 regions for HDL-C, 16 SNPs in 6 regions for triglyceride and 5 SNPs in 1 region for phospholipid, and identified 16 possible novel loci associated with the lipid traits. We further replicated the association between PCSK7 and triglyceride with the Korean data. PCSK7 may be associated with lipids, especially triglyceride, and may serve as a candidate for a new drug target to treat dyslipidemia.