Introduction

Dyslipidemia, involving abnormal levels of serum triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), is a well-known risk factor for cardiovascular diseases (CVDs) [1]. Genetic factors contribute a considerable part (54–66%) to lipid metabolism [2]. A number of loci have been mapped by genome-wide association studies (GWAS). Although vastly expanded sample sizes have been used in combined analyses in European studies, only a limited proportion of heritability has been explained by these loci [3, 4]. Further explorations are needed, especially in East Asians. As with other GWAS, only a few of the reported lipid-associated tag single nucleotide polymorphisms (SNPs) have been fine-mapped for causal SNP localization [5, 6]. Meanwhile, only a limited number of genes mediating the associations between the tagSNPs and lipid phenotypes have been uncovered. The translation and application of the variants discovered by the classical GWAS strategy are currently the major challenges [7].

According to the results of genotype-phenotype association studies for complex diseases, ~80–90% of these GWAS tag SNPs lie outside of the protein-coding regions, which means that these loci likely manifest their effects via regulating the gene expression or splicing [6, 8,9,10]. A study performed by Nicolae and his colleagues found that expression quantitative trait loci (eQTLs) are significantly enriched in trait-associated SNPs reported by GWAS for complex traits [11]. Annotation via eQTL can improve the ability to discover true associations and identify the major steps of the regulatory cascade behind the statistical genotype-phenotype associations. A number of projects and databases focused on genotype-expression associations (e.g., MuTHER, SCAN, etc.) launched the eQTL annotation and SNP function prediction after GWAS [12,13,14,15,16,17]. Recently, the Genotype-Tissue Expression (GTEx) project showed a landscape of eQTL annotation in multiple tissues and accelerated the driver gene prediction in the post-GWAS era [18, 19].

Integrated analyses of GWAS and eQTL data in specific tissues have been promising for type 2 diabetes, bipolar disorder, immune-related diseases, etc. [20,21,22,23,24]. We asked whether such an annotation (eQTL signals in relevant tissues) could serve as a prior for screening lipid-associated loci in Han Chinese when the sample size in the discovery stage is limited.

Besides eQTL annotation, chromatin activity regions can be annotated through chromatin immunoprecipitation sequencing (ChIP-seq) or DNase-seq for transcription factors, DNase sensitivity, and histone modification markers such as histone H3 lysine 4 trimethylation (H3K4me3), histone H3 lysine 9 acetylation (H3K9ac), and histone H3 lysine 27 acetylation (H3K27ac) [6, 25,26,27,28]. The NIH Roadmap Epigenomics Consortium has published the largest collection of human epigenomes which facilitated the identification of gene regulation in multiple tissues [29].

In the current study, we performed a multistage GWAS for serum lipid levels. First, SNPs associated with gene expression levels in liver and adipose tissue were annotated as candidates for replications. Second, epigenetic modification signals were used for annotating putative regulatory regions. Then, luciferase reporter assay was conducted to verify the segment involving the causal variants for the regulation of the transcription activity. Finally, gene-based expression-trait association analyses were conducted.

Methods

Study subjects

In the discovery stage, 998 participants with metabolic syndrome (according to the criteria of the Metabolic Syndrome Study Cooperation Group of the Chinese Diabetes Society) and 996 healthy controls were recruited from a community-based survey in Hangzhou, Zhejiang Province, China, as described in our previous study [30]. For the replication stage I, six independent community-based cohorts with a total of 3027 subjects were recruited from east China (Dongyang, Liuheng, Zhujiajian, Tongxiang, and Hangzhou) and Northeast China (Shenyang). For replication stage II, 2130 subjects from Shanghai were included [31] (S Table 1). All participants were Han Chinese. Individuals were excluded if they had cancer or serious chronic liver, lung, or kidney disorders. The study protocol was approved by the Research Ethics Committee of the School of Medicine, Zhejiang University. All participants gave informed consent.

Genotyping, imputation, and quality controls

Genomic DNA was isolated from whole blood using a TACO automatic nucleic acid extraction apparatus. The Illumina Omni-Express 760k chip was used for the genome-wide assay of samples in the discovery stage. SNPs with call rates ≥95% and Hardy–Weinberg equilibrium P-values ≥1.0E-6 were included in the discovery stage. Samples were excluded from analyses if they (1) had genotyping call rates ≤95%, (2) were population outliers identified by running EIGENSOFT [32], or (3) had probable relatives (PI_hat > 0.25). We then imputed ungenotyped SNPs via IMPUTE2 [33]. In replication stage I, genotyping was done using SNPscan™ (Genesky Biotechnologies Inc., Shanghai, China). In replication stage II, genotyping data were extracted from our previous study, which used the Affymetrix Genome-Wide Human SNP Array 6.0; the quality control filtering of these genotyping data is described elsewhere [31]. Genotyping call-rate control (≥95%) and Hardy–Weinberg equilibrium control (P > 1.0E-3) were implemented in the replication stages. After quality control, 1742, 3027, and 2130 subjects were included in analyses in the discovery stage, replication stage I, and replication stage II, respectively.
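The SNP-level filters above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 minor-allele counts with None for a missing call, and it swaps in a 1-df chi-square test for Hardy–Weinberg equilibrium in place of whatever test the genotyping pipeline actually used (the source does not say). The function names are hypothetical.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square goodness-of-fit test for HWE.

    This is an approximation chosen for illustration; it is not
    necessarily the test used in the study.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_discovery_qc(genotypes, min_call_rate=0.95, min_hwe_p=1.0e-6):
    """Apply the discovery-stage thresholds quoted in the text."""
    if call_rate(genotypes) < min_call_rate:
        return False
    called = [g for g in genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_p(*counts) >= min_hwe_p
```

Passing min_hwe_p=1.0e-3 would mirror the stricter Hardy–Weinberg threshold quoted for the replication stages.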

Online data acquisition

Tissue-specific eQTL signals were acquired from the Genotype-Tissue Expression (GTEx) project [18], the Multiple Tissue Human Expression Resource (MuTHER) project, GSE26106 (206 and 60 subjects recruited by the University of Chicago and University of Washington) [14], GSE9588 (266 subjects recruited by Merck) [14], and a published meta-analysis in whole-blood samples from 5311 subjects [34]. Subcutaneous adipose-tissue-specific (GTEx and MuTHER) and liver-tissue-specific (GSE26106 and GSE9588) eQTL signals were used for candidate SNP screening. eQTL signals in whole blood, skin tissue, and transformed fibroblasts were used for additional annotations.

Summarized genome-wide lipid-associated signals were downloaded from the Global Lipids Genetics Consortium (GLGC) [3]. For combined analyses, 94,595 European ancestry individuals from 23 studies genotyped with SNP arrays were included.

Chromatin states were annotated using data from the ENCODE project via the UCSC Genome Browser. The H3K27ac modification signals were combined from seven cell lines. The DNase I hypersensitivity loci were integrated from 125 cell types. A cluster of 161 transcription factor ChIP-seq datasets was annotated. Another set of histone modification marks, including H3K4me3, H3K9ac, and H3K27ac, was used for mapping active regions via ChIP-seq. The ChIP-seq data in primary mononuclear cells (E062) and adipose nuclei (E063) were acquired from the Roadmap project [29].

Luciferase reporter assay

To evaluate the transcriptional activity of the variant with a potential function, a segment covering the peaks of the transcription activity signals and the variants was amplified by polymerase chain reaction (PCR) from a genomic DNA sample of an rs1880118 heterozygote. The recovered fragments were cloned into pGL-3 basic vectors.

Transient transfections were then conducted in the HEK 293T cell line. The cells (1.0E6) were plated into six-well plates and transfected using Lipofectamine 2000. After 48 h of transfection, the luminescence intensities were measured following the protocol of the kit (Promega Dual-Luciferase Reporter Assay System). The pRL-TK vector encoding Renilla luciferase was used as an internal control to normalize firefly luciferase expression. Transfections were done in triplicate and replicated in independent experiments.
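The normalization described above can be sketched as follows. This assumes that each well yields one firefly and one Renilla reading and that the relative activity is the firefly/Renilla ratio, with the mutant expressed relative to the wild-type mean. The readings are hypothetical placeholders, not data from the study.

```python
from statistics import mean

def normalized_ratios(firefly, renilla):
    """Per-well firefly signal divided by the Renilla internal control."""
    return [f / r for f, r in zip(firefly, renilla)]

def fold_change(test_ratios, reference_ratios):
    """Mean normalized activity of the test vector relative to the reference."""
    return mean(test_ratios) / mean(reference_ratios)

# Hypothetical triplicate readings (arbitrary luminescence units)
wt_ratios = normalized_ratios([1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0])
mut_ratios = normalized_ratios([2000.0, 2100.0, 1900.0], [500.0, 525.0, 475.0])
mutant_vs_wild_type = fold_change(mut_ratios, wt_ratios)
```

Dividing by the Renilla signal removes well-to-well differences in transfection efficiency before the two vectors are compared.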

Strategies and statistical analyses

First, as shown in Supplementary Fig. 1, linear regressions were performed via PLINK [35] to test the associations between SNPs and lipid components (TG, TC, HDL-C, and LDL-C). Second, SNPs associated with at least two lipid components in the discovery stage (P < 0.05 for the association tests) were annotated using eQTL signals either in subcutaneous adipose tissue or in liver tissue. Identical subcutaneous adipose-tissue-specific eQTL signals (cis SNP-gene pairs) from the GTEx (V4) and MuTHER datasets were identified. The liver-tissue-specific SNP-gene pairs (P < 0.05) were combined from three studies. Third, conditional analyses were performed with any two candidate SNPs within ~1 Mb for linkage disequilibrium (LD) pruning. We kept only one of the statistically significant SNPs if the other SNP had a P-value > 0.05 in conditional analyses. Fourth, trait-associated loci with eQTL annotations were considered for replications. Replications I and II with seven independent Han Chinese cohorts were performed. Finally, we combined the effects of replicable SNPs in Han Chinese samples and in 94,595 individuals of European ancestry reported by GLGC.
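The second step of this strategy (nominal association with at least two lipid components, plus an eQTL annotation) can be sketched as a simple filter. The data layout, the SNP identifiers, and the function names below are hypothetical; the conditional analysis used for LD pruning is not reproduced here.

```python
def n_nominal_traits(trait_pvalues, alpha=0.05):
    """Count the lipid components with P < alpha for one SNP."""
    return sum(p < alpha for p in trait_pvalues.values())

def screen_candidates(assoc, eqtl_snps, min_traits=2, alpha=0.05):
    """Keep SNPs associated with >= min_traits lipid components that also
    carry an eQTL signal in adipose or liver tissue."""
    return sorted(
        snp
        for snp, trait_pvalues in assoc.items()
        if n_nominal_traits(trait_pvalues, alpha) >= min_traits
        and snp in eqtl_snps
    )

# Hypothetical discovery-stage P-values for three made-up SNPs
assoc = {
    "snp_a": {"TG": 0.01, "TC": 0.20, "HDL-C": 0.03, "LDL-C": 0.50},
    "snp_b": {"TG": 0.01, "TC": 0.20, "HDL-C": 0.30, "LDL-C": 0.50},
    "snp_c": {"TG": 0.001, "TC": 0.002, "HDL-C": 0.40, "LDL-C": 0.60},
}
eqtl_snps = {"snp_a", "snp_b"}   # SNPs with an adipose or liver eQTL signal
candidates = screen_candidates(assoc, eqtl_snps)
```

In this toy input only snp_a survives: snp_b is nominally associated with a single trait, and snp_c has no eQTL annotation.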

To compare the transcriptional activities of the vectors, P-values were determined by t-test for independent samples.
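As an illustration of the comparison, the sketch below computes the two-sample t statistic with a pooled variance. The source does not state whether equal variances were assumed, so the pooled form is an assumption. Only the statistic and its degrees of freedom are returned; the P-value would be read from a t distribution with those degrees of freedom.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic and degrees of freedom
    (equal-variance form, assumed here for illustration)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```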

Gene-based analyses

After replication of the trait-associated tagSNP, the mediator genes through which the tagSNP exerts its effects on the traits were evaluated using gene-based analyses including “TWAS” (transcriptome-wide association study) [36], “SMR” (summary data–based Mendelian randomization) [37], and “Sherlock” [38].

The basic idea of “TWAS” and “SMR” is based on Mendelian randomization: SNPs are used as instrumental variables to test the association between gene expression and the trait. “TWAS” fits a regression model of gene expression on a cluster of associated SNPs, using a reference sample set with genomic and transcriptomic data. The expression levels of the gene are then predicted with this model in a much larger cohort with genotype and phenotype data, and the expression-trait association is evaluated using the imputed expression data. Similarly, “SMR” integrates summary-level data from GWAS with data from eQTL studies to identify genes whose expression levels are associated with a complex trait.
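For a single instrument SNP, the SMR idea can be sketched from summary statistics alone. The sketch uses the published form of the SMR statistic, T = z_gwas² · z_eqtl² / (z_gwas² + z_eqtl²), treated as approximately chi-square with one degree of freedom. The input values in the test are hypothetical, and the HEIDI heterogeneity test is not covered.

```python
import math

def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Single-SNP SMR estimate from GWAS and eQTL summary statistics.

    Returns the estimated effect of expression on the trait (Wald ratio),
    the SMR test statistic, and its approximate P-value.
    """
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    b_xy = b_gwas / b_eqtl   # effect of expression on the trait
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_smr = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_smr
```

Because the statistic is bounded by the smaller of the two squared z-scores, an SMR P-value can never be more significant than the weaker of the GWAS and eQTL signals.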

The method “Sherlock” identifies potential gene-disease associations by matching genetic signatures (GWAS and eQTL signals) within a Bayesian statistical framework. If a gene’s genetic signature provided by eQTL data matches its GWAS signature, this suggests that the expression level of the gene is associated with the GWAS phenotype.

The data used for these analyses were described in Supplementary Methods.

Results

eQTL-based genome wide screening for lipid-associated loci

In the discovery stage, SNPs associated with ≥2 lipid components (including TG, TC, HDL-C, and LDL-C) were annotated with eQTL signals either in subcutaneous adipose tissue or in liver tissue (S Fig. 1). After LD pruning, fourteen SNPs with independent effects were selected for replications (Table 1, S Table 2). Among these loci, a previously reported locus, rs1077834 (at LIPC), was removed from the candidate list. Finally, nine SNPs were successfully genotyped in the replication stage I. After the replication stage I, a significant association was found between rs1880118 and HDL-C levels (P = 1.6E-3). The minor allele of rs1880118 (C allele) was associated with increased levels of HDL-C (beta = 0.032, se = 0.010). The effect was consistent with that at the discovery stage (effect allele = C, beta = 0.066, se = 0.019, P = 4.0E-4). A consistent result was also observed in the replication stage II (effect allele = C, beta = 0.038, se = 0.012, P = 1.6E-3). In addition, we checked the effect of rs1880118 in 94,595 individuals of European ancestry and found a significant association between rs1880118 and HDL-C (effect allele = C, beta = 0.022, se = 0.006, P = 1.6E-4). Combining the results of these association studies, the C allele of rs1880118 was associated with increased levels of HDL-C (Han Chinese combined: beta = 0.039, se = 0.007, P = 4.4E-8; all ancestries combined: beta = 0.029, se = 0.005, P = 1.4E-10). The associations between rs1880118 and TC are also shown in Table 2 and S Fig. 2.
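The combined estimates can be checked with a minimal fixed-effect sketch, assuming inverse-variance weighting (the text says only that a fixed-effect model was used). The inputs are the rounded per-stage betas and standard errors quoted above, so the resulting P-values match the reported ones in order of magnitude rather than exactly.

```python
import math

def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates: list of (beta, se) pairs, all for the same effect allele.
    Returns the pooled beta, its standard error, and a two-sided P-value
    from the normal approximation.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p

# Per-stage HDL-C estimates for the C allele of rs1880118, as quoted in the text
han_chinese = [(0.066, 0.019), (0.032, 0.010), (0.038, 0.012)]
european = [(0.022, 0.006)]

beta_han, se_han, p_han = fixed_effect_meta(han_chinese)
beta_all, se_all, p_all = fixed_effect_meta(han_chinese + european)
```

Rounded to three decimals, this gives beta = 0.039 and se = 0.007 for the Han Chinese stages, and beta = 0.029 and se = 0.005 when the European estimate is added, in agreement with the combined values reported above.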

Table 1 Serum lipid-associated loci with eQTL annotations
Table 2 Associations between genotypes of rs1880118 and levels of serum HDL-C and TC in multistage and combined analyses

As Fig. 1a shows, the LD block of rs1880118 contains two genes, RAC1 and DAGLB. A variant, rs702485, located in the 3′UTR of DAGLB, has been reported to be associated with serum HDL-C levels in populations of European ancestry, so we evaluated the LD between this reported locus and the novel locus rs1880118. The LD r² between them was only 0.03 in Han Chinese. In a conditional analysis adjusting for rs702485, an independent effect of rs1880118 was observed (S Fig. 3).
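A pairwise LD r² of this kind can be approximated from unphased genotypes as the squared Pearson correlation of allele dosages. The sketch below assumes 0/1/2 dosage coding and is a genotype-based approximation of the haplotype-based r². It is not necessarily the estimator used in the study.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two vectors of allele dosages."""
    if len(x) != len(y) or not x:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return sxy ** 2 / (sxx * syy)
```

A value near 0, such as the 0.03 reported here, indicates that the two SNPs are close to statistically independent, which is why the conditional analysis can separate their effects.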

Fig. 1

Regional plot of the replicated SNP and annotations including histone modification, transcription factor binding site, and DNase signals in this region. a Regional plot of the tagSNP rs1880118 plotted using Locuszoom. The tagSNP and another three SNPs located in a potential regulatory region are labeled. b Annotations including DNase and transcription factor binding site ChIP-seq (combined with multiple cell lines) from ENCODE project. c Histone modification signals using primary mononuclear cells from peripheral blood (E062, Roadmap project). d Histone modification signals using adipose nuclei (E063, Roadmap project)

Annotations for the replicated tagSNP rs1880118

The minor allele of rs1880118 (C allele) was associated with increased mRNA expression levels of DAGLB in subcutaneous adipose tissue. Consistent associations were found in eQTL analyses using data from GTEx (V4, P = 2.2E-12), GTEx (V6, P = 5.9E-42), and the MuTHER project (P = 1.0E-4). The associations between rs1880118 and the expression level of DAGLB were also observed in whole blood samples from multiple eQTL data sources (S Table 3). According to GTEx V6 data, variation at rs1880118 explained 47.7% and 36.4% of the DAGLB expression variance in subcutaneous adipose tissue and whole blood samples, respectively (Fig. 2). In addition, the C allele of rs1880118 was associated with a decreased expression level of RAC1 in transformed fibroblasts (GTEx V6, P = 1.9E-14, S Table 3).
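One way to obtain a proportion of variance explained from summary-level eQTL results is the simple-regression identity r² = t² / (t² + df), with df = n − 2. The sketch below assumes a single-SNP linear model without covariates. The exact procedure applied to the GTEx summary data is not described in the text, and the inputs in the usage note are hypothetical.

```python
def r2_from_t(t_stat, n_samples):
    """Variance explained by one predictor in a simple linear regression,
    recovered from its t statistic and the sample size."""
    df = n_samples - 2
    if df <= 0:
        raise ValueError("need more than two samples")
    return t_stat ** 2 / (t_stat ** 2 + df)
```

For example, a hypothetical t statistic of 10 in 102 samples corresponds to r² = 100 / (100 + 100) = 0.5.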

Fig. 2

Boxplots of the DAGLB expression levels classified by the genotypes of rs1880118 in subcutaneous adipose tissue (a) and whole blood (b) samples from the GTEx project. The proportions of variance explained by the SNP (r², the coefficient of determination) were calculated using summarized data from GTEx

As shown in Fig. 1a, the LD block of rs1880118 captured a region containing the genes DAGLB and RAC1. To identify the putative regulatory region, chromatin state markers including histone modifications and transcription factor binding sites were annotated. An active transcription region (hg19, chr7: 6,486,750-6,488,250) near the 5′ end of DAGLB was uncovered by H3K27ac modification, transcription factor binding site, and DNase I sensitivity signals (Fig. 1b). Histone modification signals in adipose nuclei and primary mononuclear cells from peripheral blood (corresponding to the eQTL analyses) were annotated using data from the Roadmap project (Fig. 1c, d). We considered variants in high LD (r² > 0.6 in ASN, 1000 Genomes Project phase I) with the proxy SNP rs1880118. Of the tagged variants meeting the LD threshold, three SNPs (rs4724806, rs3828944, and rs7807755; the LD correlation among these variants and the tagSNP rs1880118 is shown in S Fig. 3) were located in this active region. The narrow peaks of the histone modification signals and their positions relative to the three variants are shown in S Table 4 and S Fig. 4. Two loci (rs4724806 and rs3828944) are located in the first intron of DAGLB, and the third variant, rs7807755, is located in the promoter region of DAGLB.

Allele-specific activities of the active fragment

To evaluate the transcriptional activity of the three candidate loci in the putative regulatory region, 1807-bp segments carrying either the major alleles (wild-type) or the minor alleles (mutant) of the three variants (rs4724806, rs3828944, and rs7807755, labeled as “A”, “B”, and “C”, respectively) were cloned into the XhoI and HindIII sites of pGL-3 basic vectors. Notably, the in silico PCR-amplified fragment involved a poly-T microsatellite with an 11-bp run of thymines, whereas the reference sequence contains a 40-bp run of thymines. To make the vectors comparable, we kept the same number of thymines (an 11-bp run) in both vectors and validated it by Sanger sequencing.

The activities of the wild-type and the mutant vectors were measured using the luciferase reporter assay. As depicted in Fig. 3, the mutant (with minor alleles) had significantly higher transcriptional activity than the wild-type (with major alleles) (P < 0.001). These findings show that this segment, which contains three variants in complete LD, exhibits an allelic difference in transcriptional activity, and suggest that it acts as a cis-regulatory element of DAGLB. This result is consistent with the eQTL signal in subcutaneous adipose tissue (P = 5.4E-42).

Fig. 3

The segment containing the minor alleles of the tagged SNPs increased the transcriptional activity in the luciferase reporter assay. The segments of the putative regulatory region containing the three candidate loci were inserted into the pGL-3 basic vector. The two vectors were then transfected into the 293T cell line. The pRL-TK vector encoding Renilla luciferase was used as an internal control to normalize firefly luciferase expression. The error bars denote the SD of the relative activities. Comparisons were performed, and the P-values are labeled in the plot

Expression-trait association analysis

As shown in Fig. 4, replicable evidence has been established for the link between the genotype (rs1880118) and the phenotype (HDL-C) and for the link between the genotype (rs1880118) and the expression of DAGLB. The relationship between the expression of DAGLB and HDL-C levels remained unknown. We therefore performed expression-trait association tests using three novel integrated approaches: TWAS, SMR, and Sherlock.

Fig. 4

Integrated plot of three-way association between the tagSNP rs1880118, the mRNA expression levels of DAGLB, and the serum HDL-C levels. Evidence for each line is summarized and marked on the plot. P-values were determined using TagSNP and phenotype: genotype–phenotype linear regression and combined analyses using fixed-effect model; TagSNP and gene expression: eQTL analyses and luciferase reporter assay; gene expression and phenotype: two powerful approaches called “TWAS” and “Sherlock” based on eQTL and GWAS data

Using TWAS, we found that DAGLB levels were significantly associated with serum HDL-C levels (P = 3.0E-8) and TC levels (P = 2.0E-6). Based on SMR, we also observed a significant association between DAGLB and HDL-C (PSMR = 1.1E-4, PHEIDI = 0.13) and TC (PSMR = 2.5E-3, PHEIDI = 0.38). The method “Sherlock” showed that the expression level of DAGLB was associated with serum HDL-C level (logarithm of Bayesian factor = 5.62, P = 1.58E-6).

Discussion

In this study, we identified a novel locus at rs1880118 associated with serum HDL-C levels in Han Chinese. The genotype of rs1880118 explained 47.7% of the variance of the mRNA expression level of DAGLB in subcutaneous adipose tissue. The luciferase reporter assay pointed to a segment containing three variants that regulates DAGLB transcription.

Enrichment analysis has shown that trait-associated loci are more likely to be eQTLs [11]. This means that transcriptional annotations can enhance the discovery of trait-associated SNPs, which raises the possibility that some of the missing heritability will be found. Compared with the classical GWAS strategy, eQTL-based screening avoids an excessively rigid threshold and reliance on the P-value of the genotype-phenotype association in the discovery stage. Using this strategy, a significant association between the eQTL at rs1880118 and serum HDL-C levels was observed and replicated in Han Chinese. After fine-mapping using epigenetic annotations, transcriptional activity was observed in a segment that contains three candidate causal variants. Further reporter assays, electrophoretic mobility shift assays (EMSA), and ChIP could be performed in the future to verify the regulatory mechanism.

Although we replicated the genotype-phenotype association and found the causal segment, we still could not conclude whether this variant influences the HDL-C level by modulating the expression level of DAGLB, because the variant may affect DAGLB and HDL-C independently. Powerful strategies were therefore applied to identify the eGene (eQTL gene, a gene that has a significant eQTL acting upon it) whose cis-regulated expression is associated with complex traits. TWAS and SMR have conceptual similarities with the Mendelian randomization (MR) test, which aims to identify causal relations using genetic variations as instrumental variables [36, 37]. Such strategies benefit from the MR framework, which guards against reverse causality [39,40,41]. In addition, only a small amount of transcriptome data is required as a reference for the analyses [41]. We then verified the finding using a computational algorithm named Sherlock [38]. A consistent result supporting the association between DAGLB and serum HDL-C levels was observed.

Similar to DAGLα, the sequence-related enzyme DAGLβ (encoded by DAGLB) has a role in the biosynthesis of the endocannabinoid 2-arachidonoylglycerol (2-AG) [42]. DAGLA is mainly expressed in the nervous system and seems to be the principal regulator of 2-AG [43]. DAGLB is widely expressed in multiple tissues, including adipose tissue, white blood cells, and brain (data from GTEx) [18]. DAGLB has been linked to hypotrichosis and the influence of early life stress [44, 45]. However, the evidence for a role of DAGLB in lipid metabolism is limited. A recent study using DAGLB knock-out mice showed that DAGLβ inactivation lowers 2-AG, arachidonic acid, and eicosanoids in mouse peritoneal macrophages [46]. A corresponding reduction in lipopolysaccharide-induced TNF-α release was observed as well. These findings indicate that DAGLβ plays a role within the lipid network that regulates proinflammatory responses in macrophages [46]. However, it remains unknown whether the expression levels of DAGLB in multiple tissues are responsible for serum HDL-C and TC levels. Gene function studies using cell lines or mouse models with DAGLB knock-in or knock-down are needed to clarify the biological role of the gene.

In conclusion, we identified a novel serum HDL-C-associated variant, rs1880118. The genotype of rs1880118 explained nearly half of the expression variance of DAGLB. After fine-mapping of the tagSNP, a regulatory segment that contributes to the variance of DAGLB expression was identified. The expression level of DAGLB was associated with serum HDL-C by means of integrated approaches using eQTL and GWAS data. The current study indicates a role for DAGLB and provides novel insight into the lipid mechanism in Han Chinese.