A Two-Stage Whole-Genome Gene Expression Association Study of Young-Onset Hypertension in Han Chinese Population of Taiwan

Hypertension is an important public health problem in the world. Since the intermediate position of the gene expression between genotype and phenotype makes it suitable to link genotype to phenotype, we carried out a two-stage whole-genome gene expression association study to find differentially expressed genes and pathways for hypertension. In the first stage, 126 cases and 149 controls were used to find out the differentially expressed genes. In the second stage, an independent set of samples (127 cases and 150 controls) was used to validate the results. Additionally, we conducted a gene set enrichment analysis (GSEA) to search for differentially affected pathways. A total of nine genes were implicated in the first stage (Bonferroni-corrected p-value < 0.05). Among these genes, ZRANB1, FAM110A, PREP, ANKRD9 and LAMB2 were also differentially expressed in an existing database of hypertensive mouse model (GSE19817). A total of 16 pathways were identified by the GSEA. ZRANB1 and six pathways identified are related to TNF-α. Three pathways are related to interleukin, one to metabolic syndrome, and one to Hedgehog signaling. Identification of these genes and pathways suggest the importance of 1. inflammation, 2. visceral fat metabolism, and 3. adipocytes and osteocytes homeostasis in hypertension mechanisms and complications.

Hypertension is an important public health problem in the world. Essential hypertension is one of the most predictive or associated risk factors of cerebral hemorrhage and infarction, coronary heart disease, heart failure, diabetes mellitus, and kidney disease. The substantial heritability (30-60%) 1-3 of hypertension has prompted scientists to study its genetic underpinnings through genetic and expression profiling. Findings on hypertension genes can be used not only for screening high-risk individuals, preventing disease development, but also for elucidating disease mechanisms. Most of the previous hypertension genetic study focused on hypertension with relatively older ages. We focused on young-onset hypertension which has a stronger genetic component than hypertension in general 4,5 . In this study, we followed our previous studies which used age 50 as the cut-off to screen young-onset hypertension 4,5 .
Gene expression level may represent the phenotype most immediately connected to DNA sequence variation. The intermediate position of the gene expression makes it suitable to bridge between genotype and phenotype 6 . Recent advances in molecular biology and technology have made it possible to monitor the expression levels of all genes simultaneously. Several gene expression association studies on human hypertension have been conducted [7][8][9][10] .
The mechanism of essential hypertension is complex. The target tissues of essential hypertension are likely multiple. Animal studies on hypertension gene expression have used the aorta, heart, liver and kidney tissues 11 , but these tissues are not accessible in human. Human lymphoblastoid cell lines (LCLs) are gaining popularity as a research tool for studying genome-wide individual differences 12,13 . LCL presents some advantages not only in its tractability and availability, but also in its potential of negating environmental influence. Lymphoblast cells are transformed and grown in the same conditions, which presumably minimize the environmental sources of variation 14,15 .
In this study, we conducted a two-stage whole-genome gene expressions association study. Afterwards, a hypertensive mouse data obtained from Gene Expression Omnibus (GEO) 16 database was used to further support 1 Study Design. Figure 1 shows the flowchart of the study design. A two-stage whole-genome gene expression association study was carried out in Han Chinese to search for multiple essential hypertension genes with differential gene expression in hypertensive patients and normotensive controls.
In the first stage, a total of 126 young-onset hypertensive patients and 149 normotensive controls were included. In the second stage, another 127 young-onset hypertensive patients and 150 normotensive controls were used to validate the finding in the first stage. Finally, we combined 253 case and 299 control samples together in the combination analysis. The characteristics of these samples are presented in the Tables 1 and 2.
Study participants. Young-onset hypertensive patients were selected from "Academia Sinica Multi-Center Young-Onset Hypertension Genetic Study", which had recruited 1023 non-aboriginal Taiwanese individuals with essential hypertension aged 20 to 50. Details regarding this young-onset hypertension genetic study has been described elsewhere 4,5 . Age and sex matched normotensive controls were selected from "Han-Chinese Cell and Genome Bank in Taiwan" 18 which has established by Institute of Biomedical Sciences of Academia Sinica. Inclusion criteria for young-onset hypertensive patients are described as follows: (a) systolic blood pressure (SBP) ≥140 mmHg and/or diastolic blood pressure (DBP) ≥ 90 mmHg at least twice in the previous 2-6 months, or SBP/DBP ≥ 120/80 mmHg at least twice for those who were on anti-hypertensive medication for two months or more; (b) participant initially diagnosed with hypertension between the ages of 20 and 51years; (c) no secondary hypertension such as chronic renal disease, renal arterial stenosis, primary aldosteronism, coarctation of the aorta, thyroid disorders, Cushing's syndrome and pheochromocytoma (confirmed through extensive clinical investigations including blood chemistry, renal function test, endocrine 6procedures and abdominal sonogram); (d) no medical history regard to severe disease, including liver and renal failure; carcinoma; cardiac or pulmonary failure (e) fasting glucose level < 126 mg/dl and no previous diagnosis of diabetes mellitus; (f) body mass index (BMI) < 35 Kg/m 2 ; (g) both parents and all grandparents are Han-Chinese; and (h) have been genotyped by Illumina 550 K beadchip.
Inclusion criteria for normotensive controls are: (a) participant with SBP < 120 mmHg and DBP < 80 mmHg and no anti-hypertensive medication (b) aged between 20 to 51 years; (c) no other disease (d) BMI < 35 Kg/m 2 ; (e) both parents and all grandparents are Han-Chinese; and (f) have been genotyped by Illumina 550 K beadchip. Finally, age and sex matched controls were selected to carry out the following experiments.
Establishing Lymphoblastoid Cell Lines. Lymphoblastoid Cell Lines (LCLs) were used to profile the gene expression levels. A total of 33 ml of blood had been drawn from each eligible subject: 17.5 ml for basic clinical chemistry and plasma storage, 5 ml for DNA extraction, 10 ml for LCLs establishment, and 0. 5 19 and R statistical software were employed for quality control of the data. The data quality control procedures including: (1) background correction (subtracting the average value of the negative control probes), (2) removal of 5,752 probes with detection p-value > 0.05, (3) quantile normalization for all 1,656 arrays, (4) removal of samples for which 3 replicated samples were not clustered for the remaining 25,216 probes, (5) median value of 3 replicates selected for further analysis, and (6) use of ComBat 20 , an empirical Bayes method (R package: SVA), to remove remaining batch effects.

Two-Stage Whole-Genome Gene Expression Association Study.
In the first stage (126 case and 149 controls), logistic regression with age, sex and BMI adjustment was used to identify the significantly differentially expressed genes with hypertension status (0, 1) as the independent variable. In the second stage, another independent sample set (127 cases and 150 controls) was used to validate the transcripts found in the first stage. The same statistical methods were used with age, sex, and BMI adjustment. Finally, in the combination analysis, we combined samples of the two stages together (253 cases and 299 controls) for the association test and calculated the fold change (average intensities in cases/average intensities in controls). The same analysis methods employed in the first and stages were used in the combination analysis. Bonferroni correction was employed in the first stage, second stage and combination analysis for handling multiple comparison issue. The commercial software SAS 9.4 was used to do the statistical analyses. Sensitivity Analysis. We used the combined samples to conduct a sensitivity analysis which included the age, sex, BMI, liver function (s-GOT & s-GPT), total-cholesterol (TC), HDL-C, kidney function (eGFR), serum uric acid, education, smoking and alcohol use in the model to check whether the results are robust to these factors.
Besides, we also performed a sensitivity analysis among the non-medicated patients to confirm that these transcripts were not due to effect of antihypertensive medications. Among the 253 young-onset hypertensive patients, 55 patients were non-medicated. We used these 55 non-medicated patients and 299 healthy controls to perform the sensitivity analysis.
Validation study using hypertensive mouse data. Furthermore, we used gene expression profile data of hypertensive mouse model downloaded from the Gene Expression Omnibus (GEO, GSE19817, http://www. ncbi.nlm.nih.gov/geo/GPL9734) 11 to validate these results.
Male BPH/2J (high blood pressure), BPL/1J (lowest blood pressure), and BPN/3J (normal blood pressure) mice were used to generate expression data. Gene expression profiles of tissues including aorta (9 BPH, 5 BPN and 9 BPL), liver (5 BPH, 5 BPN and 4 BPL), heart (5 BPH, 5 BPN and 5 BPL), and kidney (5 BPH, 5 BPN and 5 BPL) were measured. Merck/Affymetrix mouse 1.0 custom arrays monitoring 38,384 individual transcripts (25846 Entrez genes) were used. Raw intensity was normalized using the RMA algorithm 21 . BRB-Arraytools was used to analyze the normalized data with 10,000 permutations to identify the differential expressed ones among the identified genes. FDR was employed to control the multiple comparisons.
Gene Set Enrichment Analysis (GSEA). On top of the two-stage whole-genome gene expression association study, we further conducted a gene set enrichment analysis (GSEA) to investigate the differentially expressed pathways. BRB-array tools 19 was employed to perform the analysis. A total of 299 pathways from BioCarta database and 128 pathways from KEGG database were tested. For the GSEA, all of the 25,216 transcripts were included. Two-sample t-test was used to get the p-values from each single transcript. Least square (LS) permutation test and Kolmogorov-Smirnov (KS) permutation test were used to find significant gene sets. LS/KS permutation test finds gene sets which have more genes differentially expressed among the phenotype classes than expected by chance 22 . These results were further validated using hypertensive mouse data.
Expression quantitative trait loci (eQTL) Analysis. We integrated the gene expression profiles with the SNP and copy number variation (CNV) data to find influential genetic polymorphisms for the differentially expressed genes. Use was made of data from 250 (out of 299) normotensive controls, who had both genotype and gene expression data.
These samples were genotyped with the Illumina's Sentrix ® HumanHap550 Genotyping BeadChip which contains 560,184 tag-SNPs. Genomic DNA was isolated from leukocytes using a PUREGENE ® DNA Purification Kit (QIAGEN ® , Gentra Systems, Minneapolis, MN, USA) for genomic DNA isolation. The DNA concentration was quantified and adjusted to 60 ng/μl using a NanoDrop ™ ND-1000 Spectrophotometer (NanoDrop ™ Technologies, DE, USA). All sample were genotyped by deCODE Genetics (Reykjavík, Iceland). Genotype based (AA, AB, BB) regression was employed to test the association between genotype and gene expression. Only cis-SNPs (SNPs located in the gene region ± 50 Kbps) were used to test the association. We then tested the association between the discovered expression regulatory SNPs (eSNPs) and young-onset hypertension. Statistical significance was claimed under a significance threshold of p-value < 0.05.
Furthermore, the intensity data of 497,849 SNPs were used to call the CNV regions. CNVs were identified using PennCNV 23 and QuantiSNP 24 , respectively, which identifies CNVs by integrating intensity data from neighboring probes using a hidden Markov model (HMM). Gene-based CNVs association analysis was used to test whether there was a significantly differed probability of a CNV intersecting a given gene between hypertension patients and normotensive controls. CNV analysis were performed in PLINK 25 .

Results
Tables 1 and 2 lists the continuous and categorical characteristics traits, respectively, of the young-onset hypertension cases and normotensive controls. Overall, patients had higher BMI, uric acid (UA), total cholesterol (TC), eGFR, alcohol and cigarette use, education level and lower HDL-C level.
In the second stage, another independent sample consisting set of 127 cases and 150 normotensive controls were used to confirm the nine genes that discovered by first stage. All of these nine genes were significantly associated with hypertension in the second stage. According to the Table 3 and the heatmap ( Figure S2), except for ZRANB1 gene (fold change = 0.76; OR = 0.12), most genes were highly expressed in case group compared to the healthy control group. Although the fold change of these up-regulated genes are not very big (ranged from 1.20 to 1.51), the p-value and OR of these genes are strong (OR = 3.29 to 10.12), which means the variance of these genes' expression are small and the relative risk estimates are rather precise.
Moreover, we have conducted a sensitivity analysis using all 253 case and 299 controls, which included the liver function (GOT & GPT), total-cholesterol, HDL-C, eGFR, SUA, smoking, alcohol consumption, education as well as age, sex and BMI in the model. After adjustment of these covariates, these nine genes still significantly associated with young-onset hypertension (Table S1 in the Supplemental Materials).
We have also performed a sensitivity analysis among the non-medicated individuals to confirm that these transcripts were not due to effect of antihypertensive medications. The association between the genes and diseases status are still very significant, which demonstrate that our results were not affected by the antihypertensive medication (Table S2 in the supplemental materials).
Validation Study in a Hypertensive Mouse Model. Furthermore, we used the mouse gene expression profile data downloaded from GEO (GSE19817) 11 to validate these results. This data profiled gene expression in liver, heart, kidney, and aorta from the genetically hypertensive "blood pressure high" (BPH), normotensive "blood pressure normal" (BPN), and hypotensive "blood pressure low" (BPL) inbred mouse strains. Among these nine genes, Nrg2 cannot be found on this custom array and was not included in the following analyses.

Chr Gene
Probe ID    Table 4.
Gene Set Enrichment Analysis. In addition to the whole-genome gene expression association study, we also conducted a gene set enrichment analysis (GSEA) to investigate whether any known pathway maybe implicated. Table 5 showed the GSEA results of BioCarta database and KEGG database. Only the pathways that reached the significant threshold (p-value < 0.05) in the both LS permutation and KS permutation simultaneously were shown. There were 14 pathways identified from BioCarta, and 2 pathways identified from KEGG, respectively. Furthermore, we also used the mouse gene expression profile data to validate these results. Five out of 16 pathways can be validated in the mouse model (Table 6).

Data availability.
For future meta-analysis, our information on all p-values data are available for download from the website: http://pan.ibms.sinica.edu.tw/hypertension/data3.

Discussion
In this study, a total of nine genes were found differentially expressed in two independent sample sets. Among these genes, five genes (ZRANB1, FAM110A, PREP, ANKRD9 and LAMB2) were identified and replicated by our study, and also were validated in the hypertensive mouse model. NRG2, DCPS, WFDC12, and TPTE were identified and replicated by our study, but not replicated in the hypertensive mouse model. The discrepancy may be due to the different tissues used in our study (LCLs) and in the mouse model (aorta, heart, liver, and kidney) or due to the species specificity (human and mouse).
The down-regulated gene, ZRANB1, is playing a role in the regulation of cell morphology and cytoskeletal organization and being required in the stress fiber dynamics and cell migration. This gene may also modulate TNF-alpha signaling which has been associated with hypertension 26 . NRG2 encodes a novel member of the neuregulin family of growth and differentiation factors 27 . Through interaction with the ErbB family of receptors, this protein induces the growth and differentiation of epithelial, neuronal, glial, and other types of cells. ErbB family members are implicated in the development of end organ damage, as occurs in hypertension 28 . The mechanism of FAM110A, PREP, ANKRD9, DCPS, WFDC12, TPTE, and LAMB2 genes on the occurrence and development of hypertension is not yet clear. Among these nine genes, except for NRG2, the other genes are not included in any known pathway.
We have used the combined samples to conduct a sensitivity analysis, which adjusted for the all covariates, to check whether the results are robust to these factors. After adjustment of these covariates, these nine genes still significantly associated with hypertension (Table S1 in the supplemental materials). We noted that there are differences in ORs from the main findings for some genes. According to Tables 1 and 2, besides the metabolic related traits, the distributions of alcohol consumption, smoking and education are also strikingly different between cases and controls. Since these three covariates all have genetic determinants, some of the genes we found might be related to alcohol consumption, smoking, or education. Therefore, we used the combined sample to check the associations between these three covariates and nine genes and found that ANKRD9 and NRG2 is slightly associated with education and smoking (Table S6 in the supplemental materials). The differences in the distributions of alcohol consumption, smoking, and education between cases and controls may be partly due to the effects of these genes. More study is needed to examine the relationships among these genes, covariates and hypertension.

Chr Gene
Tissue p-value FDR Mean of intensities in class 1

Mean of intensities in class 2
Mean of intensities in class 3 Moreover, we have also performed a sensitivity analysis among the non-medicated individuals to confirm that these transcripts were not due to effect of antihypertensive medications. The association between the genes and diseases status are still very significant, which demonstrate that our results were not affected by the antihypertensive medication (Table S2 in the Supplemental Materials). However, we observed that there are also bigger differences in some ORs of genes (eg. NRG2, FAM110A, LAMB2 and TPTE) when the cases are restricted to individuals not on antihypertensive therapy. It may be due to the different sample size or the different disease stage. These non-medicated patients may have relatively less severe hypertension. These genes might have the different effects contribute to the different hypertension stages. Further study is needed to understand the pathogenesis of hypertension.
In addition to the whole-genome gene expression association study that tested these transcripts one at a time, we performed a gene set enrichment analysis (GSEA) to identify pathways with significantly greater number of genes been down-or up-regulated, compared to the controls. In the GSEA, 16 pathways from BioCarta or KEGG were identified.
Among these pathways, six pathways (pathway No. 1 to 5 and 13 in the Table 5), are all related to the TNF (Tumor necrosis factors) which has been implicated in the development of salt-sensitive hypertension induced by angiotensin II 29 . ZRANB1, which identified in the two-stage association study, is also related to the TNF-alpha, but this gene is not involved in any known pathway as yet. Pathway Bioactive peptide induced signaling pathway (No. 8 in the Table 5) has already known to relate to the blood pressure regulating peptide angiotensin 30 . Three pathways (No. 9 to 11 in the Table 5) are implicated with the interleukin. Hypertension patients have been shown to have an altered profile of these pro-and anti-inflammatory cytokines 31 . Pathway No. 12 (Visceral fat deposits and the metabolic syndrome) is a pathway related to the glucocorticoid receptor, which will activate/inactivate the lipoprotein lipase, TNF-alpha, and insulin resistance 32 . The association between Insulin resistance and hypertension has been well recognized. Up to 80% patients with type 2 diabetes have hypertension 33 . Pathway hedgehog signaling pathway (No.16 in the Table 5) is an active pathway during embryogenesis. three hedgehog (HH) gene homologs were discovered in vertebrates; Desert (DHH), Indian (IHH), and Sonic (SHH). SHH is the most widely expressed in adult tissues. The hedgehog signaling pathway plays a crucial role in the angiogenesis and vascular remodeling. Hypoxia has been demonstrated to activate the SHH pathway to enhances the progress of vascular remodeling in a number of human diseases, including atherosclerosis and pulmonary artery hypertension 34 . Besides, recent studies have shown that this pathway is linked with some age-related diseases such as metabolic syndrome 35 which may also relate to the blood pressure.
Pathway data in the KEGG and in the BioCarta are heterogeneous due to the differences in pathway construction purpose and methods 36 . For the same pathways in the KEGG and the BioCarta, not only numbers of genes involved are different, but sometimes with very little overlap. That is why the results of gene set analysis are so different, employing the information provided by the two pathway datasets.
In addition, we have also tried to integrate the single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) data and gene expression profile data to investigate whether these hypertension related genes' expression levels were regulated by SNPs or CNVs.
We found some expression regulatory SNPs (eSNPs) that are associated with the gene expression level of the NRG2 (rs2916092) and PREP (rs1051484, rs1078726, rs10871983, rs1149305, rs1149309, rs1149313, rs11758609, rs1190050, and rs1190053) ( Table S3 in the supplemental materials), but these eSNPs were not associated with the  young-onset hypertension in our previous two-stage GWAS 5 (Table S4 in the supplemental materials). The lack of association between these eSNPs and hypertension may be due to the complicated mechanisms of hypertension and small effects of individual genes. A larger sample size or alternative approaches are needed to show the connection between eSNPs and hypertension. Beside the eSNPs, there are no CNVs associated with these genes' expression levels ( Table S5 in the supplemental material).
The mechanisms contributing to essential hypertension are complex. The target tissues of essential hypertension are likely multiple 11 , but these tissues are not accessible in human. In this study, LCLs were used in all gene expression association tests. The magnitude of expression profiles shared among the different tissues is still under debate. Current estimates are range from very small 37 to 70-80% 38 . The MuTHER study which has the relative large sample size has demonstrated that around 30% of gene expression profiles are shared among tissues, while 29% are exclusively tissue-specific 12,39 . Therefore, some tissue-specific signals may have been missed, but the findings should be valid.
Several gene expression association studies on human hypertension have been conducted [7][8][9][10] . There are no overlapping genes were founded among our single gene analysis and previous studies. It may be due to the different sample size and the different surrogate tissues used. The sample size of the most gene expression studies on hypertension is small (n = 18~20), except a large scale gene expression integrative network analysis which used 3,679 non-medicated individuals in the Framingham Heart Study. Although the identified genes are not overlapping among these studies, all of these studies have identified the genes that related to immune or inflammation. Moreover, in our pathway analysis, we have identified the IL-10 Anti-inflammatory Signaling Pathway which is in line with the findings on IL-10R gene by Chon's hypertension gene expression study 8 .
Our study identified several previously unknown young-onset hypertension genes and pathways in Han Chinese. Identification of these genes and pathways suggest the importance of 1. inflammation, 2. visceral fat metabolism and 3. adipocytes and osteocytes homeostasis in either hypertension etiology or complications. These finding may broaden our understanding of hypertension etiology and major outcome development.  Table 6. Validation results of pathway analysis based on mouse data. LS permutation: Least square permutation; KS permutation: Kolmogorov-Smirnov permutation. The LS/KS permutation tests, which find gene sets that have more differentially expressed genes among the classes than expected by chance.