Introduction

Elevated blood pressure (BP), or hypertension (HTN), defined through systolic (SBP) and/or diastolic (DBP) BP, is the most common cardiovascular condition across the global population and the major risk factor for stroke, cardiovascular disease (CVD), renal failure, and congestive heart failure [1, 2]. Susceptibility to HTN varies considerably by age, sex, body mass index (BMI), ancestry, sodium intake, endothelial and vascular changes, stress, and alcohol consumption, among other factors. However, the biological routes from these risk factors to subsequent disease remain obscure [3]. Moreover, there are significant population differences in HTN prevalence and risk: African Americans display the highest prevalence and consequently experience a greater risk of CVD and end-stage renal disease than their European American counterparts [1, 4]. BP is also a classic quantitative trait, with heritability estimated from family data at 30 to 40% [5]. There is currently optimism that identifying BP genes and their pathways will illuminate the causes of HTN and target organ damage, irrespective of whether these pathways are perturbed by genetic, lifestyle, and/or environmental factors [6, 7]. Moreover, these BP pathways are expected to enable the targeted development of therapeutics.

Recent well-powered multi-ancestry GWAS and gene-centric studies have provided considerable insight into the genetic basis of BP by identifying several susceptibility loci [8,9,10,11]. However, the detected loci explain less than 3.5% of inter-individual variation in BP [8, 9, 11], supporting the hypothesis that rare and low frequency variants may contribute to BP variation. Multiple statistical methods for testing rare variant association have been developed for population-based samples [12,13,14,15,16] and family data [17,18,19,20]. It has long been known that family-based designs have an advantage in detecting rare variants [17, 19]. In the two decades before GWAS became popular, numerous large-scale genome-wide linkage studies of BP and HTN were reported, including the well-designed studies by the NHLBI-supported Family Blood Pressure Program (FBPP) and the UK Medical Research Council British Genetics of Hypertension (BRIGHT) study [5, 21,22,23]. However, the linkage findings were few, often lacked appropriate replication samples, and, for the most part, did not overlap the subsequent GWAS-derived association loci. This has led to the view that while common variants are responsible for GWAS findings, linkage findings mostly arise from rare and low frequency variants. We sought to test this hypothesis directly using data on SBP and DBP variation.

Materials and methods

Our analysis flow chart is presented in Fig. 1. We first used primary exome array data on 1802 African American families from the Family Blood Pressure Program (FBPP) for linkage analysis (described in Supplementary Methods). The BP outcomes were the residuals of SBP, DBP, and pulse pressure (PP), imputed to account for antihypertensive medication and adjusted for age and BMI. Population structure and family structure were modeled by the first ten genome-wide principal components (PCs) and the kinship matrix (Supplementary Methods). Linkage analysis was performed on 20,791 single nucleotide polymorphisms (SNPs) (minor allele frequency > 0.2 and linkage disequilibrium r² < 0.1) using Merlin [24]. Significant linkage evidence was defined as LOD > 3, corresponding to a principled posterior probability of linkage of 95% [25, 26].
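The correspondence between a LOD score of 3 and a posterior probability of linkage of about 95% can be checked with Bayes' rule. The sketch below assumes a prior probability of linkage of 0.02 (the classical figure of roughly 1 in 50 for two random loci; this value is an assumption here and is not stated in the Methods), and the function name is illustrative.

```python
def posterior_linkage(lod: float, prior: float = 0.02) -> float:
    """Posterior probability of linkage given a LOD score.

    The LOD score is log10 of the likelihood ratio (linkage vs. no linkage),
    so posterior odds = likelihood ratio * prior odds.
    The default prior of 0.02 is an assumed, conventional value.
    """
    likelihood_ratio = 10.0 ** lod
    posterior_odds = likelihood_ratio * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for lod in (2.0, 3.0, 3.8):
        print(f"LOD = {lod:.1f} -> posterior = {posterior_linkage(lod):.3f}")
```

Under this assumed prior, a LOD of 3 gives a posterior of roughly 0.95, while a LOD of 2 gives only about 0.67, which is one way to see why LOD > 2 is treated as suggestive rather than significant.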

Fig. 1 Analysis flow chart

To verify that our observed linkage evidence was driven by multiple low frequency/rare variants rather than common variants, we performed a series of diagnostic analyses and simulations: (1) we estimated the heritability attributable to common variants using 6194 unrelated African Americans from the Candidate Gene Association Resource (CARe) consortium (Supplementary Methods); (2) we estimated the regional heritability attributable to common variants in every 40 cM under the linkage peak and calculated its correlation with the LOD score in the same region; and (3) we simulated a quantitative trait influenced by a single common variant (minor allele frequency [MAF] = 0.1 or 0.4, accounting for 1% of phenotypic variation) under the FBPP pedigree structure and performed linkage analysis using Merlin [24], repeating this 100 times (Supplementary Methods).
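The trait-generation part of simulation (3) can be illustrated as follows. This is a minimal sketch under simplifying assumptions: it draws unrelated individuals under Hardy-Weinberg equilibrium rather than using the FBPP pedigree structure, it does not run Merlin, and all function names are hypothetical. It shows how a per-allele effect can be chosen so that a single SNP explains 1% of the variance of a unit-variance trait.

```python
import math
import random


def effect_size(maf: float, var_explained: float) -> float:
    """Per-allele additive effect such that the SNP explains `var_explained`
    of a trait with total variance 1 (genotype variance is 2 * maf * (1 - maf))."""
    return math.sqrt(var_explained / (2.0 * maf * (1.0 - maf)))


def simulate_trait(n: int, maf: float, var_explained: float, rng: random.Random):
    """Simulate genotypes (0/1/2 minor alleles) and a quantitative trait
    for n unrelated individuals (an assumption; the study used pedigrees)."""
    beta = effect_size(maf, var_explained)
    resid_sd = math.sqrt(1.0 - var_explained)
    genotypes, traits = [], []
    for _ in range(n):
        g = int(rng.random() < maf) + int(rng.random() < maf)
        genotypes.append(g)
        traits.append(beta * (g - 2.0 * maf) + rng.gauss(0.0, resid_sd))
    return genotypes, traits


def variance_explained(genotypes, traits) -> float:
    """Squared Pearson correlation between genotype and trait."""
    n = len(genotypes)
    mg, mt = sum(genotypes) / n, sum(traits) / n
    cov = sum((g - mg) * (t - mt) for g, t in zip(genotypes, traits))
    vg = sum((g - mg) ** 2 for g in genotypes)
    vt = sum((t - mt) ** 2 for t in traits)
    return cov * cov / (vg * vt)


if __name__ == "__main__":
    rng = random.Random(1)
    for maf in (0.1, 0.4):
        g, y = simulate_trait(20000, maf, 0.01, rng)
        print(f"MAF = {maf}: beta = {effect_size(maf, 0.01):.3f}, "
              f"empirical variance explained = {variance_explained(g, y):.4f}")
```

The rarer variant needs a larger per-allele effect to explain the same 1% of variance, which is why both MAF settings are informative.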

We next examined all SNPs in the candidate linkage interval, defined as the region with LOD > 1 that harbors the linkage peak. Risk variants were selected in three steps. In step 1, we performed family-based association analysis with a linear mixed model using ASSOC in S.A.G.E. [27]. SNPs with association evidence were selected if either P ≤ 0.1 (marginal significance) or the absolute regression coefficient \(|\beta|\) ≥ 5 (large allelic effect). In step 2, we filtered the SNPs from step 1 for linkage contribution, requiring the variant to be present at least twice in a family with family-specific LOD ≥ 0.1. In step 3, we excluded intergenic and intronic SNPs, and SNPs in complete LD. We then defined a risk score for the retained risk SNPs as \(r_i = x_i^T\beta\), where \(\beta\) is the vector of estimated regression coefficients of the SNPs and \(x_i\) is the vector of the numbers of risk alleles carried by individual i. Linkage analysis was then repeated conditional on this risk score.
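The step 1 filter and the risk score \(r_i = x_i^T\beta\) can be sketched as below. The SNP identifiers, effect estimates, and P values are made up for illustration, and the class and function names are hypothetical; the linkage-contribution and annotation filters of steps 2 and 3 are not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SnpResult:
    snp_id: str      # hypothetical identifier
    beta: float      # estimated regression coefficient
    p_value: float   # association P value


def passes_association_filter(snp: SnpResult,
                              p_max: float = 0.1,
                              beta_min: float = 5.0) -> bool:
    """Step 1: keep a SNP if P <= 0.1 or |beta| >= 5."""
    return snp.p_value <= p_max or abs(snp.beta) >= beta_min


def risk_score(allele_counts: Dict[str, int], selected: List[SnpResult]) -> float:
    """r_i = x_i^T beta: sum over selected SNPs of allele count times beta.
    SNPs missing from `allele_counts` are treated as carrying zero alleles."""
    return sum(snp.beta * allele_counts.get(snp.snp_id, 0) for snp in selected)


if __name__ == "__main__":
    results = [
        SnpResult("snpA", -2.0, 0.03),  # kept: P <= 0.1
        SnpResult("snpB", 6.0, 0.40),   # kept: |beta| >= 5
        SnpResult("snpC", 1.0, 0.50),   # dropped
    ]
    selected = [s for s in results if passes_association_filter(s)]
    person = {"snpA": 1, "snpB": 2, "snpC": 1}
    print([s.snp_id for s in selected], risk_score(person, selected))
```

In a conditional linkage analysis, a score of this kind would enter the model as a covariate, so that any drop in LOD reflects linkage signal captured by the selected variants.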

Finally, we performed a family-based SKAT-O [28] test to investigate the cumulative effect of risk variants in FBPP. Variants were pooled together and also grouped into non-overlapping windows for SKAT-O. Genetic pathway enrichment analysis of genes harboring risk variants was performed using the software package STRING [29], which implements a prediction pipeline for inferring protein-protein interactions (PPI). We reported gene sets that were either significantly co-expressed or involved in experimentally validated PPI with a medium score or higher (score ≥ 0.4). The enriched pathways and the genes involved were followed up by grouping their variants for SKAT-O tests. SNP-set replication analysis was performed using 16,968 individuals of African ancestry from eight independent cohorts: the African American Diabetes Mellitus Study (AADM), the Atherosclerosis Risk in Communities Study (ARIC), the Mt. Sinai BioMe Biobank (BioMe), the Vanderbilt University Medical Center Biobank (BioVU), the Health and Retirement Study (HRS), the Howard University Family Study (HUFS), the Multi-Ethnic Study of Atherosclerosis (MESA), and the southwest Nigeria cohort collected by Loyola University at Chicago (Nigeria) (Supplementary Methods). Replication analysis was conducted in individual cohorts and meta-analyzed using Fisher’s method.
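Fisher's method combines k independent P values through the statistic \(-2\sum_j \ln p_j\), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a standard-library illustration with a hypothetical function name and made-up input values; it uses the closed-form chi-square survival function that holds for even degrees of freedom and assumes every P value is strictly positive.

```python
import math
from typing import Sequence, Tuple


def fisher_combined_p(p_values: Sequence[float]) -> Tuple[float, float]:
    """Return (chi-square statistic, combined P value) for Fisher's method.

    For df = 2k the chi-square survival function is
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return stat, math.exp(-half) * total


if __name__ == "__main__":
    # Illustrative per-cohort P values, not taken from the study.
    stat, p = fisher_combined_p([0.20, 0.04, 0.51, 0.09])
    print(f"chi-square = {stat:.2f}, combined P = {p:.4f}")
```

Because the method uses only P values, it does not require the cohorts to share effect-size scales, but it also ignores the direction of effect in each cohort.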

Results

Supplementary Table 1 presents the characteristics of the FBPP samples included. The residuals of SBP, DBP, and PP are all approximately normally distributed and required no further transformations (Supplementary Fig. 1). Linkage analysis demonstrated three significant linkage peaks with LOD > 3 on chromosomes 1, 17, and 19, all observed with PP. The largest linkage peak was at chromosome 1q31 (LOD = 3.8). Additional linkage evidence (LOD > 2) was observed on chromosomes 5, 19, and 20 for all three BP traits (Table 1, Supplementary Fig. 2).

Table 1 Summary of all maximum LOD scores exceeding 2 (max LOD), their corresponding locations on the genetic map (in centiMorgans, cM) and the BP trait involved (SBP, DBP, PP) using FBPP data

We followed up on the largest linkage peak at 1q31, which reached genome-wide statistical significance and overlaps with a linkage region already implicated in prior BP studies in the FBPP European ancestry families and in other studies [30,31,32,33]. However, existing BP GWAS loci do not overlap with this region. We estimated the heritability attributable to common variants in every 40 cM across the genome in unrelated samples from CARe. The correlations between LOD scores and estimated regional heritabilities were consistently low and negative for all three BP traits (ρ = −0.021, −0.098, and −0.211 for SBP, DBP, and PP, respectively; Supplementary Fig. 3). In particular, the estimated heritability at 1q31 was not statistically significant for PP (h² = 0.01% ± 0.44%). Our analysis consistently suggests that common variants do not account for the observed linkage evidence for BP, including at 1q31 (Supplementary Table 2). Additionally, the heritability estimates for chromosome 1 are essentially unchanged for all BP traits whether or not the SNPs in the linkage region are included. Furthermore, we used simulations of a single common variant (Materials and Methods) to demonstrate that the observed linkage evidence could not be due to common variants. Among the 100 simulations, the maximum LOD scores were 2.31 and 2.28 for MAF 0.1 and 0.4, respectively, suggesting that common variants cannot explain the linkage evidence observed at 1q31.

Informed by the linkage evidence, we focused on PP in the subsequent analyses. To search for specific variants that can explain the linkage evidence, we considered the region with LOD > 1 harboring the linkage peak (171.1–203.9 Mb; Fig. 2), which contained 2533 variants on the exome array. A total of 81 exonic SNPs remained after filtering by association evidence and linkage contribution (Materials and Methods; Supplementary Table 3). Linkage analysis conditional on the risk score defined by these 81 SNPs did not change the location of the linkage peak but reduced the LOD score from 3.8 to 1.2 (Fig. 2), strongly suggesting that these variants explain a majority of the original PP linkage evidence.

Fig. 2 Significant evidence for linkage of PP to human chromosome 1 from the FBPP study. The linkage evidence is plotted with and without conditioning on the risk score calculated from the 81 variants within the region. The 81 variants within 54 genes are also shown

We next performed family-based SKAT-O [28] tests for the pooled 81 exonic variants in the FBPP samples. The observed P value is small for PP, as well as for SBP and DBP, which is not surprising because the SNPs were selected for their association evidence in these samples (Table 2). We then performed SKAT-O analysis with the same 81 SNPs in each of the eight replication samples separately and meta-analyzed the results (Table 2). We observed a marginally significant association for PP (P = 0.0509), and the corresponding association evidence for SBP and DBP was also marginal (P = 6.41 × 10⁻² and 9.11 × 10⁻², respectively).

Table 2 P values of rare variant SKAT-O tests of 81 chromosome 1 SNPs in the discovery sample (FBPP) and in eight replication cohorts

The 81 variants within the linkage interval are spread across numerous genes, so for genetic specificity we divided the variants into 20 non-overlapping windows of four variants each (the last window has five variants) (Supplementary Table 3). SKAT-O tests were performed for each window (Supplementary Table 4). The most significant window for PP consists of four variants (rs148482637:C>T, rs35946265:T>C, rs77277070:T>C, and rs112997656:G>A) in three genes, ASTN1 (MIM:600904), SEC16B (MIM:612855), and C1orf49 (MIM:611154), with P = 2.43 × 10⁻⁴ in FBPP; this window was also significant in the meta-analysis of the replication cohorts (P = 4.16 × 10⁻², Table 3). The corresponding results for SBP and DBP are also presented in Table 3.
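The windowing arithmetic (81 variants giving 19 windows of four and a final window of five) can be reproduced with a short sketch. The function name is hypothetical, and merging the leftover variant into the last window is inferred from the description above rather than taken from the authors' code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def non_overlapping_windows(items: Sequence[T], size: int) -> List[List[T]]:
    """Split `items` into consecutive windows of `size`; any leftover items
    (fewer than `size`) are appended to the last window."""
    n_windows = len(items) // size
    if n_windows == 0:
        return [list(items)] if items else []
    windows = [list(items[i * size:(i + 1) * size]) for i in range(n_windows)]
    windows[-1].extend(items[n_windows * size:])
    return windows


if __name__ == "__main__":
    variants = [f"variant_{i + 1}" for i in range(81)]  # placeholder labels
    windows = non_overlapping_windows(variants, 4)
    print(len(windows), [len(w) for w in windows][-3:])
```

Each window would then be passed to the set-based test as its own SNP set, so the number of tests equals the number of windows.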

Table 3 Association of the most significant 4-SNP window results, containing the genes ASTN1, SEC16B, and C1orf49, in the discovery sample (FBPP) and in eight replication cohorts

A second approach to assessing the veracity of the linkage evidence is to examine the functions encoded by the 54 genes harboring the 81 variants. The overall protein-protein interaction (PPI) enrichment P value for the 54 genes is 1.27 × 10⁻⁷. We identified one GO biological process, one GO molecular function, and one KEGG pathway that were significantly enriched (Table 4). The GO biological process “termination of G-protein coupled receptor signaling” (GO:0038032) involves four regulators of G-protein signaling: RGSL1 (MIM:611012), RGS21 (MIM:612407), RGS1 (MIM:600323), and RGS13 (MIM:607190). The four variants in these four genes (rs7535533:A>G, rs77664911:C>T, rs144803737:T>A, and rs143481310:C>T) are associated with PP in FBPP (SKAT-O P = 0.018) but could not be replicated (Table 4, P = 0.659). The GO molecular function “N,N-dimethylaniline monooxygenase activity” (GO:0004499) involves flavin-containing monooxygenases: FMO1 (MIM:136130), FMO2 (MIM:603955), and FMO3 (MIM:136132). The four variants in these three genes (rs2266782:G>A, rs200985584:G>A, rs2020866:C>T, and rs149030329:G>A) are associated with PP in FBPP (SKAT-O P = 0.026), and the association is significant in the replication cohorts (P = 0.027). The extracellular matrix (ECM) receptor interaction pathway (hsa4512) enriched in KEGG includes four genes: TNN, TNR (MIM:601995), LAMC1 (MIM:150290), and LAMC2 (MIM:150292). The six variants in these four genes (rs143192203:G>T, rs859427:T>C, rs148657090:A>T, rs141408335:C>T, rs143817389:A>G, and rs35281117:T>A) are associated with PP in FBPP (SKAT-O P = 0.022) but failed to replicate (P = 0.459). The corresponding association evidence for SBP and DBP is also reported in Table 4.

Table 4 STRING analysis of pathways within the 58 genes and their associations with BP. All numbers are P values

Gene expression across multiple tissues, obtained from the GTEx database [34], is shown in Supplementary Figs. 4–7 for the genes of interest in the most significant window and in the significant pathways.

Discussion

As GWAS have become a popular approach to detect genetic variants underlying complex traits, linkage analysis has become less popular, even ignored, since it is considered to be of low statistical power. However, the genetic variants identified by BP GWAS do not always localize to the genomic regions identified in previous linkage studies [21,22,23, 35,36,37,38], suggesting that the latter might provide novel clues to BP genetic susceptibility. A possible explanation for these discrepant findings is that GWAS are targeting common variants but have little power when multiple rare variants in a region (gene) contribute to trait variation, as can be detected with family-based linkage studies [17]. Thus, a search for multiple rare or low frequency BP variants in linkage regions may be fruitful. The success rate of this general strategy is, however, unknown.

In this study, we assessed such a BP linkage region using African American families from the FBPP study who had been genotyped using the exome array, a platform specifically designed for examining functional variants and enriched for variants relevant to many cardiovascular traits, including BP. We identified seven genomic regions linked to BP traits (Table 1) with LOD score > 2 and three with significant linkage signals (LOD > 3). Interestingly, the most significant evidence came from a locus for PP on human chromosome 1q31 (LOD score = 3.8), previously observed for SBP and DBP in European Americans [30, 32, 33, 39] but not enriched for BP GWAS signals [35,36,37,38, 40, 41]. Since this locus has shown significant linkage evidence in multiple studies of multiple ancestries, it remains a credible BP locus whose genetics need to be dissected. The heritability analysis in 6194 unrelated African American subjects also indicated that common variants in this region are unlikely to account for BP variation (Supplementary Table 2). Our computer simulations corroborated that even a common variant accounting for 1% of phenotypic variation is unable to produce the observed linkage evidence (Materials and Methods), suggesting that multiple rare and low frequency variants are likely to be the cause of the linkage evidence. Our association analysis focused on PP because of the observed linkage evidence in the chromosome 1 region and identified 81 exonic variants (Supplementary Table 3) that explained most of the observed PP linkage evidence (Fig. 2). Not surprisingly, most of these 81 variants are low frequency or rare functional variants. Nevertheless, we were able to replicate the association evidence for these 81 variants in the meta-analysis of eight additional African ancestry cohorts with marginal significance (P = 0.0509, Table 2). Because we tested the 81 variants together in the replication cohorts, there was no need to adjust for multiple comparisons.
Using the genetic score defined by the 81 variants, we observed a substantial drop in the LOD score (from 3.8 to 1.2), indicating that the selected variants explain a substantial part of the linkage evidence on chromosome 1. This study included the largest exome-array replication analysis in African Americans. Although the replication sample size (N = 16,968) was still much smaller than that of the large exome-array study of adult height (N = 59,804) [42], we observed the same degree of replication evidence (P values between 0.01 and 0.05 without adjusting for multiple tests). Our study and others clearly suggest that replication of rare variant associations remains a major challenge.

The main challenge is to identify which of the 54 genes, and their corresponding variants, within the linkage interval are responsible for inter-individual PP variation. Because the 81 variants are sparsely located in 54 genes, we performed non-overlapping window SKAT-O analysis to reduce the number of comparisons, although a sliding window analysis gave essentially the same result. We demonstrated that one interval of four variants in the genes ASTN1, SEC16B, and C1orf49 is the most significantly associated with PP in our discovery sample and is also replicated (Table 3). However, further data on new samples are needed to be assured of their significance. These three genes harbor four non-synonymous variants and demonstrate a consistent protective effect on PP in the FBPP. ASTN1 and its two upstream genes RFWD2 and PAPPA2 have been shown to be strongly expressed in the renal outer medulla and to have protective effects on HTN in Dahl S rats [43, 44], consistent with what we observed in humans in this study. This effect may be mediated through pathways that affect BP via renal transcellular Na and K electrochemical gradients and tubular Na transport, the mitochondrial TCA cycle and cell energetics, and circadian rhythms [43]. For humans, we used the GTEx database [34] to examine the expression domains of these genes: ASTN1 is mainly expressed in the brain, and SEC16B is highly expressed in the kidney and liver (Supplementary Fig. 4). Therefore, these genes are highly relevant candidate BP genes. However, functional tests are needed to confirm these findings.

To assess function, we also examined protein-protein interaction data for the 54 genes using STRING [29] and observed significant enrichment in two GO terms and one KEGG pathway (Table 4). The GO molecular function “N,N-dimethylaniline monooxygenase activity” (GO:0004499) involves the flavin-containing monooxygenase (FMO) genes FMO1, FMO2, and FMO3 (Table 4). The four exonic variants in these genes are associated with PP in both FBPP and the replication cohorts (Table 4). FMO3 is involved in the formation of trimethylamine (TMA) N-oxide (TMAO) [45]. Organic osmolytes (e.g., TMAO) can counteract an increase in osmotic pressure and peripheral resistance, and hypertension can develop in the absence of organic osmolytes [46]. Individuals with deficient FMO3 activity have a higher prevalence of HTN and other CVD because of decreased formation of TMAO to counterbalance the effects of higher osmotic pressure and peripheral resistance [47]. FMO1 is highly expressed in kidney; FMO2 is highly expressed in lung, aorta, and adipose tissue; and FMO3 is highly expressed in liver and adipose tissue (Supplementary Fig. 5). Therefore, the FMO family may regulate BP through these pathways, but the specific functional deficits remain unknown.

These results, taken together, support the hypothesis that multiple rare and low frequency variants may underlie linkage peaks. If so, these regions represent new opportunities for gene discovery. This hypothesis immediately suggests several corollaries that can be tested: (1) rare functional variants can also have small-to-modest genetic effects and behave in a non-Mendelian manner; (2) the clustering of variants within the linkage region can arise from diverse genetic effects at a single gene (intragenic allelic heterogeneity) or from the effects of multiple variants in different genes (intergenic allelic heterogeneity); and (3) when different genes are implicated, as here, they do not necessarily have the same function but may perturb the trait through different pathways. These predictions depend on how gene function is distributed in the human genome. It is nevertheless surprising that the allelic effects are small to modest, as in GWAS, rather than large, as for coding variants in rare disease. This aspect may simply reflect the biology of a synthetic trait like BP, which represents the effects of many genes and numerous independent pathways, none with overwhelming importance. Perhaps major physiological traits under homeostatic control have complex distributed regulation, so that there is no single point of vulnerability. Consistent with this view, variants in Mendelian HTN syndromes do not, to date, explain the variance of BP in the general population.

Although linkage studies are often considered to have low statistical power, many disease variants have been identified through linkage analysis, such as the BRCA1 and BRCA2 genes for breast cancer and RBFOX1 for BP [48, 49]. We used LOD score > 3, the initial proposal by Morton [25], as our genome-wide significance level, corresponding to a principled posterior probability of linkage of 95% (Ott 1991 [26], p. 66). Our linkage peak on chromosome 1 with LOD = 3.8 is also well above the threshold of 3.3 suggested by Lander and Kruglyak [50]. The linkage evidence at this region has been replicated multiple times in independent studies [30, 32, 33, 39]. We further calculated statistical power. The power to achieve a LOD score of 3.0 is 49.5% for a multipoint linkage analysis assuming a model with 5% quantitative trait locus (QTL) additive variance, 60% residual shared variance, no recombination between the marker and the QTL, and a sample size of 4394 with sibship size 3 (the average sibship size in FBPP). However, the FBPP study was designed to enrich for families with HTN genes by over-sampling family members with high BP, so our power may be underestimated.

Linkage analysis is not affected by allelic heterogeneity but association analysis is. Thus a reasonable hypothesis is that the observed significant linkage evidence on chromosome 1 is driven by multiple low frequency or rare variants, which is the focus of our study. In addition, we calculated the BP variation explained by common variants across the genome using the summary statistics from the large BP GWAS study in ICBP [36]. The estimated variation explained by common variants is between 11.62 and 12.13%, which is much lower than the heritability estimated from family data (~35% [5]). Thus, our hypothesis that the linkage peak is driven by multiple low frequency or rare variants is reasonable.

In conclusion, our linkage analysis identified a genome-wide significant region on 1q31 harboring PP variants. Follow-up association analysis revealed multiple exonic variants in this region contributing to the observed linkage evidence, and the association evidence was replicable in independent cohorts. Association and pathway analyses further identified multiple genes, ASTN1, SEC16B, C1orf49, FMO1, FMO2, and FMO3, that are associated with PP. Beyond producing new insights into BP, we demonstrate how family-based linkage and association studies can implicate specific rare and low frequency variants for complex traits.