Introduction

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex diseases at the genome-wide significance level (p < 5 × 10⁻⁸). Most of the variants identified by GWAS are common variants with minor allele frequency (MAF) ≥ 1%, and most of these variants display modest effect sizes and can only explain a small portion of the total heritability of complex diseases. Yet, rare variants (MAF < 1%) are important to uncovering unexplained heritability and discovering novel genes contributing to complex diseases [1,2,3]. Because standard association approaches testing each variant individually are grossly underpowered for rare variants, aggregation unit-based methods that jointly analyze variants have been proposed to improve power to detect rare variant associations. Aggregation unit-based approaches include, among others, the sequence kernel association test (SKAT) [4], Burden tests [5,6,7], SKAT-O [8], and the aggregated Cauchy association test (ACAT) [9]. However, the power of these methods to identify disease regions can be limited by an insufficient number of cases in unascertained cohorts.

In genetic association studies, family history (FH) of disease in relatives is often collected in large population cohorts. FH provides an overview of a phenotype within families. Such information typically includes phenotypes of un-genotyped parents or more distant relatives of probands. FH is related to the genotypes of probands at disease loci based on the Mendelian laws of transmission, and is important in assessing health problems and risk of diseases [10,11,12]. While collecting cases is expensive, incorporating FH information into standard case–control genetic association analyses is a cost-effective way to potentially increase statistical power [11, 13,14,15]. Many study designs have limitations for genetic research of late-onset diseases such as Alzheimer’s disease (AD), because disease cases may be deceased with unavailable genotype data. The standard statistical association tests in younger cohorts with low prevalence of some late-onset diseases are not powerful to identify genetic regions associated with a trait of interest. In contrast, the incorporation of available information of disease status in the form of FH may increase the sample size in cohorts with limited cases or individuals with unavailable genotypes. Genetic association studies using only cases and controls will greatly benefit by incorporating available FH information to detect associations.

FH cannot be directly incorporated in standard genetic association methods, limiting its use in genetic association testing. FH has been included as a covariate to improve disease prediction [16], or used to infer mode of inheritance to construct statistical tests [17]. However, there are few reported methods that allow FH to be exploited in genetic association analysis to improve statistical power to detect disease loci. The method developed by Ghosh et al. [13] enables the incorporation of FH as a phenotype into the standard single variant analysis, and the results confirmed that exploiting the information contained in FH substantially boosts power to detect individual variants at disease loci. Nevertheless, these single variant tests suffer from loss of power to detect rare variant associations. While numerous aggregation unit-based methods to jointly analyze rare variants have been proposed to improve power to detect rare variant associations, aggregation unit-based methods that can directly incorporate FH information are needed.

We developed a new and powerful method, the family history aggregation unit-based test (FHAT), that enables the incorporation of FH to enhance the statistical power for rare variant associations. We also developed an optimal unified test, FHAT-O, to maintain robust power in complex scenarios regardless of the directions of genetic effects or the proportion of causal variants. To enable a comparison with the recently developed liability threshold model of case–control status and FH (LT-FH) [11], we proposed a novel way to incorporate LT-FH into aggregation unit-based methods for rare variant analysis. We performed an extensive simulation study to evaluate the type I error and power of FHAT and FHAT-O under various scenarios, and illustrated the methods using whole exome sequencing data from the UK Biobank.

Material and methods

Family history aggregation unit-based test (FHAT)

We propose a novel approach, FHAT, to incorporate FH information in the aggregation unit-based tests. We assume that there are n probands with m observed variants included in the aggregation unit-based test. When we have FH on a relative of each proband, let \(Y_i^P\) denote the phenotype of the ith proband; \(Y_i^R\) denote the phenotype of the relative of the ith proband; \(G_i^P\) denote the genotypes of the ith proband; \(X_i^P\) denote covariates for the ith proband; and \(X_i^R\) denote covariates of the relative of the ith proband, such as age and ancestral principal components (PCs) that account for population structure. The probability of observing (\(Y_i^P\),\(Y_i^R\)) conditional on \(G_i^P\) can be written as follows (see details in the Supplementary Method):

$$P\left( {Y_i^P,Y_i^R|G_i^P,X_i^P,X_i^R} \right) = P\left( {Y_i^P|G_i^P,X_i^P} \right)P\left( {Y_i^R|G_i^P,Y_i^P,X_i^R} \right)$$
(1)

Therefore, the evidence for association can be assessed from two separate analyses for probands and relatives. We assume an additive model and code the genotypes in \(G_i^P\) as the number of minor alleles. One can also use dominant or recessive models by coding the variants appropriately. Based on \(P\left( {Y_i^P|G_i^P} \right)\), we first assess the association between probands’ genotypes and their disease status using

$$g\left( {E\left( {Y_i^P|G_i^P,X_i^P} \right)} \right) = X_i^P\alpha _P + G_i^P\beta _P$$
(2)

where g(∙) is the link function, \(\alpha_P\) is a vector of regression coefficients for covariate effects, and \(\beta_P\) is a vector of regression coefficients for the observed genotypes in probands. The model for relatives based on \(P\left( {Y_i^R|G_i^P,Y_i^P} \right)\) is specified as follows:

$$g\left( {E\left( {Y_i^R|G_i^P,Y_i^P,X_i^R} \right)} \right) = X_i^R\alpha _R + G_i^P\beta _R + Y_i^P\lambda _R$$
(3)

where \(\lambda_R\) is a scalar regression coefficient for the probands’ phenotypes in the relatives’ model; \(\alpha_R\) is a vector of regression coefficients for relatives’ covariates; and \(\beta_R\) is a vector of regression coefficients for the m observed variants in probands. Inclusion of \(Y_i^P\) is necessary to make the analyses of probands and relatives independent based on (1). This relatives’ model (3) can analyze FH from unrelated relatives, i.e., a single relative per proband, or FH from both parents, since mothers and fathers are conditionally independent. We observe that the two underlying association estimators, (\(\hat \beta _P,\hat \beta _R\)), have the relationship [18] \(\hat \beta _R \approx 2\Omega \hat \beta _P\), where Ω is the kinship coefficient between probands and their relatives and \({{\Omega }} = \frac{1}{4}\) for first-degree relatives such as parents.

Conventional aggregation unit-based methods evaluate the association between a set of variants and phenotype among probands. One such aggregation unit-based method is called the SKAT [4]. The weighted score statistic based on the probands’ model (2) is

$$Q_{{{{{{\rm{SKAT}}}}}}} = \frac{{\left( {Y^P - \hat \mu _P} \right)^TG^PWWG^{P^T}\left( {Y^P - \hat \mu _P} \right)}}{{\hat \phi _P^2}}$$

where \(W = {\rm{diag}}(w_1,w_2, \ldots ,w_m)\) is a pre-specified weight matrix for the m variants; \(G^P\) is an n × m genotype matrix with (i, j)th element corresponding to the additively coded genotype for variant j of proband i; \(\hat \mu _P\) is the estimated mean of \(Y^P\) using the null model with only covariates; and \(\hat \phi _P\) is the estimate of the dispersion parameter in the generalized linear model, which is related to the variance of the distribution under \(H_0\): for binary outcomes it is fixed to 1, and for continuous outcomes it is the variance of the random errors. A score statistic can be obtained similarly from the relatives’ model (3), using the relatives’ phenotypes in place of the probands’ phenotypes, to evaluate whether genetic variants are associated with disease status. The pre-specified weights can be a function of MAF. For example, one can use Wu’s weights [4], \(w_j = {\rm{Beta}}({\rm{MAF}}_j;1,25)\), to up-weight the effect of rarer variants.
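As an illustration of the weighted score statistic above, the following is a minimal Python sketch, not the implementation used in this study. The function names (`wu_weights`, `skat_q`) and the toy data are hypothetical, and the null model is assumed to contain only an intercept, so the fitted mean is the sample prevalence.

```python
import numpy as np
from scipy.stats import beta


def wu_weights(maf, a=1.0, b=25.0):
    """Beta(MAF; a, b) density weights, which up-weight rarer variants."""
    return beta.pdf(maf, a, b)


def skat_q(y, mu_hat, G, weights, phi_hat=1.0):
    """Q_SKAT = (y - mu)^T G W W G^T (y - mu) / phi^2."""
    resid = y - mu_hat                 # null-model residuals, length n
    score = weights * (G.T @ resid)    # W G^T (y - mu), length m
    return float(score @ score) / phi_hat**2


# Toy data (hypothetical): 6 probands, 3 rare variants, intercept-only null model.
y = np.array([1, 0, 0, 1, 0, 0], dtype=float)
G = np.array([[1, 0, 0],
              [0, 0, 0],
              [0, 1, 0],
              [1, 0, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)
mu_hat = np.full(y.shape, y.mean())        # fitted mean under H0 without covariates
maf = G.sum(axis=0) / (2 * G.shape[0])     # minor allele frequency per variant
w = wu_weights(maf)
q = skat_q(y, mu_hat, G, w)
```

Because the quadratic form factors as the squared norm of the weighted score vector, the n × n kernel matrix never needs to be formed explicitly.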

We propose to combine the score statistics from the two association models for probands and their relatives using a weighted meta-analysis to increase the flexibility of incorporating relatives with different degrees of relatedness (thus different kinship coefficients), as well as different numbers of available relatives for each proband. Meta-analysis is often used in genetic association analysis to increase the power by combining results from multiple studies. Methods to meta-analyze SKAT results have been developed [19]. Meta-analysis of rare variant association tests proposed is based on the study-specific summary statistics, that is, score statistics for each variant and linkage disequilibrium estimates in a region. Because of the genetic relationship between probands and their relatives, we down-weight the scores for relatives by 2Ω when combining the score statistics in a meta-analysis by assuming the homogeneous genetic effects among probands and their relatives. Specifically, because relative k of each proband may or may not have phenotype data available, we use \(Y^{R_k}\) to denote the collective phenotype vector for relative k of all probands (e.g., all mothers), including missing values, with kinship coefficient Ωk. The diagonal matrix D(Rk) indicates whether corresponding element in \(Y^{R_k}\) for each proband is missing (denoted by 0) or not (denoted by 1). Therefore, relatives with missing phenotype data do not contribute to the test statistic. We fit a single relative model jointly using all relatives’ phenotypes and covariates conditional on their probands’ phenotypes to get \(\hat \mu _{R_k}\), the estimated mean vector of \(Y^{R_k}\) for relative k of all probands, as well as the dispersion parameter estimate \(\hat \phi _R\) under the null hypothesis of no genetic effects. We assume that all relatives are independent in the model. The general form of FHAT statistics that incorporates FH from relatives is

$$Q_{{{{{{\rm{FHAT}}}}}}} = \left[ {\frac{{\left( {Y^P - \hat \mu _P} \right)^T}}{{\hat \phi _P}} + \mathop {\sum}\nolimits_k {\frac{{2\Omega _kD\left( {R_k} \right)\left( {Y^{R_k} - \hat \mu _{R_k}} \right)^T}}{{\hat \phi _R}}} } \right]\\ G^PWWG^{P^T}\left[ {\frac{{\left( {Y^P - \hat \mu _P} \right)}}{{\hat \phi _P}} + \mathop {\sum}\nolimits_k {\frac{{2\Omega _kD\left( {R_k} \right)\left( {Y^{R_k} - \hat \mu _{R_k}} \right)}}{{\hat \phi _R}}} } \right]$$
(4)

Under the null hypothesis, \(Q_{\rm{FHAT}}\) follows a weighted sum of \(\chi^2\) distributions with 1 degree of freedom, \(Q_{{{{{{\rm{FHAT}}}}}}}\sim \mathop {\sum}\nolimits_{j = 1}^m {\lambda _j\chi _{1,j}^2}\). The weights \(\lambda_j\) can be estimated from the eigenvalues of \(WG^{P^T}\left( {\hat P + \mathop {\sum}\nolimits_k {4\Omega _k^2D\left( {R_k} \right)\hat P_{R_k}D\left( {R_k} \right)} } \right)G^PW\), where \(\hat P\) and \(\hat P_{R_k}\) are the projection matrices in probands and relatives k, respectively (see the Supplementary Method). The p value can be estimated by Davies’ method [20]. The general form can be reduced to

$$Q_{{{{{{\rm{FHAT}}}}}}} = \left[ {\left( {Y^P - \hat \mu _P} \right)^T + \frac{{D\left( {R_m} \right)\left( {Y^{R_m} - \hat \mu _{R_m}} \right)^T}}{2} + \frac{{D\left( {R_f} \right)\left( {Y^{R_f} - \hat \mu _{R_f}} \right)^T}}{2}} \right]\\ G^PWWG^{P^T}\left[ {\left( {Y^P - \hat \mu _P} \right) + \frac{{D\left( {R_m} \right)\left( {Y^{R_m} - \hat \mu _{R_m}} \right)}}{2} + \frac{{D\left( {R_f} \right)\left( {Y^{R_f} - \hat \mu _{R_f}} \right)}}{2}} \right]$$
(5)

for incorporating FH from both parents (with mothers denoted by m and fathers denoted by f) when using logistic regression models for a binary trait with the estimates of dispersion parameters fixed to 1 (i.e., \(\hat \phi _P = \hat \phi _R = 1\)), and the kinship coefficients \((\Omega_m, \Omega_f)\) fixed to \(\frac{1}{4}\).
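The reduced form (5) can be sketched in a few lines of Python. This is a hedged illustration only: the function name `fhat_q_parents`, the fitted means, and the toy data are hypothetical, and the fitted means are simply supplied rather than estimated from the null models (2) and (3). The null distribution and Davies’ method are not covered here.

```python
import numpy as np


def fhat_q_parents(res_p, res_m, res_f, obs_m, obs_f, G, weights):
    """Equation (5): binary trait, both parents, kinship 1/4, dispersion 1.

    res_* are null-model residuals (y - mu_hat); obs_* are 0/1 indicators
    (the diagonals of D(R_m) and D(R_f)) marking observed parental phenotypes.
    """
    # Parents with missing phenotypes get residual 0, so they contribute nothing.
    res_m = np.where(obs_m == 1, res_m, 0.0)
    res_f = np.where(obs_f == 1, res_f, 0.0)
    combined = res_p + 0.5 * res_m + 0.5 * res_f   # 2 * Omega = 1/2 per parent
    score = weights * (G.T @ combined)             # W G^T [ ... ]
    return float(score @ score)


# Toy data (hypothetical): 4 probands, 2 variants, unit weights for readability.
G = np.array([[1, 0], [0, 1], [0, 0], [1, 1]], dtype=float)
w = np.ones(2)
y_p = np.array([1.0, 0.0, 0.0, 1.0])
y_m = np.array([1.0, 0.0, np.nan, 1.0])    # third mother has no recorded phenotype
y_f = np.array([0.0, 0.0, 1.0, np.nan])    # fourth father has no recorded phenotype
obs_m = np.array([1, 1, 0, 1])
obs_f = np.array([1, 1, 1, 0])
mu_p = np.full(4, 0.5)                     # assumed fitted means under H0
mu_rel = np.array([0.4, 0.2, 0.2, 0.4])    # assumed to depend on proband status

q_fhat = fhat_q_parents(y_p - mu_p, y_m - mu_rel, y_f - mu_rel, obs_m, obs_f, G, w)
```

When no parental phenotypes are observed, both indicator vectors are zero and the statistic reduces to the proband-only SKAT statistic with dispersion 1.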

Optimal FHAT (FHAT-O)

Using the same framework adopted in FHAT, we develop an FHAT-O statistic based on the optimal unified test SKAT-O [8]. Since SKAT-O combines the features of SKAT and Burden tests, its power is robust whether the causal variant effects are in the same or in different directions.

We first develop FHAT-Burden, which is a weighted sum of the weighted score statistics in probands and relatives based on their relationships (Supplementary Method). We then propose a unified test defined as the weighted average of FHAT and FHAT-Burden:

$$Q_\rho = \left( {1 - \rho } \right)Q_{{{{{{\rm{FHAT}}}}}}} + \rho Q_{{{{{{\rm{FHAT - Burden}}}}}}}$$

where the weight ρ can be estimated to minimize the p value using the procedure proposed by Lee et al. [21]. When ρ = 1, \(Q_\rho\) reduces to FHAT-Burden, and when ρ = 0, \(Q_\rho\) is equivalent to FHAT. The statistic for the optimal test FHAT-O, which combines the features of FHAT and FHAT-Burden, is determined as follows:

$$Q_{{{{{{\rm{FHAT - O}}}}}}} = \mathop {{\min }}\limits_{0 \le \rho \le 1} P_\rho$$
(6)

where \(P_\rho\) is the p value estimated for each given ρ (more details are in the Supplementary Method).
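The behaviour of \(Q_\rho\) across ρ can be illustrated with a short Python sketch. It assumes, by analogy with SKAT-O, that FHAT-Burden is the squared sum of the weighted scores; the exact definition is given in the Supplementary Method and may differ. The function name `q_rho_grid` and the score vectors are hypothetical, and the p value \(P_\rho\) for each ρ is not computed here.

```python
import numpy as np


def q_rho_grid(score, rhos):
    """Q_rho = (1 - rho) * Q_FHAT + rho * Q_FHAT-Burden on a grid of rho.

    score is the weighted score vector W G^T r, where r is the combined
    proband-plus-relative residual vector used in FHAT.
    """
    q_fhat = float(score @ score)        # sum of squared weighted scores
    q_burden = float(score.sum() ** 2)   # squared sum (assumed burden form)
    return np.array([(1 - rho) * q_fhat + rho * q_burden for rho in rhos])


rhos = np.linspace(0.0, 1.0, 11)
same_direction = np.array([1.4, 0.1])    # scores with the same sign
opposite = np.array([1.0, -1.0])         # scores that cancel in a burden test

q_same = q_rho_grid(same_direction, rhos)
q_opp = q_rho_grid(opposite, rhos)
```

In the second example the burden component is zero because the scores cancel, which is the situation in which a small ρ (closer to FHAT) retains power; in the first, the burden component exceeds the FHAT component.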

Simulation analysis

Simulations were performed to evaluate the FHAT and FHAT-O statistics in terms of empirical type I error and statistical power. We generated 10,000 haplotypes for a 4 kb region on chromosome 19 using HapGen2 software [22]. The data from 1000 genomes project were used as the reference panel to simulate haplotypes. In all simulations, we focused on binary traits because they are more often collected through questionnaire in relatives and we focused on rare variants with MAF < 1%. We used the definition from Chen et al. [23] to calculate the genetic effect size. We simulated the probands with both genotypes and phenotypes, and available FH data from both parents. We used LT-FH phenotype in SKAT (SKAT-LTFH) and SKAT-O (SKATO-LTFH) and compared the results to FHAT and FHAT-O, and they were all calculated by combining the FH from relatives (i.e., mothers and fathers) into the analysis. The standard methods (SKAT, SKAT-O, Burden test, and ACAT-V) only used proband data in the analysis. Because mothers and fathers were simulated as independent samples, they were analyzed using a single relatives’ model (3) and then FHAT and FHAT-O statistics were calculated using (5) and (6). The type I error and power of FHAT and FHAT-O were compared to SKAT-LTFH, SKATO-LTFH, SKAT, SKAT-O, Burden test, and ACAT-V. Note that ACAT-V is an aggregation unit-based test combining variant-level p values using ACAT. The detailed description of type I error and power simulations can be found in the Supplementary Method.

Analysis of whole exome sequencing data in the UK Biobank

The UK Biobank is a large prospective cohort study with information on clinical traits, covariates, and genome-wide genotype data for over 500,000 individuals with age at assessment between 37 and 73 years at baseline (2006–2010). The second tranche of exome sequence data of approximately 4 million coding variants for 200,000 individuals has been recently completed in the UK Biobank. FH of all cause dementia and hypertension was collected from questionnaires. Rare variant (with MAF < 1%) gene-based analyses detailed in the Supplementary Method were conducted to analyze all cause dementia and hypertension in the UK Biobank data.

Results

Type I error and power

A total of 20 million simulation replicates were first generated to evaluate type I error at various alpha levels for FHAT, FHAT-O, SKAT, SKAT-O, Burden test, and ACAT-V using 5000 probands with available parental history (Table 1). SKAT and SKAT-O have inflated type I error for prevalence = 20%, while the type I error is better controlled in FHAT and FHAT-O. When the disease prevalence is low (i.e., 10%), FHAT and FHAT-O have inflated type I error, especially at exome-wide significance (alpha = 2.5 × 10⁻⁶), but, after incorporating additional cases in relatives, the inflation is smaller than that observed with SKAT and SKAT-O. A slightly deflated type I error was observed in FHAT and SKAT for prevalence = 50%. The conservativeness of SKAT when the prevalence is 50% was also observed in prior publications [4, 8]. Burden test and ACAT-V control the type I error relatively better in some scenarios shown in Table 1. A comparison of the type I error of the methods shown in Table 1 with that of SKAT-LTFH and SKATO-LTFH (Supplementary Table S2) shows that FHAT and FHAT-O yield type I error results similar to those of SKAT-LTFH and SKATO-LTFH, respectively. The type I error for the LTFH methods was evaluated at alpha levels as low as 2.5 × 10⁻⁵ to reduce the computational cost.

Table 1 The empirical type I error rate divided by the significance level.
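The quantity reported in Table 1 can be computed as in the following minimal Python sketch; the function name `type1_error_ratio` and the example p values are hypothetical. A ratio above 1 indicates inflation and a ratio below 1 indicates a conservative test. With 20 million replicates, about 2 × 10⁷ × 2.5 × 10⁻⁶ = 50 rejections are expected under the null at the exome-wide level.

```python
import numpy as np


def type1_error_ratio(pvalues, alpha):
    """Empirical type I error rate divided by the nominal significance level."""
    p = np.asarray(pvalues, dtype=float)
    empirical_rate = np.mean(p <= alpha)   # proportion of null replicates rejected
    return float(empirical_rate / alpha)


# Evenly spaced p values mimic a perfectly calibrated test (hypothetical input).
calibrated = (np.arange(1, 1001) - 0.5) / 1000
ratio = type1_error_ratio(calibrated, alpha=0.05)
```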

Figure 1 summarizes the power simulation results of FHAT, SKAT-LTFH, SKAT, FHAT-O, SKATO-LTFH, SKAT-O, Burden test, and ACAT-V for disease prevalence = 20% at alpha = 2.5 × 10⁻⁶. Additional power results for prevalence = 50% and other alpha levels can be found in Supplementary Figs. S1–S3. The causal variants in a region were set either to all have positive effects, or to have positive effects for half of the causal variants and negative effects for the other half. In all scenarios, similar patterns are shown in Fig. 1 and Supplementary Fig. S1. Our main findings included: (1) FHAT and FHAT-O are more powerful than SKAT-LTFH and SKATO-LTFH, respectively, under many scenarios when the variants have larger effects on the disease among older people; (2) FHAT and FHAT-O have greatly improved power compared to standard methods that do not incorporate FH in most scenarios, except for the scenario in which the proportion of causal variants is 10% and half of the causal variants have positive effects and half have negative effects, where ACAT-V can have slightly higher power; however, ACAT-V has substantial power loss in many other scenarios; (3) FHAT suffers from a loss of power when the proportion of causal variants is high and the causal variants have effects in the same direction. In contrast, FHAT-O outperforms FHAT in those scenarios, and remains powerful regardless of the directions of genetic effects or the number of causal variants.

Fig. 1: Empirical power of FHAT, FHAT-O, SKAT-LTFH, SKATO-LTFH, SKAT, SKAT-O, Burden test, and ACAT-V at exome-wide significance for prevalence = 20%.

In each plot, the x-axis in the format of +/–/0 indicates the proportion of variants with positive, negative, and no effects. Each bar shows the empirical power evaluated as the proportion of p values less than or equal to alpha = 2.5 × 10⁻⁶. The total sample size of probands was set to 5000. The analyses were restricted to rare variants with MAF < 1%. FHAT, FHAT-O, SKAT-LTFH, SKATO-LTFH, SKAT, SKAT-O, and Burden test all used the same Wu weights, \({\rm{Beta}}({\rm{MAF}}_j;1,25)\). ACAT-V used the weights of \(w_{j,{{{{{\rm{ACAT - V}}}}}}} = w_{j,{{{{{\rm{SKAT}}}}}}} \times \sqrt {{{{{{\rm{MAF}}}}}}_j(1 - {{{{{\rm{MAF}}}}}}_j)}\) to make results comparable. FHAT, FHAT-O, SKAT-LTFH, and SKATO-LTFH analyzed probands and incorporated the family history information, while SKAT, SKAT-O, Burden test, and ACAT-V only included probands. The proportion of causal variants was set to 10%, 20%, 50%, 80%, and 100%. The numbers of variants tested in a region were 20, 40, and 80.

Computational cost

FHAT, FHAT-O, and other existing methods (SKAT, SKAT-O, Burden test, and ACAT-V) have lower computational cost than SKAT-LTFH and SKATO-LTFH. Table 2 summarizes computation time (in minutes) for all methods for analyzing 1000 regions that contain 30 variants. The computation time of FHAT, FHAT-O, SKAT, SKAT-O, Burden test, and ACAT-V depends on sample size and region size, whereas the running time for SKAT-LTFH and SKATO-LTFH (conducted using the LT-FH software v2 [11]) depends on the number of configurations of probands’ disease status and FH.

Table 2 Computational time for testing 1000 regions.

Application to the UK Biobank

We restricted the analysis to 129,670 white individuals who passed all filters and have exome sequencing data, phenotype, and available parental disease status (see details in the Supplementary Method). The age of probands at the first assessment visit is between 38 and 72, with the mothers of probands being between 60 and 105 and the fathers of probands being between 60 and 102. There are 27 dementia cases (prevalence = 0.02%) and 32,773 hypertension cases (prevalence = 25.3%) among probands. While mothers and fathers of probands have hypertension prevalence similar to that of probands (37,145 hypertension cases in mothers, prevalence = 28.6%; 26,063 hypertension cases in fathers, prevalence = 20.1%), more dementia cases are observed in the parents (10,654 dementia cases in mothers, prevalence = 8.2%; 5720 dementia cases in fathers, prevalence = 4.4%) than in probands.

We first evaluated the associations of all cause dementia and hypertension with known regions previously implicated in AD/dementia risk [15] and hypertension [24,25,26,27]. We performed the analysis for all unrelated white individuals using FHAT, FHAT-O, SKAT-LTFH, SKATO-LTFH, and other conventional tests (SKAT, SKAT-O, Burden test, and ACAT-V); see results in Table 3. The samples involved in the analyses varied because of missing values in the covariates used for adjustment in the models. FHAT, SKAT-LTFH, FHAT-O, and SKATO-LTFH had improved significance after incorporating parental phenotype information compared to p values calculated using other conventional tests for the majority of genes. SKAT, SKAT-O, and ACAT-V had almost no power to detect some associations for all cause dementia due to low prevalence in probands. The results show that BCL3 (p = 6.8 × 10⁻⁵ in FHAT, p = 2.5 × 10⁻⁵ in SKAT-LTFH, p = 5.9 × 10⁻⁵ in FHAT-O, p = 1.8 × 10⁻⁵ in SKATO-LTFH) and TOMM40 (p = 3.0 × 10⁻⁴ in FHAT, p = 5.8 × 10⁻⁴ in SKAT-LTFH, p = 3.8 × 10⁻⁴ in FHAT-O, p = 7.7 × 10⁻⁴ in SKATO-LTFH) were significantly associated with all cause dementia status at a significance level of 6.3 × 10⁻³ for testing eight genes. At the same significance level, DBH (p = 1.3 × 10⁻³ in FHAT, p = 2.0 × 10⁻³ in SKAT-LTFH, p = 2.6 × 10⁻³ in FHAT-O, p = 3.3 × 10⁻³ in SKATO-LTFH) was identified for hypertension, with improved significance compared to the results from conventional methods. Although the tests that incorporate FH demonstrated improved significance for all eight AD/dementia genes we tested, some p values for hypertension genes were less significant. This may be because the prevalence of hypertension in probands was similar to that in parents, so the associations were diluted by the noise potentially added when combining the FH from parents.

Table 3 Association analysis for genes previously implicated in all cause dementia and hypertension susceptibility.

A comprehensive exome-wide analysis was then conducted. A total of ~18K genes with two or more rare genetic variants meeting our filtering criteria were included. We used models including the same covariates for all cause dementia and hypertension as we did in the known gene analyses. We used p < 5.6 × 10⁻⁵ as the suggestive significance threshold for testing ~18K genes. In the analysis of all cause dementia (Table 4 and Fig. 2), the gene TREM2 [28] (p = 4.1 × 10⁻⁹), with known effects on AD/dementia and late-onset AD, achieved strict exome-wide significance (p < 2.8 × 10⁻⁶) using FHAT-O, and it was also detected by FHAT (p = 5.2 × 10⁻⁶) at suggestive exome-wide significance. One known AD/dementia gene, PVR [29] (p = 1.2 × 10⁻⁵ in FHAT and p = 1.8 × 10⁻⁵ in FHAT-O), was identified with both FHAT and FHAT-O analysis, and ABCA7 [30] (p = 4.1 × 10⁻⁵), with known effects on AD/dementia, was identified by FHAT-O. Moreover, three novel genes were found to be associated with all cause dementia at the suggestive threshold using FHAT and FHAT-O (EFCAB3 with p = 4.0 × 10⁻⁵ in FHAT and p = 4.2 × 10⁻⁵ in FHAT-O, EMSY with p = 4.4 × 10⁻⁵ in FHAT and p = 2.7 × 10⁻⁵ in FHAT-O, and KLC3 with p = 1.4 × 10⁻⁵ in FHAT-O). Because we observed highly inflated results (Fig. 2) from the hypertension analysis due to the correlation among parents’ phenotypes, we corrected the analysis by additionally adjusting for the spouse’s hypertension status in the parents’ model. For the adjusted hypertension analysis (Table 4 and Fig. 2), FHAT identified GATA5 (p = 4.1 × 10⁻⁵), and FHAT-O identified FGD5 (p = 4.3 × 10⁻⁵) and DDN (p = 4.2 × 10⁻⁵) at a suggestive significance level. Those genes detected by our methods have previously been reported to be associated with hypertension-related traits [31,32,33].
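The thresholds used in the known gene and exome-wide analyses are consistent with the following arithmetic, shown here as a hedged reconstruction: it assumes a Bonferroni correction at a family-wise level of 0.05, exactly 18,000 tests for the exome-wide analysis, and a suggestive threshold corresponding to one expected false positive per scan, none of which is stated explicitly in the text.

$$\frac{0.05}{8} = 6.25 \times 10^{-3} \approx 6.3 \times 10^{-3},\qquad \frac{0.05}{18{,}000} \approx 2.8 \times 10^{-6},\qquad \frac{1}{18{,}000} \approx 5.6 \times 10^{-5}$$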

Table 4 Whole exome-wide association analysis for all cause dementia and hypertension.
Fig. 2: Q–Q plots of whole exome-wide analysis results for all cause dementia and hypertension.

The p values for regions with cumulative minor allele counts >20 were used to generate the Q–Q plots. The left panel shows the whole exome-wide analysis results for all cause dementia, where FHAT and FHAT-O were calculated using the model with the same covariates (age, sex, PC1-PC5, PC11) adjusted in the AD/dementia known gene analysis. The right panel shows the whole exome-wide analysis results for hypertension, where FHAT and FHAT-O were calculated using the model with the same covariates (age, age squared, sex, body mass index (BMI), PC1-PC5, PC8, and PC14) adjusted in the hypertension known gene analysis. FHAT_adjust and FHAT-O_adjust were calculated from the adjusted hypertension analysis, where the spouse’s hypertension status, combined with the other previously mentioned covariates, was adjusted for in the parental analysis.

Discussion

We proposed two novel approaches, FHAT and FHAT-O, that incorporate FH to increase power to detect rare variant associations in aggregation unit-based analysis. We also offered a novel way to adapt the LT-FH method to analyze rare variants. Because FH of disease is often collected through questionnaires in large cohorts, the added power is at no added cost. We applied our methods to exploit the FH from parents in simulation analysis and using the UK Biobank data, by assuming that the parents are conditionally independent. We analyzed both parents through a single relatives’ model, and combined the scores calculated from parents and probands with appropriate weights to calculate the test statistics. Because the probands’ analysis is separate from the relatives’ analysis, our methods can handle the missingness in FH as presented in (1) and (4), and one can include all probands with or without FH to optimize the usage of data.

The power was evaluated at alpha = 2.5 × 10⁻⁶ to represent the exome-wide significance for testing 20,000 genes as well as at a suggestive threshold of alpha = 2.5 × 10⁻⁵. By assuming that the causal variants in older people have bigger effects compared to younger people, we showed that FHAT and FHAT-O have slightly greater power than SKAT-LTFH and SKATO-LTFH, with greatly reduced computational cost. Compared with SKAT and ACAT-V, FHAT has greater gain in power in most cases. However, FHAT and SKAT are less powerful than Burden test and SKAT-O when there is a high proportion of causal variants, especially when the causal variants all have positive effects. FHAT-O combines the features of both FHAT and FHAT-Burden, has robust power in many scenarios, and outperforms other methods, as shown in our extensive power simulations. ACAT-V has slightly higher power in some cases where the proportion of causal variants is low, which was expected because only a few genetic variants contribute to the results in ACAT-V, whereas the score statistic for FHAT and FHAT-O is calculated using a linear combination of squared scores from both causal and non-causal variants. We further demonstrated that our methods have improved significance after incorporating FH in association analyses of all cause dementia and hypertension using genotypes and phenotypes collected from the UK Biobank. We compared results using FHAT, FHAT-O, SKAT-LTFH, and SKATO-LTFH for probands with both genotypes and phenotypes, and their parental history of disease, to other methods only using probands. Variants in eight known AD/dementia regions and eight known hypertension regions were selected for the analysis. Using the significance level = 6.3 × 10⁻³ for testing eight known genes, BCL3 and TOMM40 were significantly associated with all cause dementia, while other known AD/dementia regions had improved significance compared to the methods that do not incorporate FH. Some of the hypertension genes were less significant using our method to incorporate FH, which might be caused by additional noise resulting from a similar hypertension prevalence in probands and their parents. The FHAT and FHAT-O approaches yielded conclusions similar to those of SKAT-LTFH and SKATO-LTFH, respectively.

We evaluated type I error at various alpha levels and disease prevalences. We did not evaluate the type I error for SKAT-LTFH and SKATO-LTFH at exome-wide significance (alpha = 2.5 × 10⁻⁶) to limit the computational cost. The type I error of SKAT was previously found to be conservative when the disease prevalence is ~50%, and the Burden test was found to have appropriate type I error when the case–control ratio is balanced [5,6,7]. However, SKAT, SKAT-O, Burden, and ACAT-V suffer from substantially inflated type I error when the prevalence is low, especially for lower alpha levels (i.e., alpha < 2.5 × 10⁻⁴). In contrast, FHAT, SKAT-LTFH, FHAT-O, and SKATO-LTFH control the type I error rates relatively better. The type I error is overall well controlled using FHAT and FHAT-O in most scenarios, but high inflation occurs for alpha = 2.5 × 10⁻⁶ and prevalence = 10%, where the numbers of cases and controls are unbalanced (Table 1). An unbalanced case–control ratio yields inflated type I error rates because the imbalance invalidates the asymptotic assumption of logistic regression. Saddlepoint approximation [34,35,36] and efficient resampling [37] methods have been successfully used to calibrate binary phenotype-based logistic mixed models when case–control ratios are extremely unbalanced. In the future, we plan to adopt these methods to properly account for unbalanced case–control ratios.

In the exome-wide association analysis, we used the same covariates (age, sex, PC1-PC5, PC11) as we did in the known region analysis for all cause dementia. However, as inflation was observed in our hypertension analysis (Fig. 2), we further adjusted for the spouse’s disease status in the parents’ model to account for the correlations among parents, in addition to the covariates of age, age squared, sex, BMI, PC1-PC5, PC8, and PC14. The FH could be correlated with household effects. In the future, we will extend the current approaches to allow for correlation, as might be induced by household effects, in the analysis. Through the exome-wide analysis using FHAT and FHAT-O, we confirmed previously reported genes (TREM2, PVR, and ABCA7) [28,29,30] for AD/dementia as well as genes (GATA5, FGD5, DDN) [31,32,33] related to blood pressure and hypertension. Moreover, our methods identified three novel regions associated with all cause dementia (EFCAB3, EMSY, KLC3) using a suggestive exome-wide significance threshold. Replication analyses are needed to confirm these findings. While we observed inflated type I error for low prevalence in our simulations, we did not see evidence of large inflation of FHAT and FHAT-O in the all cause dementia analysis, as seen from the Q–Q plot (Fig. 2) and the genomic control inflation factors (\(\lambda_{\rm{FHAT}} = 1.13\) and \(\lambda_{\rm{FHAT-O}} = 1.06\) for the all cause dementia analysis). The methods require that all samples are unrelated. The generalized linear mixed model can be used to extend the current methods to related samples, which will allow us to incorporate FH from multiple relatives or handle consanguineous families.
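For readers who wish to reproduce a genomic control inflation factor from a set of gene-based p values, the following is a minimal Python sketch of the commonly used median-based definition. Whether exactly this definition was used for the values reported above is an assumption, and the function name `genomic_inflation` and the example inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square statistic over its expected null median."""
    p = np.asarray(pvalues, dtype=float)
    observed = chi2.isf(p, df=1)                  # p values -> 1-df chi-square quantiles
    return float(np.median(observed) / chi2.ppf(0.5, df=1))


# Evenly spaced p values represent a well-calibrated scan (hypothetical input).
uniform_p = np.arange(1, 1000) / 1000
lam_null = genomic_inflation(uniform_p)

# Squaring the p values shifts them toward zero and mimics an inflated scan.
lam_inflated = genomic_inflation(uniform_p ** 2)
```

Values close to 1 indicate that the bulk of the test statistics follows the null distribution; values well above 1 indicate inflation.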

Although the method development, simulation studies, and UK Biobank analysis described in the paper focused on population samples, our methods can also handle the ascertainment that happens in case–control analysis, because the likelihood can be written as the product of the relatives’ model and the retrospective proband information, taking ascertainment into consideration: \(P\left( {G_i^P,Y_i^R|Y_i^P,X_i^P,X_i^R} \right) = P\left( {Y_i^R|G_i^P,Y_i^P,X_i^R} \right)P\left( {G_i^P|Y_i^P,X_i^P} \right)\) (Supplementary Method). Equation (1) was derived based on the assumption of independence of the relatives’ phenotype and probands’ covariates conditional on the relatives’ covariates and the strength of the associations in relatives. However, when the proband covariates are believed to have an effect on the relatives’ disease status, one can adjust for such covariates in the relatives’ model (3) to account for such an effect. There might be a concern about the accuracy of the FH collected from the probands. Reporting bias in FH among relatives would lead to misclassification of relatives’ disease status, which might cause biases in effect size estimates. However, the methods we proposed are variance component models that do not rely on effect estimates and only provide statistical significance (p values) for associations. The misclassification would affect the power of our methods, but would not affect the validity of the test (i.e., type I error), as the misclassification is not related to genotype data under the null hypothesis of no association between genotypes and disease status. We would expect minimal impact on the methods based on previously published work, where FH was down-weighted based on the accuracy calculated as the correlation of FH recorded among siblings, resulting only in small changes to the association results [11].

In this paper, we demonstrated that FHAT and FHAT-O are computationally efficient compared to SKAT-LTFH and SKATO-LTFH. The significantly reduced computational cost of FHAT and FHAT-O was shown in the analysis time to run 1000 aggregation unit-based tests. Although we focused on binary traits and rare variants, our method can be applied to analyze continuous traits using linear models, as well as common variants. The framework in FHAT is flexible for various settings. While we applied FHAT and FHAT-O to probands with parental disease status available in simulations and the UK Biobank analysis, FHAT can be easily applied to other relative types. We also proposed an extension to FHAT, FHAT-O, to capture the features of SKAT-O, in particular the robustness of the power when all genetic variants have the same direction of effects and the proportion of causal variants is high. The framework can easily be extended to incorporate any other established aggregation unit-based methods. By allowing the incorporation of available FH, our methods are an innovation over traditional rare variant studies that only use cases and controls, and have great potential to promote genetic association research.