Introduction

While genome-wide association studies have identified many common variants associated with complex traits, it is also recognized that rare variants, with minor allele frequency (MAF) below 1–5%, play an important role in complex traits.1 Despite the importance of these variants, testing for their association with traits is challenging. The main reason is the extremely low frequency of rare variants, which leads to low power for single-variant tests. Therefore, several relevant rare variants are usually grouped and tested jointly for association with a trait; this approach, known as the region-based rare variant test, is becoming the standard method for detecting rare variant associations.2

Many researchers have already proposed effective region-based testing methods. The earliest are burden-based tests, which collapse the information on the rare variants in a region into a single summary.3, 4, 5, 6 Burden-based tests are well known to lose power when many variants are non-causal or when both protective and deleterious variants are present.7 To overcome these limitations, the sequence kernel association test (SKAT),8 a variance component test, was proposed, along with a composite of SKAT and the burden test called the optimal sequence kernel association test (SKAT-O).9 SKAT-O has been shown to have higher power than both burden-based tests and SKAT in a wide range of scenarios.9 However, SKAT-O is an efficient variance component test under a logistic regression model that assumes the random regression coefficients are normally distributed with mean 0; its optimality may therefore not hold when this parametric assumption is incorrect.

As an alternative, a rare variant test based on the Kullback–Leibler distance, the Kullback–Leibler distance test (KLT),10 has recently been proposed. The procedure directly compares the distributions of the M rare variants in cases and in controls, and uses the Kullback–Leibler distance between the two distributions as the test statistic. Its authors reported that KLT performed better than SKAT-O in their simulation studies. However, no theoretical argument has been given for the optimality of KLT, for instance in the sense of having the greatest statistical power. Since the power to detect disease-related rare variants is usually limited regardless of the test used, non-optimality can cause serious losses of efficiency in practice. Indeed, in some scenarios of our simulation studies, presented in the Results section, KLT was under-powered compared with SKAT-O.

In this paper, we propose a new and efficient testing procedure called the aggregated conditional score test (ACST). The basic idea is to test the association between each single variant and disease status, and then to combine these tests over the M variants. Specifically, we calculate conditional score statistics for the effect sizes of the M variants and aggregate them to test the association between the M variants and disease status simultaneously. We propose two aggregation methods, each of which is optimal under a certain correlation structure among the M variants. This optimality is therefore expected to translate into efficient identification of disease-related variant sets. Moreover, unlike SKAT-O, ACST does not require any presumed structure for the effect sizes, so its theoretical optimality holds under a wide range of structures of the regression parameters. In our simulation study in the Results section, ACST generally performed well compared with SKAT-O and KLT. Furthermore, the application to the Dallas Heart Study data set in the Results section shows that ACST is a useful tool for the analysis of genome-wide sequencing data.

Methods

Notations and model

We consider a data set with N subjects, of whom n1 are cases and n2 are controls. For the ith subject, i=1,…,N, we observe a phenotype yi and a multi-site genotype Gi=(gi1,…,giM)T, where yi takes the value 0 (control) or 1 (case), and gik, k=1,…,M, is coded as 0, 1 or 2, the number of minor alleles that subject i carries at the kth variant. Most of the M variants have extremely low MAFs, and our goal is to test the association between Gi, a set of (rare) variants, and the disease status yi. Region-based tests have been widely studied for this purpose,2 since a test of an individual variant with extremely low MAF cannot be expected to have sufficient power to detect its effect.

We consider the following standard logistic regression model, which has been widely adopted in genetic association studies:

$$\log\frac{p_{ik}}{1-p_{ik}} = r_k + \beta_k g_{ik}, \qquad k=1,\dots,M,$$

where pik is the probability that yi=1 given gik, and rk denotes a variant-specific intercept. Our interest is in the effect size βk rather than rk; that is, rk is a nuisance parameter. To perform a region-based test that detects rare variant effects simultaneously, we consider the testing problem:

H0: β1=β2=⋯=βM=0 vs H1: βk≠0 for at least one k (k=1,2,…,M).

As mentioned above, the existing methods have restrictions and limitations, such as the strong parametric assumption on (β1,β2,…,βM)T in SKAT-O9 and the lack of theoretical optimality of KLT.10 To address these problems, we construct an optimal test that achieves the greatest power without any parametric assumption on β1,β2,…,βM. To this end, we first derive the conditional score statistic for the null hypothesis βk=0 under the logistic regression model, because the conditional score test is known to have the greatest power and conditional inference eliminates the variant-specific intercept rk. We then construct a test statistic for testing β1=β2=⋯=βM=0 simultaneously by aggregating all the conditional score statistics.

Aggregated conditional score test (ACST)

The conditional likelihood for βk is expressed as

$$CL_k(\beta_k) = \frac{\exp\left(\beta_k \sum_{i:\,y_i=1} g_{ik}\right)}{\sum_{D(n_1)} \exp\left(\beta_k \sum_{i \in D(n_1)} g_{ik}\right)},$$

where D(n1) is an arbitrary subset of {1,…,N} with n1 elements and ∑D(n1) denotes the summation over all possible subsets D(n1). Note that the conditional likelihood CLk(βk) is free of the variant-specific intercept rk. For notational simplicity, we introduce the 2 × 3 contingency tables given in Table 1, which summarize the quantities used in the test statistics. Based on CLk(βk), the conditional score statistic sk for testing βk=0 is given by

Table 1 2 × 3 contingency tables between the phenotype variable yi, representing case (yi=1) or control (yi=0), and the genotype variable gik, representing the number of minor alleles

where the detailed derivation is deferred to the Appendix. It is well-known that the conditional score test for βk based on sk is most powerful.11
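For readers who want the intermediate step, the block below sketches how a standardized conditional score follows from CLk(βk) in the usual way. The symbols Tk, Gk(1), Gk(2), Uk and Ik are introduced here for illustration only and are not the Table 1 notation; we also assume, consistent with the unit variances of S stated below, that sk is the score divided by the square root of its conditional variance. At βk=0 the first derivative of the log denominator is the mean of the sum of gik over a uniformly chosen subset D(n1), and the second derivative is its variance under sampling without replacement:

```latex
\begin{aligned}
T_k &= \sum_{i:\,y_i=1} g_{ik}, \qquad
G_k^{(1)} = \sum_{i=1}^{N} g_{ik}, \qquad
G_k^{(2)} = \sum_{i=1}^{N} g_{ik}^{2},\\
U_k &= \left.\frac{\partial \log CL_k(\beta_k)}{\partial \beta_k}\right|_{\beta_k=0}
     = T_k - \frac{n_1}{N}\,G_k^{(1)},\\
I_k &= -\left.\frac{\partial^{2} \log CL_k(\beta_k)}{\partial \beta_k^{2}}\right|_{\beta_k=0}
     = \frac{n_1 n_2}{N^{2}(N-1)}\Bigl\{N\,G_k^{(2)} - \bigl(G_k^{(1)}\bigr)^{2}\Bigr\},\\
s_k &= \frac{U_k}{\sqrt{I_k}}.
\end{aligned}
```

The same sampling-without-replacement argument gives the conditional correlation between sk and sj as the sample correlation between the kth and jth genotype columns.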

To test β1,…,βM simultaneously, we need to aggregate these scores. A widely used method for aggregating variant-specific statistics is to sum the squared statistics.12, 13 Hence, we first propose the test statistic U1=STS, where S=(s1,…,sM)T, that is, the sum of the squared scores sk over the M variants; U1 is asymptotically most powerful when all the variants are independent. However, variants often exhibit a linkage disequilibrium (LD) structure, so that they are correlated and, consequently, the score statistics sk are mutually correlated. In this case, U1 is not necessarily efficient.
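The Python sketch below computes the standardized conditional scores and U1 from a genotype matrix and a 0/1 phenotype vector. It assumes that sk is the score at βk=0 divided by the square root of its conditional (hypergeometric) variance; the function names are hypothetical and not part of any published ACST software.

```python
import numpy as np


def conditional_scores(G, y):
    """Standardized conditional score s_k for each column of G.

    G : (N, M) array of minor-allele counts (0, 1 or 2).
    y : (N,) array of disease status (1 = case, 0 = control).
    Variants with no minor alleles must be removed beforehand,
    because their conditional variance is zero.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=int)
    N = y.size
    n1 = int(y.sum())
    n2 = N - n1
    T = G[y == 1].sum(axis=0)          # minor alleles observed among cases
    G1 = G.sum(axis=0)                 # sum of g_ik over all subjects
    G2 = (G ** 2).sum(axis=0)          # sum of g_ik^2 over all subjects
    U = T - n1 * G1 / N                # score at beta_k = 0
    I = n1 * n2 * (N * G2 - G1 ** 2) / (N ** 2 * (N - 1))  # conditional variance
    return U / np.sqrt(I)


def acst_i_statistic(G, y):
    """U1 = S^T S, the sum of squared conditional scores."""
    s = conditional_scores(G, y)
    return float(s @ s)
```

For example, with two cases followed by two controls, `conditional_scores([[1, 2], [0, 0], [0, 1], [0, 0]], [1, 1, 0, 0])` returns the score vector S used both in U1 and in the correlation-adjusted statistic.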

To account for the LD structure, we allow the score vector S to have an exchangeable correlation matrix, Rρ=(1−ρ)I+ρ11T, as used in SKAT-O. Using this correlation structure, we consider the statistic

$$Q_\rho = S^{T} R_\rho^{-1} S$$

as a function of ρ. When ρ=0, Qρ reduces to U1. When ρ=1, R1=11T is singular, so the inverse is replaced by a generalized inverse of R1, namely

$$R_1^{-} = \frac{\mathbf{1}\mathbf{1}^{T}}{M^{2}} \quad \left(R_1 R_1^{-} R_1 = R_1\right), \qquad Q_1 = \frac{(s_1+\cdots+s_M)^{2}}{M^{2}},$$

so that M2Q1 corresponds to the well-known Mantel–Haenszel test statistic for the 2 × 3 contingency tables. For a fixed ρ, Qρ follows a mixture of χ2 distributions for large N, since the score vector S asymptotically follows a multivariate normal distribution with mean vector 0 and a covariance matrix C whose diagonal elements are all 1. Specifically, the null distribution of Qρ can be closely approximated by

$$Q_\rho \approx \sum_{k=1}^{M} \lambda_k \chi^{2}_{1,k}, \qquad \lambda_1,\dots,\lambda_M = \text{eigenvalues of } C^{1/2} R_\rho^{-1} C^{1/2},$$

where the χ2 terms are mutually independent χ2 random variables with one degree of freedom. To estimate the correlation matrix C, we note that its (k, j) element is the correlation between sk and sj, which can be estimated by the sample correlation between (g1k,…,gNk)T and (g1j,…,gNj)T. We can then compute the P-value of Qρ for each fixed ρ. However, since there is usually little information about the unknown parameter ρ in applications, we propose selecting ρ to maximize the power, similarly to SKAT-O.9 Hence the proposed test statistic is

$$U_2 = \min_{\rho}\, p_\rho,$$

where pρ is the P-value of Qρ. In practice, U2 can be computed by a simple grid search, that is, by taking the minimum of pρ over a finite grid of ρ values. Since correlations among rare variants are generally not large, it should be sufficient to search for the optimal ρ around 0. We therefore set the default grid to 21 equally spaced points from −0.1 to 0.1.
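A Python sketch of Qρ and U2 follows. It assumes Qρ=STRρ−1S with the generalized inverse 11T/M2 at ρ=1, estimates C by the sample correlation matrix of the genotype columns, and, only to stay dependency-free, approximates the mixture-of-χ2 tail probability by Monte Carlo instead of an exact method such as Davies' algorithm. The function names and the `n_mc` parameter are hypothetical.

```python
import numpy as np


def quad_matrix(M, rho):
    """Matrix A such that Q_rho = S^T A S.

    A = R_rho^{-1} for the exchangeable R_rho = (1 - rho) I + rho 1 1^T,
    and the generalized inverse 1 1^T / M^2 when rho = 1.
    """
    if np.isclose(rho, 1.0):
        return np.ones((M, M)) / M ** 2
    R = (1.0 - rho) * np.eye(M) + rho * np.ones((M, M))
    return np.linalg.inv(R)


def q_rho(s, rho):
    s = np.asarray(s, dtype=float)
    return float(s @ quad_matrix(s.size, rho) @ s)


def p_value_q(q, C, rho, n_mc=100_000, seed=0):
    """Monte Carlo approximation of P(sum_k lambda_k * chi2_1 >= q).

    lambda_k are the eigenvalues of C^{1/2} A C^{1/2}, with A from quad_matrix.
    """
    w, V = np.linalg.eigh(C)
    C_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # symmetric square root
    lam = np.linalg.eigvalsh(C_half @ quad_matrix(C.shape[0], rho) @ C_half)
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1.0, size=(n_mc, lam.size)) @ lam
    return (1 + np.count_nonzero(draws >= q)) / (n_mc + 1)


def acst_c_statistic(s, G, grid=np.linspace(-0.1, 0.1, 21), n_mc=100_000):
    """U2 = min over the rho grid of p_rho, with C = sample correlation of G's columns."""
    C = np.atleast_2d(np.corrcoef(np.asarray(G, dtype=float), rowvar=False))
    return min(p_value_q(q_rho(s, r), C, r, n_mc) for r in grid)
```

The value returned by `acst_c_statistic` is the statistic U2, not a P-value; because it is a minimum over ρ, its own significance is assessed by permutation, as described in the next subsection.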

We call the test using the conditional score statistics sk the ACST; in particular, we refer to the tests using U1 and U2 as the independent ACST (ACST-I) and the correlation-adjusted ACST (ACST-C), respectively. Note that ACST is asymptotically most powerful without any restriction on β1,…,βM, so that ACST is flexible and theoretically efficient compared with, for example, SKAT-O and KLT. The score statistic sk cannot be computed when both m1k and m2k are 0, that is, when there are no minor alleles at the kth variant. Hence, variants with no minor alleles must be removed before performing ACST.

In the case m2k=0, that is, when no subject carries two minor alleles at the kth variant (so that all entries in the column of Table 1 for two minor alleles are 0), the score statistic sk reduces to the form

and the sum of these scores corresponds to the Mantel–Haenszel test statistic. Hence, sk can be regarded as a generalization of the score statistic used in the Mantel–Haenszel test to 2 × 3 contingency tables. Note that ACST achieves the greatest power when the stratum-specific log odds ratios β1,…,βM are heterogeneous (for arbitrary values of β1,…,βM under H1), whereas the Mantel–Haenszel test was derived as an asymptotically efficient test under the assumption of a common effect across strata.14, 15 Thus ACST, a generalization of the Mantel–Haenszel test, retains optimality under a broad range of conditions regardless of the homogeneity of the effect measures, which is meaningful for region-based simultaneous testing aimed at efficiently detecting disease-related rare variants.
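To make the reduction concrete, the following sketch uses our own illustrative notation (not that of Table 1): ck is the number of carriers of the kth variant and Tk the number of carriers among cases. When every gik is 0 or 1, the sums of gik and of gik2 both equal ck, so the standardized conditional score becomes the usual 2 × 2 hypergeometric statistic, and the Mantel–Haenszel statistic pools its numerators and variances over the M variants:

```latex
s_k = \frac{T_k - n_1 c_k / N}{\sqrt{n_1 n_2 c_k (N - c_k) / \{N^{2}(N-1)\}}},
\qquad
\chi^{2}_{\mathrm{MH}} = \frac{\Bigl\{\sum_{k=1}^{M} \bigl(T_k - n_1 c_k / N\bigr)\Bigr\}^{2}}
                            {\sum_{k=1}^{M} n_1 n_2 c_k (N - c_k) / \{N^{2}(N-1)\}}.
```

Here the "strata" are variants measured on the same subjects, so all M tables share the margins n1, n2 and N.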

Calculation of the P-value

To calculate P-values for ACST, we propose a permutation method, since it yields valid P-values regardless of the sample size N and the number of variants M. For brevity, we describe the permutation test for ACST-I only; the same procedure applies to ACST-C with U2, except that small values of U2 indicate evidence against H0. We randomly shuffle the disease status yi across all subjects in the sample and calculate the test statistic from the permuted data. We repeat this process B times and calculate the P-value as

$$p = \frac{1}{B}\sum_{b=1}^{B} I\left(U_1^{(b)} \ge U_1\right),$$

where U1(b) is the test statistic computed from the bth permuted data set and I(·) denotes the indicator function. Note that ACST-C requires much more running time than ACST-I, because it must compute the minimum of the P-values pρ over the ρ grid in each permutation.
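The permutation procedure for ACST-I can be sketched in Python as follows. This is a self-contained illustration under the same assumption as above (standardized conditional scores); the function names and the default seed are hypothetical.

```python
import numpy as np


def u1_statistic(G, y):
    """ACST-I statistic U1: sum over variants of squared standardized conditional scores."""
    N, n1 = y.size, int(y.sum())
    n2 = N - n1
    T = G[y == 1].sum(axis=0)                              # minor alleles among cases
    G1, G2 = G.sum(axis=0), (G ** 2).sum(axis=0)
    U = T - n1 * G1 / N                                    # conditional score at beta_k = 0
    V = n1 * n2 * (N * G2 - G1 ** 2) / (N ** 2 * (N - 1))  # its conditional variance
    return float(np.sum(U ** 2 / V))


def permutation_p_value(G, y, B=2000, seed=0):
    """p = (1/B) * sum_b I(U1^(b) >= U1), permuting disease status across subjects."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=int)
    G = G[:, G.sum(axis=0) > 0]                            # omit variants with no minor alleles
    u_obs = u1_statistic(G, y)
    rng = np.random.default_rng(seed)
    exceed = sum(u1_statistic(G, rng.permutation(y)) >= u_obs for _ in range(B))
    return exceed / B
```

Only the case labels are shuffled, so the genotype correlation structure, and hence the LD among variants, is preserved in every permuted data set.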

Results

Simulation study: evaluation of type-I error rates

To evaluate the performance of ACST relative to two existing methods, SKAT-O and KLT, under realistic situations, we carried out simulation studies. We first evaluated the type-I error rates of ACST, KLT and SKAT-O. We considered M=20 variants divided into four groups of five variants each, with all variants in the same group having the same MAF. Denoting by (m1,…,m4) the MAFs of the four groups, we considered the following three patterns:

To generate genotype data, we first generated two M-dimensional binary vectors a1 and a2 using the rmvbin function of the R package bindata, with a correlation matrix with ρ=0.05. We then set the genotype data G=(g1,…,gM)T as G=a1+a2. To evaluate type-I error rates, we generated the disease status y of each subject from the null logistic regression model:

In the above model, the intercept was set to correspond to a 20% background disease prevalence. From this model, we generated 5n subjects and randomly selected n/2 cases and n/2 controls, with n=2000. We then applied the four tests, ACST-C, ACST-I, KLT and SKAT-O, to each generated data set at significance levels α=0.05, 0.01 and 0.001. We used 2000 permutations to calculate the P-values of ACST-C, ACST-I and KLT. Based on 1000 simulation runs for α=0.05 and 0.01 and 5000 runs for α=0.001, we computed the empirical type-I error rates of the four tests, which are presented in Table 2. The type-I error rates of all four methods are close to the nominal significance levels, so all four procedures adequately control the type-I error rate.
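The null simulation design can be sketched in Python as below. This is a simplified stand-in, not the authors' code: instead of calling R's rmvbin, each binary vector is obtained by thresholding a latent multivariate normal with exchangeable correlation ρ, so ρ here is the latent rather than the binary correlation. The MAF values in the usage note are hypothetical placeholders, since the three MAF patterns are not reproduced here.

```python
import numpy as np
from statistics import NormalDist


def simulate_genotypes(n_subjects, mafs, rho, rng):
    """G = a1 + a2 for two independent draws of a correlated binary vector.

    Simplified stand-in for R's bindata::rmvbin: thresholding a latent
    multivariate normal with exchangeable correlation rho (latent correlation).
    """
    mafs = np.asarray(mafs, dtype=float)
    M = mafs.size
    R = (1 - rho) * np.eye(M) + rho * np.ones((M, M))
    cut = np.array([NormalDist().inv_cdf(1 - p) for p in mafs])  # P(Z > cut) = MAF

    def draw():
        Z = rng.multivariate_normal(np.zeros(M), R, size=n_subjects)
        return (Z > cut).astype(int)

    return draw() + draw()


def null_case_control_sample(n, mafs, rho, prevalence=0.2, seed=0):
    """Generate 5n subjects under the null model, then sample n/2 cases and n/2 controls."""
    rng = np.random.default_rng(seed)
    G = simulate_genotypes(5 * n, mafs, rho, rng)
    alpha0 = np.log(prevalence / (1 - prevalence))  # intercept on the logit scale
    p = 1.0 / (1.0 + np.exp(-alpha0))               # no genotype term under H0
    y = (rng.random(5 * n) < p).astype(int)
    cases = rng.choice(np.flatnonzero(y == 1), n // 2, replace=False)
    controls = rng.choice(np.flatnonzero(y == 0), n // 2, replace=False)
    idx = np.concatenate([cases, controls])
    return G[idx], y[idx]
```

For instance, `null_case_control_sample(2000, [0.01] * 5 + [0.02] * 5 + [0.03] * 5 + [0.05] * 5, 0.05)` produces one null data set of 1000 cases and 1000 controls under these hypothetical MAFs; a 20% prevalence corresponds to an intercept of log(0.2/0.8) ≈ −1.386.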

Table 2 Simulation results: type-I error comparison among ACST-C, ACST-I, KLT and SKAT-O at significance levels α=0.01, 0.05 and 0.001
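Whether an empirical type-I error rate is "close to" the nominal level can be judged with the binomial Monte Carlo standard error of a rejection rate, which is a standard calculation rather than part of the original analysis. With 1000 runs at α=0.05, for example, the standard error is about 0.007, so empirical rates between roughly 0.036 and 0.064 are consistent with nominal control. A short Python helper:

```python
import math


def mc_standard_error(rate, n_runs):
    """Binomial standard error of a rejection rate estimated from n_runs replicates."""
    return math.sqrt(rate * (1.0 - rate) / n_runs)


# Nominal levels and numbers of simulation runs used for Table 2.
for alpha, runs in [(0.05, 1000), (0.01, 1000), (0.001, 5000)]:
    se = mc_standard_error(alpha, runs)
    lo, hi = alpha - 1.96 * se, alpha + 1.96 * se
    print(f"alpha={alpha}: SE={se:.4f}, approx. 95% range [{max(lo, 0.0):.4f}, {hi:.4f}]")
```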

Simulation study: evaluation of power

We next evaluated the statistical power of the four tests, ACST-C, ACST-I, KLT and SKAT-O, for detecting disease-related rare variants. We generated M=20 variants in the same way as in the previous simulation, with the correlation parameter ρ set to 0, 0.02 and 0.05, and used MAF pattern (B). To evaluate the power, we used the following non-null logistic model:

For the setup of the non-zero effect sizes, we considered the following eight scenarios:

The number of causal variants decreases as the scenario number increases. Supplementary Table S1 shows the number of causal variants in each MAF group in the eight scenarios. In scenarios 1 and 2, all the variants are causal and deleterious, so the parametric assumption of SKAT-O seems reasonable. In scenario 3, rarer variants have larger effect sizes, and there are both deleterious and protective variants. In scenarios 4, 5 and 6, some of the variants are causal, and these are either deleterious or protective. In scenarios 7 and 8, only a small proportion of the variants are causal, with relatively large effect sizes in scenario 8. In each scenario, we generated 5n subjects and randomly selected n/2 cases and n/2 controls, with n=2000, and applied the four testing methods at significance level α=0.05. For ACST-C, ACST-I and KLT, we used 2000 permutations to obtain P-values. Based on 1000 simulation runs, we computed the empirical power in the eight scenarios, shown in Figure 1. SKAT-O performs quite well when all the variants are causal and deleterious, as in scenarios 1 and 2. However, SKAT-O tends to be under-powered when protective variants are included or the number of causal variants is small. KLT seems to perform well when the number of causal variants is small, as in scenarios 3 to 8, but it is extremely under-powered in scenarios 1 and 2. In contrast, both ACST-I and ACST-C provide reasonable power in all scenarios. It is worth noting that ACST-C provides almost the same power as SKAT-O, although it is under-powered compared with ACST-I in scenarios 5 and 6. Supplementary Figure S1 also provides a power comparison as a function of a quantity determined by the effect sizes and MAFs.

Figure 1

Simulation results: Power comparisons among ACST-C (aggregated conditional score test using the exchangeable correlation structure among variants), ACST-I (independent aggregated conditional score test assuming independence among variants), KLT (Kullback–Leibler test) and SKAT-O (optimal sequence kernel association test) at significance level α=0.05, number of subjects n=2000 and correlation parameter ρ=0, 0.02 and 0.05. The empirical powers were calculated based on 1000 simulated data sets. Two thousand permutations were used for computing P-values for ACST-C, ACST-I and KLT.

We next evaluated the power of the four tests at a smaller significance level, α=0.001, in scenarios 2, 5 and 8 with ρ=0.05. We used 2000 permutations to compute the P-values of ACST-C, ACST-I and KLT. The empirical powers, based on 2000 simulation runs, are presented in Table 3. The relative performance of the four tests is similar to that observed at α=0.05.

Table 3 Simulation results: power comparisons among ACST-C, ACST-I, KLT and SKAT-O at significance level α=0.001, number of subjects n=2000 and ρ=0.05

Applications to the Dallas Heart Study

We applied ACST, together with KLT and SKAT-O, to sequence data from the Dallas Heart Study16 to test the association between serum triglyceride (TG) levels and rare variants in three genes (ANGPTL3, ANGPTL4 and ANGPTL5). This data set was also analyzed in the papers presenting SKAT-O9 and KLT,10 so it allows a direct comparison of ACST with SKAT-O and KLT. The data set contains sequence information on 95 observed variants in the three genes for each of 3474 individuals, including 1830 African Americans, 1043 European Americans and 601 Hispanics. Elevated blood TG levels are known to be related to metabolic diseases such as diabetes, coronary heart disease and fatty liver disease. Rare variants in ANGPTL3, ANGPTL4 and ANGPTL5 have been shown to be associated with lower TG levels.6, 17, 18, 19, 20 Most of the variants are rare: with the exception of one variant, their estimated allele frequencies are below 0.05. We dichotomized the trait by classifying individuals with the top q% of TG values as cases and those with the bottom q% as controls, for q=15, 20 and 25. For each q, we removed variants that showed no sequence variation among the selected cases and controls. Table 4 shows the numbers of cases and controls for each q and the number of variants used for testing. For ACST-C, ACST-I and KLT, the P-values were computed from 10^6 permutations.
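The extreme-phenotype dichotomization and variant filtering described above can be sketched in Python as follows. The helper names are hypothetical and the Dallas Heart Study data themselves are not reproduced; subjects tied exactly at a cut-off are all retained, so group sizes may differ slightly from q%.

```python
import numpy as np


def dichotomize_extremes(trait, q):
    """Top q% of the trait -> cases (1), bottom q% -> controls (0); the middle is dropped.

    Returns the indices of retained subjects and their 0/1 status.
    """
    trait = np.asarray(trait, dtype=float)
    low_cut, high_cut = np.percentile(trait, [q, 100 - q])
    keep = np.flatnonzero((trait <= low_cut) | (trait >= high_cut))
    return keep, (trait[keep] >= high_cut).astype(int)


def drop_monomorphic(G, keep):
    """Restrict G to retained subjects and remove variants with no minor alleles among them."""
    G_sub = np.asarray(G)[keep]
    return G_sub[:, G_sub.sum(axis=0) > 0]
```

Filtering after subsetting matters: a variant observed only in individuals with intermediate TG values becomes monomorphic in the case-control sample, and its conditional score could not be computed.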

Table 4 Results of the analyses of the Dallas Heart Study data: The P-values of ACST-C, ACST-I, KLT and SKAT-O

The P-values of ACST-C, ACST-I, KLT and SKAT-O are given in Table 4. For ANGPTL3 and ANGPTL4, the results did not differ among the four testing methods. For ANGPTL5, which has been shown to be associated with serum triglycerides, both ACST-C and ACST-I provide smaller P-values than SKAT-O. KLT produces smaller P-values than both ACST-C and ACST-I, although the differences are relatively small. Moreover, in this case the P-value of ACST-I was smaller than that of ACST-C.

Regarding running time, computing the P-values for ANGPTL3 with q=15 took 336 s for ACST-I, 6 h 25 min 54 s for ACST-C, 174 s for KLT and 0.1 s for SKAT-O. ACST-I and KLT have running times of the same order, both far longer than SKAT-O, because both require permutations to compute P-values. ACST-C takes much longer than ACST-I because it requires a grid search for the optimal correlation parameter in each permutation. The programs were run on a PC with a 2.7 GHz Intel Core i5-4570R quad-core processor and approximately 8 GB of RAM.

Discussion

We developed a new optimal test, ACST, to detect the association between a phenotype and a set of rare variants. We derived the conditional score statistics for testing the association between a phenotype and each single variant, and proposed two methods for aggregating these score statistics. The first method involves simply summing the squared score statistics, and the resulting test using the statistic is called ACST-I, which is most powerful if all the variants are independent. The second method is called ACST-C, and involves aggregation with a quadratic form, assuming the correlation matrix of the conditional scores is an exchangeable correlation matrix with tuning parameter ρ. We developed a grid search technique for selecting ρ to maximize the power. The P-values of both ACST approaches can be calculated using a permutation test.

In the power simulations, we considered eight scenarios with various effect sizes and evaluated the power of ACST, KLT and SKAT-O. SKAT-O tended to be under-powered when a large proportion of the variants were null or when protective variants were present, whereas KLT was extremely under-powered when all variants were causal and deleterious. In comparison, ACST performed well in both situations, since it is asymptotically most powerful under arbitrary effect sizes. On the other hand, when the parametric assumption of SKAT-O appears correct, the power of ACST is lower than that of SKAT-O, although the differences are not large and ACST still has greater power than KLT. Regarding the simulation settings, a biologically plausible background prevalence would be lower than the 20% used in our studies. However, the background prevalence enters only through the intercept term and does not affect the relative ranking of the four tests. In the application to the DHS data set, ACST performed better than SKAT-O but produced larger P-values than KLT in most cases. However, since these differences were quite small and KLT can be extremely under-powered in some settings, the use of ACST is justified.

Regarding the choice between ACST-I and ACST-C, we first note that ACST-C is optimal even if the rare variants to be tested are correlated, so ACST-C is recommended from a theoretical point of view. However, since correlations among rare variants are not expected to be large in this context, ACST-I is expected to perform as well as ACST-C, as the simulation results indicate.

In this paper, we considered only the case without covariates other than the genotype data. However, clinical covariates are often associated with disease status, and adjusting for them could improve statistical power. Since conditional score statistics adjusted for covariates can be computed using conditional logistic regression,11 the extension of ACST seems straightforward; a detailed investigation is beyond the scope of this paper and is left to future study.

We used the exchangeable correlation structure in ACST-C to model correlations among the conditional score statistics, but other correlation structures can be implemented in a similar way. One possible alternative is a correlation structure defined as a function of the distance between the variants to be tested. Finally, ACST was developed for a binary phenotype yi, but its generalization to a multinomial yi is relatively straightforward. Continuous phenotypes are also common, and extending ACST to such cases will be a valuable direction for future study.