Introduction

Rich genome-wide association study (GWAS) findings have suggested the sharing of genetic risk variants among multiple complex traits1,2. Multi-trait analyses that combine association evidence across traits can boost statistical power over single-trait analyses in detecting risk variants, especially for those traits that have weak associations with the variants. Many multi-trait methods are designed for testing the single-variant association3,4,5,6,7,8. However, the statistical power of single-variant tests is low for rare-variant association studies (RVAS)9. In light of this limitation, gene-based tests have been developed for RVAS to aggregate mutation information across several variant sites within a gene to enrich association signals and reduce the penalty resulting from multiple testing9. Although several methods are available for multi-trait multi-variant tests, most of them require individual-level genotype and phenotype data10,11,12,13,14,15 or are designed for common variants16,17,18,19 (Supplementary Table 1). The gene-based tests for RVAS have not been fully exploited in the multi-trait analysis.

The genetic architecture of complex traits is unknown in advance and is likely to vary from one gene to another across the genome and from one trait to another. Therefore, the main challenge of multi-trait multi-variant analyses is to flexibly accommodate a variety of genetic effect patterns among traits and variants such that the test is robust and has high power. The effect structures among rare variants within a gene have been well-studied when numerous gene-based tests were developed. The sequence kernel association test (SKAT)20 and burden tests21,22,23,24 are the most widely used gene-based tests for RVAS and represent two main patterns of genetic effects across rare variants. Burden tests assume effects across variants are largely homogeneous and SKAT assumes they are heterogeneous. SKAT-O25 is a test that achieves robustness by combining tests with various degrees of effect heterogeneity, including the SKAT and burden tests as special cases. Specifically, SKAT-O assumes rare-variant effects are random variables with a uniform (exchangeable) correlation and different levels of heterogeneity can be considered by changing the correlation coefficient.

The effects on multiple traits may also exhibit homogeneous and heterogeneous patterns. However, the degree of genetic effect similarity/heterogeneity are likely to vary from one trait pair to another. As an example, for the pair of traits that are biologically related (e.g., triglycerides (TG) and high-density lipoprotein cholesterol (HDL)), we expect they share more causal variants and have a higher level of genetic similarity than the pair of traits less relevant (e.g., TG and bipolar)26. Hence, it is not adequate to use a uniform correlation coefficient to model the degree of similarity for all trait pairs. Many recent studies have investigated the genetic overlap for many pairs of complex traits and diseases and estimated genetic correlation as a global measure of genetic similarity for trait pairs26,27,28. Although a genetic correlation is calculated using common variants across the genome and RV association tests are performed on the gene level, the idea of utilizing genetic correlation to guide the specification of gene-level effect heterogeneity across traits is intriguing and has not been considered in existing multi-trait methods.

Here we develop multi-trait analysis of rare-variant association (MTAR), a framework for the multi-trait analysis of RVAS. MTAR is built upon a random-effects meta-analysis model that uses different correlation structures of the genetic effects to represent a wide spectrum of association patterns across traits and variants. To model genetic effects across variants, MTAR employs the same strategy as SKAT-O. To model the rare-variant effect heterogeneity on multiple traits, MTAR leverages the genetic correlation. Specifically, we propose two correlation structures on the among-trait genetic effects. The first structure allows the between-trait effect similarity to change from the value of the genetic correlation to completely heterogeneous as an extreme and we term the resulting multi-trait association test iMTAR. The second structure allows the between-trait effect similarity to change from the value of the genetic correlation to homogeneous as an extreme and we term the resulting test cMTAR. Besides the aforementioned association patterns across traits, we also consider the scenario in which only a small number of traits are associated with the set of rare variants. This association pattern naturally occurs for the genes that have very specific biological functions and do not affect many traits. To accommodate this pattern, we construct another test, cctP, which uses the Cauchy method29,30 to combine single-trait RVAS P-values. To achieve robustness and improve overall power, we combine the P-values of iMTAR, cMTAR, and cctP, and refer to this omnibus test as MTAR-O. To demonstrate the usefulness of MTAR empirically, we analyze summary statistics from the Global Lipids Genetics Consortium (GLGC) on low-density lipoprotein cholesterol (LDL), HDL, and TG. MTAR discovers more lipid-associated genes than single-trait-based analyses and many novel association signals are replicated in an independent UK Biobank data. Moreover, our simulation results show that MTAR methods have well-preserved type I error rate and greater power over single-trait-based methods across a wide range of effect patterns across traits and variants. Finally, we compare MTAR with two existing multi-trait methods that outperform other competing methods. We find that MTAR is more powerful in almost all simulation settings and discovers more genes in the application to the GLGC data.

Results

MTAR overview

Suppose that we are interested in the effects of m variants in a gene on K traits. For k = 1, …, K, we let βk = (βk1, , βkm)T denote the effects of the m genetic variables on trait k. To perform MTAR tests, we first obtain the vector of variant-level score statistics for testing βk = 0 denoted by Uk = (Uk1, …, Ukm)T and the covariance estimate for Uk denoted by Vk. The Uk and Vk can be easily constructed using the information routinely shared in public domains (Methods). We let \(\widehat {\mathbf{\beta }}_k = {\mathbf{V}}_k^{ - 1}{\mathbf{U}}_k\) and write \(\widehat {\mathbf{\beta }} = (\widehat {\mathbf{\beta }}_1^{\mathrm{T}}, \ldots ,\widehat {\mathbf{\beta }}_K^{\mathrm{T}})^{\mathrm{T}}\). Given the true genetic effects \({\mathbf{\beta }} = ({\mathbf{\beta }}_1^{\mathrm{T}}, \ldots ,{\mathbf{\beta }}_K^{\mathrm{T}})^{\mathrm{T}}\), the \(\widehat {\mathbf{\beta }}\) approximately follows normal distribution with mean β and covariance 31,32, where \({\mathbf{\Sigma }} = {\mathrm{Blockdiag}}\{ {\mathbf{V}}_1^{ - 1}, \ldots ,{\mathbf{V}}_K^{ - 1}\}\) if traits are measured on studies without overlapping samples. If all the traits are from one study or multiple studies with overlapping subjects, the off-diagonal blocks in are not zeros. For any given traits k and k′ with sample overlap, the formula for estimating the covariance between \(\widehat {\mathbf{\beta }}_k\) and \(\widehat {\mathbf{\beta }}_{k\prime }\) is provided in Eq. (3).

We are interested in testing the null hypothesis that the m variants are not associated with any of the K traits: H0 : β1 = β2 =  = βK = 0. Multivariate test for this hypothesis has a large degrees of freedom and low statistical power. In MTAR, we further assume that the genetic effects β are zero-mean random effects with covariance matrix σB, where σ is an unknown scalar and B is a pre-specified matrix dictating the covariances of genetic effects among traits and variants. Under this random-effects model, the equivalent null hypothesis is H0 : σ = 0 and we test this hypothesis using a variance-component score test (Eq. (5)). The test will have the optimal power if the specification of B reflects the true covariance structure of the effects. The true structure of B is unknown a priori. To separately model the genetic structures among trait and among variants, we propose to formulate B = B2B1, where is the Kronecker product of among-variant effect covariance B1 and among-trait effect covariance B2. For B1, we assume the exchangeable correlation structure with a uniform correlation coefficient denoted by ρ1 (Methods). By specifying different values of ρ1, this structure allows various degrees of among-variant effect heterogeneity. As the two extremes, the effects across variants are homogeneous when ρ1 = 1; the effects are completely heterogeneous and vary independently when ρ1 = 0.

For the between-trait effect covariance, we set \({\mathbf{B}}_2 = {\mathbf{W}}_2{\mathbf{\Omega }}_2{\mathbf{W}}_2\), where W2 is a diagonal matrix with each diagonal element being a trait-specific weight and Ω2 is a between-trait effect correlation matrix. By setting the diagonal elements in W2 to 0 or 1, we can choose to focus on any subset of the traits and consider any degree of association sparsity across traits (e.g., set only one element as 1 for single-trait analysis or all the elements as 1 for all-trait analysis). It is not sensible to assume the exchangeable correlation structure for B2, because some pairs of traits are more similar in the rare-variant effects than other pairs (e.g., two diseases that were caused by the same set of rare mutations would have a large correlation in their rare-variant genetic effects). Here we propose to leverage the genetic correlation27 to inform the similarity of rare-variant effects among traits. Genetic correlation is a single number measure that quantifies the overall genetic similarity between a pair of traits. Recent advancement of methods enables us to conveniently estimate genetic correlation based on GWAS summary statistics27,28 and there are web portals to query genetic correlations among many complex traits26. We hypothesize that the genetic correlation is also informative to measure the similarity/heterogeneity of the gene-level rare-variant effects among traits for most genes in the genome. Specifically, let Ckk denote the genetic correlation between traits k and k′. We propose two types of correlation structures for Ω2. In both structures, we specify a parameter ρ2 (0 ≤ ρ2 ≤ 1) to control the contribution of genetic correlation Ckk to the degree of effect heterogeneity between traits k and k′. The iMTAR structure assumes the correlation coefficient is ρ2Ckk. Under this structure, the rare-variant effects across traits are heterogeneous and the degree of heterogeneity can change from Ckk (when ρ2 = 1) to completely heterogeneous (strongest level of heterogeneity as effects across traits can vary independently when ρ2 = 0). The cMTAR structure assumes the correlation coefficient is ρ2Ckk + (1 − ρ2). Under this structure, the degree of heterogeneity can change from Ckk (when ρ2 = 1) to homogeneous (no heterogeneity when ρ2 = 0).

As the optimal values of ρ1 and ρ2 are unknown, we propose to search a grid of different values of ρ1 and ρ2 and use the Cauchy method29,30 to combine multiple P-values (Methods). The resulting tests are named after the two aforementioned iMTAR and cMTAR structures. The Cauchy method is a fast and powerful approach to combine multiple correlated P-values without the need for estimating and accounting for their correlation. To accommodate the situation where the gene is associated with a small number of traits, we develop a test called cctP that uses the Cauchy method to combine single-trait P-values from SKAT and burden tests. As we demonstrate in the GLGC data analysis and simulation studies, the cMTAR, iMTAR, and cctP cover different effect patterns among traits. To achieve further robustness, we use the Cauchy method to combine P-values of the three complementary tests and term this omnibus test as MTAR-O. The summary of the proposed iMTAR, cMTAR, cctP, and MTAR-O methods are presented in Fig. 1. The calculations of the test statistics and P-values are described in Methods.

Fig. 1: Summary of methods under MTAR framework.
figure 1

In this illustration, the number of variants is m = 10 and the number of traits is K = 5. The degree of heterogeneity of among-variant effects is controlled by ρ1. MTAR methods are robust to various patterns of genetic effects across variants by combining variance-component test P-values from different specifications of ρ1. The degree of heterogeneity of among-trait effects is controlled by ρ2. By changing the value of ρ2, the degree of heterogeneity of among-trait effects can be weakly, moderately, or strongly dictated by genetic correlation Ckk. By setting ρ2 = 0, iMTAR and cMTAR structures assume genetic effects become completely heterogeneous and homogeneous, respectively. MTAR methods are robust to various patterns of genetic effects across traits by combining variance-component test P-values from different specifications of ρ2. The cctP that combines the single-trait burden and SKAT tests P-values is particularly powerful when only a small number of traits are associated with the set of rare variants. The omnibus test MTAR-O that combines iMTAR, cMTAR, and cctP is robust to all the aforementioned patterns of genetic effects across traits and variants.

Although both and B are covariance matrices among traits and variants, it is important to note the difference. Matrix reflects the correlation due to the residual relatedness among traits in the presence of sample overlap and linkage disequilibrium (LD) among variants. An inaccurate estimate of yields inflated type I error in the association testing. On the other hand, the matrix B = B2B1 reflects the similarity of the true gene-level rare-variant effects among traits and variants. This information is unknown a priori; hence, B needs to be pre-specified. The power of the tests can be greatly improved if the specification reflects the truth. MTAR utilizes the genetic correlation, a global measure of cross-trait genetic similarity, to guide the specification of B2. The effectiveness of this strategy in gaining power has been demonstrated in the following sections.

Application of MTAR to GLGC

We performed multi-trait RVAS for three plasma lipid traits: LDL, HDL, and TG. The GLGC data set includes ~300,000 individuals of primarily European ancestry genotyped with the HumanExome BeadChip (exome array)33. The participants were from 73 different studies and single-variant association summary statistics were combined across studies via fixed-effects meta-analysis32. The acquisition of the GLGC summary statistics is described in Methods.

Following Liu et al.33, we considered 179,884 rare variants with minor allele frequency (MAF) < 5% and the highest priority according to their functionality and deleteriousness. We focused on 15,378 genes that contain at least two rare variants. In our analysis, we used the previously reported genetic correlation estimates among the three lipid traits27 in MTAR. Specifically, the genetic correlation is −0.61 for the pair (HDL, TG), 0.35 for (LDL, TG), and 0.09 for (LDL, HDL). For comparison, we performed the single-trait-based analysis by combining SKAT and burden test P-values across traits using either the cctP or the Bonferroni-corrected minimal P-value (minP, take the minimal P-values and then multiply it by the number of tests combined).

Similar to the previous gene-based RVAS of GLGC data33, the slightly elevated genomic control lambdas in the quantile–quantile plots suggest the polygenic inheritance of the lipid traits (Supplementary Fig. 1). At a significance threshold of P < 3.3 × 10−6 (corresponding to 0.05/15,378), a total of 140 genes were identified by at least one test (Supplementary Table 2). MTAR tests (MTAR-O, cMTAR, iMTAR) identified 139 genes and the single-trait-based tests (cctP and minP) identified 99 genes (Fig. 2). There are 41 genes exclusively identified by MTAR tests and the MTAR P-values for many of these genes are 100-fold smaller than the single-trait-based P-values (Table 1, Manhattan plots in Fig. 3 and Supplementary Fig. 2). There is only one gene (HFE, Supplementary Table 2) missed by MTAR but its MTAR-O P-value (4.8 × 10−6) is close to the single-trait-based P-values (1.8 × 10−6).

Fig. 2: Venn diagram of significant genes in the GLGC data analysis.
figure 2

MTAR-O, cMTAR, iMTAR, cctP, and minP test are performed and the number of significant genes identified by each method is shown in the parentheses.

Table 1 Results for the 41 genes exclusively identified by MTAR tests in the GLGC analysis.
Fig. 3: Manhattan plots of MTAR-O and minP results in the GLGC data analysis.
figure 3

The horizontal line marks the genome-wide significance threshold (3.3 × 10−6). The 41 genes highlighted in red are those exclusively discovered by MTAR tests (MTAR-O, cMTAR, and iMTAR). The Manhattan plots for the other methods (cMTAR, iMTAR, and cctP) are shown in Supplementary Fig. 2.

Most discovered genes (>60%) have the smallest P-value when ρ2 is large (ρ2 ≥ 0.5), highlighting the informativeness of using genetic correlations to guide the among-trait effect correlation (Supplementary Fig. 3). For those genes, the association patterns among traits are generally consistent with their genetic correlations: genetic effects on HDL and TG are negatively correlated and effects on LDL and TG are positively correlated (Fig. 4a). The cMTAR and iMTAR tests produce similar P-values in this case. About 18% of the discovered genes become insignificant if we do not use genetic correlations and simply assume the exchangeable correlation structure in B2.

Fig. 4: Heat maps of association signals in the GLGC data for four example genes.
figure 4

The darkness of the color indicates the variant-level Z-test P-values (in −log10 scale) for individual traits LDL, HDL, and TG. The positive and negative Z-scores are indicated by red and blue colors, respectively. a The effect correlations among traits resemble their genetic correlations. b The effects are independent among traits. c The effects are similar among traits. d Association signal resides in a single trait.

When the effects between-trait are strongly heterogeneous and vary randomly among traits (Fig. 4b), iMTAR produces much smaller P-values than other tests. When the effects between-trait effects are largely homogeneous (Fig. 4c), cMTAR provides the strongest evidence of association. When the gene is associated with one trait, the single-trait-based analysis (cctP and minP) is desirable (Fig. 4d). MTAR-O has the P-value close to the smallest P-values among all tests in all the identified genes (Supplementary Table 2).

Many of the 139 MTAR identified genes have an established role in the three lipid traits, including targets for LDL lowering drugs (e.g., PCSK9, NPC1L1, and PPARA) and genes with known association with lipid-related Mendelian disorders (e.g., LDLR, ABCG5, APOB, ABCA1, LCAT, APOA1, and CETP). Gene set enrichment analysis of the 139 genes highlighted the gene sets related to lipid metabolism and transport (Supplementary Fig. 4 and Supplementary Data 1), similar to the reported findings from gene set enrichment analysis of GWAS loci for LDL, HDL, and TG34. Tissue enrichment analysis of all 139 significant genes using either Human Protein Atlas (HPA) or Genotype-Tissue Expression (GTEx) as reference sets demonstrated enrichment of liver-specific genes (Supplementary Fig. 5), in accordance with a published tissue eQTL enrichment analysis across GWAS loci associated with LDL, HDL, TG, or total cholesterol35.

Among the 41 genes exclusively identified by MTAR tests, 27 (66%) genes have previously reported association evidence with at least one of the three lipid traits and 20 (74%) of them are associated with at least two lipid traits (Table 1). To replicate the associations of the genes without any existing annotation evidence, we applied the MTAR-O test to an independent UK Biobank GWAS data (Methods). Despite the fact that UK Biobank GWAS data usually harbor a smaller number of rare variants in a gene than GLGC exome chip data, 7 out of 11 (64%) genes were found significant in the UK Biobank at α = 0.05/11 = 4.5 × 10−3 (Table 1 and Supplementary Table 3). These seven validated MTAR discovered genes may have causal impact on the lipid traits. One example is PNPLA2, which encodes the enzyme adipose TG lipase (ATGL); ATGL is involved in the breakdown of TG. Although variants associated with PNPLA2 have not previously been directly linked with any of the three lipid traits in humans, ATGL-knockout mice display altered very-low-density lipoprotein, HDL, and TG levels36.

Simulation studies

We used simulation studies to further investigate the type I error control and power of MTAR. We considered three continuous traits that have similar residual covariances as the three lipid traits in the real data37. We simulated data in three cohorts (N1 = 3000, N2 = 3500, N3 = 2000) that have different patterns of sample overlap for the three traits (Supplementary Fig. 6). The details of genotype and phenotype simulations are provided in Methods.

As in the GLGC data analysis, we utilized the combined summary statistics across three cohorts for each trait (Methods). We first evaluated the empirical type I error rates based on 108 replicates of simulation. Prior research has shown that the accuracy of the Cauchy combined P-value is generally satisfactory for practical use in rare-variant association tests, but a slight inflation is possible30. Reassured that type I error was well controlled (Supplementary Table 4), we then proceeded to simulate traits under the alternative model to evaluate power. The percentage of causal variants was set to be 50% or 20% for scenarios of dense and sparse signals. For the causal variant j in trait k, the genetic effect was set to \(\beta _{kj} = s_j^{{\mathrm{snp}}} \times s_k^{{\mathrm{trait}}} \times d|{\mathrm{log}}_{10}{\mathrm{MAF}}_j|\), where \(s_j^{{\mathrm{snp}}}\) and \(s_k^{{\mathrm{trait}}}\) determined the heterogeneity of the effect directions among variants and traits, respectively, and d|log10 MAFj| stated that the effect size was larger for the variant with smaller MAF. We set different values of d for different percentage of causal and \(s_j^{{\mathrm{snp}}}\) settings such that the power of the tests in each setting is reasonably high. The effects among causal variants are either in the same direction (\(s_j^{{\mathrm{snp}}}\) = 1 for all j) or bidirectional (randomly assign 1 or −1 with equal probability to \(s_j^{{\mathrm{snp}}}\)). To run MTAR, we utilized the genetic correlations in GLGC data analysis for LDL, HDL, and TG, but we did not specify \(s_k^{{\mathrm{trait}}}\) according to their genetic correlations. In particular, we considered five patterns of \(s_k^{{\mathrm{trait}}}\) across traits: \((s_1^{{\mathrm{trait}}},s_2^{{\mathrm{trait}}},s_3^{{\mathrm{trait}}}) = (0,0,1)\); (0, 1, 1); (0, −1, 1); (1, −1, 1) and (1, 1, 1). All the association patterns across traits and variants considered in our power simulation are visualized in Supplementary Fig. 7.

The empirical power is estimated at the significance level of α = 2.5 × 10−6 based on 104 replicates (Fig. 5). When the gene is associated with one trait (pattern 1 of \(s_k^{{\mathrm{trait}}}\)), the single-trait-based tests (cctP and minP) are more powerful than iMTAR and cMTAR but the trend is reversed in other patterns. cMTAR is more powerful than iMTAR when the effects are homogeneous (pattern 5). iMTAR is much more powerful than cMTAR when the effects are heterogeneous and the specified genetic correlations are not informative to the true relationship of the effects among traits (pattern 2). The power of MTAR-O is close to the most powerful test in all scenarios. These observations are consistent with results from the GLGC data analysis.

Fig. 5: Power comparisons of MTAR-O, cMTAR, iMTAR, cctP, and minP.
figure 5

Each bar represents the empirical power for a method estimated as the proportion of P-values < 2.5 × 10−6 based on 104 replicates. The percentage of causal variants is set to be 20% or 50%, which corresponds to the two rows. The left column assumes the effects of the causal variants have the same direction, whereas the right column assumes the effect directions are randomly determined with an equal probability. The effect sizes (|βkj|’s) of the causal variants have a decreasing relationship with MAF as |βkj| = d|log10 MAFj|, where the constant d depends on the percentage of causal variants and the direction of their effects (the value of d is presented in each subfigure). For each configuration in a subfigure, five patterns of among-trait effects are considered.

Comparison with other multi-trait multi-variant methods

In comparison with existing multi-trait multi-variant methods (Supplementary Table 1), MTAR has a unique combination of features that make it desirable for practical use. First, MTAR uses summary statistics rather than individual-level data. MTAR starts with simple summary statistics calculated in a study for each trait: variant-level score statistics and their covariance estimates24. These statistics can be easily constructed using the information routinely shared in public domains38. Compared with methods that require pooling individual-level data, using summary statistics can better protect study participant privacy and reduce logistical difficulties and computational burden. Second, MTAR allows the summary statistics for different traits to come from (possibly unknown) overlapping samples. Failure to account for the correlation between summary statistics induced by the overlapping samples can greatly inflate type I error39. Sample overlap is prevalent in the multi-trait analysis. Sometimes the overlap pattern is clear (e.g., all traits are measured in the same study or in different studies that share controls40,41), but other times is often elusive—public domains only have combined summary statistics across many studies for each trait and study-specific summary statistics are not available7. MTAR can handle these scenarios and use a simple approach to accurately estimate the correlation between summary statistics for the traits with sample overlap. Third, MTAR is computationally fast. The MTAR P-value calculation is analytical and does not require time-consuming procedures such as permutation and Monte Carlo simulation.

We compared the power of MTAR with Multi-SKAT10 (MultiSKAT R package) and MTaSPUsSet17 (aSPU R package) in numerical studies. These two existing methods have demonstrated superior performance to other competing multi-trait multi-variant methods such as metaCCA18, MGAS19, DKAT11, MAAUSS13, MSKAT15, and GAMuT14. Similar to MTAR, Multi-SKAT and MTaSPUsSet proposed several tests to accommodate different patterns of associations across traits and variants, and omnibus tests to gain robustness. We compared their omnibus tests with MTAR-O. In the simulation study, we let all cohorts have complete trait values as it is required by Multi-SKAT. Empirical power was estimated at the α = 10−4 level due to the speed of MTaSPUsSet. MTAR-O has greater power than Multi-SKAT and MTaSPUsSet in almost all scenarios, especially when the genetic correlation reflects the heterogeneity of effects among traits (patterns 3–4 of \(s_k^{{\mathrm{trait}}}\)) (Supplementary Fig. 8). Furthermore, MTAR-O is computationally more efficient. Multi-SKAT and MTaSPUsSet, respectively, take 29 and 184 s on average to complete one replicate of simulation, whereas MTAR-O only takes 10 s. In addition, we applied MTaSPUsSet that does not require individual-level data to the GLGC summary statistics. MTaSPUsSet missed 52 MTAR identified genes, whereas MTAR only missed 9 MTaSPUsSet identified genes.

Discussion

We have introduced MTAR, a framework for conducting the meta-analysis of RVAS summary statistics across multiple traits. The cMTAR, iMTAR, and cctP tests cover a wide variety of association patterns among traits and variants. The omnibus test MTAR-O achieves robust and high power by combining the P-values of the three complementary tests. The use of summary statistics and Cauchy P-value combination method empowers MTAR to conduct whole-genome multi-trait RVAS in a computationally efficient manner. The computation time of running MTAR methods on the simulated and GLGC datasets are summarized in Methods. Our numerical results have confirmed that MTAR tests properly control the type I error in the present of complex patterns of sample overlap among traits and have substantial power gain relative to the separate analysis of RVAS for each trait. In the analysis of lipid traits in GLGC, MTAR identified many more genes than single-trait-based tests, including genes that have not been previously linked to lipid traits and represent novel findings. Many of these genes have been successfully replicated in an independent UK Biobank data.

Utilizing genetic correlations to guide the specification of gene-level effects heterogeneity across traits is one main innovation of MTAR. The genetic correlation is a genome-wide measure of the shared genetic architecture between a pair of traits and it is calculated using common variants across the genome. The GLGC data analysis results suggest that the rare-variant effect correlation among traits is generally in accordance with the genetic correlation for most genes and the use of genetic correlation in MTAR helps to substantially improve the power of the multi-trait analysis.

Although we mainly demonstrate MTAR in the analysis of continuous traits, the method can be applied to binary traits (Methods) as long as the score statistics from the models are unbiased and their covariance estimates are accurate. For binary traits, the normal approximation to the score statistics could be inaccurate in the unbalanced case-control setting42, which could affect the performance of the multi-trait analysis. For studies with related subjects, one may use methods based on mixed models to generate appropriate score statistics43,44. Future research is required on how to properly handle various patterns of sample overlap across traits in the presence of familial and cryptic relatedness.

With the increasing number of complex traits available in large-scale whole exome/genome sequencing studies and electronic health record linked biobank data, multi-trait analysis based on summary statistics of multiple rare variants will become an important tool to boost the power of discovering genetic components of complex traits and unravel their shared genetic architectures. We envision that MTAR will facilitate the accumulation of adequately large sample sizes to accelerate discoveries in complex trait genetics and provide new biological insights by revealing pleiotropic genes.

Methods

Covariance of genetic effects among variants

B1 is a m × m covariance matrix for the effects among variants. We set \({\mathbf{B}}_1 = {\mathbf{W}}_1{\mathbf{\Omega }}_1{\mathbf{W}}_1\), where W1 is a diagonal matrix with each element being a variant-specific weight and Ω1 is a between-variant effect correlation matrix of exchangeable structure with correlation coefficient ρ1 (0 ≤ ρ1 ≤ 1). Specifically, the single-trait analysis becomes SKAT20 if ρ1 = 0 and burden tests22,24,45,46 if ρ1 = 1. Burden tests are more powerful when the association effects are similar across the aggregated variants, whereas SKAT is more powerful when the effects are in opposite directions or the number of causal variants is small relative to neutral variants. As for the variant-specific weights (in W1), by default, we set them based on the MAF through a beta distribution density function Beta(MAF; 1, 25)  as in SKAT. Other weighting schemes can be employed as well.

Summary statistics

For each trait k (k = 1, …, K) and subject i (i = 1, …, n), when the individual-level phenotype (Yik), genotypes (Gik), and covariates (Xik) are available, the score statistics Uk and their covariance estimate Vk can be obtained from the generalized linear model with the likelihood function \({\mathrm{exp}}\left\{ {\frac{{Y_{ik}\left( {{\mathbf{\beta }}_k^{\mathrm{T}}{\mathbf{G}}_{ik} + {\mathbf{\gamma }}_k^{\mathrm{T}}{\mathbf{X}}_{ik}} \right) - b({\mathbf{\beta }}_k^{\mathrm{T}}{\mathbf{G}}_{ik} + {\mathbf{\gamma }}_k^{\mathrm{T}}{\mathbf{X}}_{ik})}}{{a(\phi _k)}} + c(Y_{ik},\phi _k)} \right\},\) where βk and γk are regression parameters, ϕk is a dispersion parameter, and a, b, and c are specific functions. Specifically, we have \({\mathbf{U}}_k = a(\hat \phi _k)^{ - 1}\mathop {\sum}\nolimits_{i = 1}^n {\{ Y_{ik} - b^{\prime} (\widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik})\} } {\mathbf{G}}_{ik}\) and \({\mathbf{V}}_k = a(\hat \phi _k)^{ - 1}[\mathop {\sum}\nolimits_{i = 1}^n {b^{\prime\prime} } (\widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik}){\mathbf{G}}_{ik}{\mathbf{G}}_{ik}^{\mathrm{T}} - \{ \mathop {\sum}\nolimits_{i = 1}^n {b^{\prime\prime} } (\widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik}){\mathbf{G}}_{ik}{\mathbf{X}}_{ik}^{\mathrm{T}}\} \{ \mathop {\sum}\nolimits_{i = 1}^n {b^{\prime\prime} } (\widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik}){\mathbf{X}}_{ik}{\mathbf{X}}_{ik}^{\mathrm{T}}\} ^{ - 1}\{ \mathop {\sum}\nolimits_{i = 1}^n {b^{\prime\prime} } (\widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik}){\mathbf{X}}_{ik}{\mathbf{G}}_{ik}^{\mathrm{T}}\}]\), where \(\widehat {\mathbf{\gamma }}_k\) and \(\hat \phi _k\) are the restricted maximum likelihood estimators of γk and ϕk under H0 : βk = 0, and b′ and b″ are the first and second derivatives of function b. For the linear regression model, we have \(a(\hat \phi _k) = n^{ - 1}\mathop {\sum}\nolimits_{i = 1}^n {(Y_{ik} - \widehat {\mathbf{\gamma }}_k^{\mathrm{T}} {\mathbf{X}}_{ik})^2}\), b′(z) = z, and b″(z) = 1. For the logistic regression model, we have a(\(\hat \phi _k\)) = 1, b′(z) = ez/(1 + ez), and b″(z) = ez/(1 + ez)2.

The Uk and Vk can also be derived from different forms of summary statistics shared in public domains38. When the score statistics Uk and their variances (i.e., diag(Vk)) are available, the covariance matrix of Uk can be approximated as Vk ≈ {diag(Vk)}1/2R{diag(Vk)}1/2, where \({\mathbf{R}} = \{ R_{j\ell }\} _{j,\ell = 1}^m\) is the SNP LD matrix calculated from the Pearson correlation coefficient among the genotypes of the m variants based on the working genotypes or external reference. In another case, when the effect estimates \(\widehat {\mathbf{\beta }}_k = \{ \hat \beta _{kj}\} _{j = 1}^m\) and their standard errors \({\mathbf{se}}_k = \{ se_{kj}\} _{j = 1}^m\) are available, we can approximate \({\mathbf{U}}_k = \{ U_{kj}\} _{j = 1}^m\) and \({\mathbf{V}}_k = \{ V_{kj\ell }\} _{j,\ell = 1}^m\) as \(U_{kj} \approx \hat \beta _{kj}{\mathrm{/}}se_{kj}^2\) and \(V_{kj\ell } \approx R_{j\ell }{\mathrm{/}}(se_{kj}se_{k\ell })\).

Covariance of summary statistics between traits

If all the traits are from the same study or multiple studies with overlapping samples, the summary statistics Uk among traits k = 1, …, K are correlated. Assume trait k is from cohort A with sample size nA and trait k′ is from cohort B with sample size nB, and there are nC overlapping subjects in these two cohorts. For any SNP j not associated with the traits, the correlation matrix of Z-score \(U_{kj}{\mathrm{/}}\sqrt {V_{kj}}\) among traits is invariant to SNP j39,47. In particular, if both traits k and k′ are quantitative, we have

$$\zeta _{kk^\prime } \equiv {\mathrm{cov}}\left(\frac{{U_{kj}}}{{\sqrt {V_{kj}} }},\frac{{U_{k^\prime j}}}{{\sqrt {V_{k^\prime j}} }}\right) \approx \frac{{n_C}}{{\sqrt {n_A} \sqrt {n_B} }}{\mathrm{cor}}(Y_k,Y_{k^\prime}).$$
(1)

If both traits k and k′ are binary, let nC0 (nC1) represent the number of overlapping samples with trait value of 0 (or 1), nA0 (nA1) denotes the number of subjects with trait k and takes the value of 0 (or 1) and nB0 (nB1) denotes the number of subjects with trait k′ and takes the value of 0 (or 1), then we have39

$$\zeta _{kk^\prime } \equiv {\mathrm{cov}}\left(\frac{{U_{kj}}}{{\sqrt {V_{kj}} }},\frac{{U_{k^\prime j}}}{{\sqrt {V_{k^\prime j}} }}\right) \approx \left( {n_{C0}\sqrt {\frac{{n_{A1}n_{B1}}}{{n_{A0}n_{B0}}}} + n_{C1}\sqrt {\frac{{n_{A0}n_{B0}}}{{n_{A1}n_{B1}}}} } \right){\mathrm{/}}\sqrt {n_An_B} .$$
(2)

Hence, we can accurately estimate ζkk using the independent null variants across the whole genome. Specifically, we first perform LD pruning using LD threshold r2 < 0.01 in 500 kb region to obtain a set of independent common variants. We then remove variants with association test P-values < 0.05 and only keep variants that are not associated with any traits. For any traits k and k′, we calculate the between-trait sample correlation of the Z-scores on the remaining variants and denote it as \(\hat \zeta _{kk^\prime }\). In our simulation study, we benchmarked \(\hat \zeta _{kk^\prime }\) against empirical sample covariance of Z-scores and confirmed the accuracy of the estimate \(\hat \zeta _{kk^\prime }\) (Supplementary Fig. 9). Finally, provided the gene is not associated with any trait, the covariance of \(\widehat {\mathbf{\beta }}_k\) and \(\widehat {\mathbf{\beta }}_{k^\prime }\) can be estimated using \(\hat \zeta _{kk^\prime }\)

$${\mathrm{cov}}(\widehat {\mathbf{\beta }}_k,\widehat {\mathbf{\beta }}_{k^\prime }) = {\mathrm{cov}}({\mathbf{V}}_k^{ - 1}{\mathbf{U}}_k,{\mathbf{V}}_{k^\prime }^{ - 1}{\mathbf{U}}_{k^\prime }) \approx \hat \zeta _{kk^\prime }{\mathbf{V}}_k^{ - 1}\{ {\mathrm{diag}}({\mathbf{V}}_k)\} ^{1/2}{\mathbf{R}}\{ {\mathrm{diag}}({\mathbf{V}}_{k^\prime })\} ^{1/2}{\mathbf{V}}_{k^\prime }^{ - 1},$$
(3)

where the matrix R is the SNP LD matrix defined in the previous subsection.

MTAR test statistics and P-values

We let \({\mathbf{\beta }} = ({\mathbf{\beta }}_1^{\mathrm{T}}, \ldots ,{\mathbf{\beta }}_K^{\mathrm{T}})^{\mathrm{T}}\) denote the m genetic effects across K traits and \(\widehat {\mathbf{\beta }} = ({\mathbf{V}}_1^{ - 1}{\mathbf{U}}_1, \ldots ,{\mathbf{V}}_K^{ - 1}{\mathbf{U}}_K)\) denote their effect estimates constructed from Uk and Vk. The MTAR framework assumes the hierarchical model

$$\widehat {\mathbf{\beta }}|{\mathbf{\beta }}\sim \cal{N}({\mathbf{\beta }},{\mathbf{\Sigma }}),\quad {\mathbf{\beta }}\sim \cal{N}({\mathbf{0}},\sigma {\mathbf{B}}).$$
(4)

As described in the main text, reflects the correlation due to the residual relatedness among traits in the presence of sample overlap and LD among variants, and B reflects the correlation among the rare-variant effects across traits and variants. The B matrix contains two coefficients ρ1 and ρ2, where ρ1 controls the effect correlation among variants and ρ2 controls the contribution of the genetic correlation to the among-trait rare-variant effect correlation.

For a fixed set of ρ = (ρ1, ρ2), we test H0 : σ = 0 against H1 : σ ≠ 0 by a variance-component score test48:

$$Q_\rho = \widehat {\mathbf{\beta }}^{\mathrm{T}}{\mathbf{\Sigma }}^{ - 1}{\mathbf{B}}{\mathbf{\Sigma }}^{ - 1}\widehat {\mathbf{\beta }}.$$
(5)

The test statistic follows a mixture of χ2 distribution under the null hypothesis. Davies method can be used to accurately estimate the P-value49. In addition, rare variants often show polymorphisms in some but not all traits, the adjustment of the formula for this case is described in the Supplementary Methods.

In the cMTAR and iMTAR tests (respectively correspond to two specifications of effect correlation among traits in B), the Cauchy P-value combination method is utilized to combine results from various ρ1 and ρ2. Similar to the minimum P-value method, the Cauchy method mainly focuses on a few smallest P-values30. The advantage of the Cauchy method over the minimum P-value method is that the Cauchy method is computationally fast because it does not rely on the Monte Carlo simulation to account for the correlation of the individual tests29. Specifically, the iMTAR or cMTAR test statistic is defined as

$$Q_{{\mathrm{iMTAR/cMTAR}}} = \frac{{\mathop {\sum}\nolimits_{\rho \in {\cal{S}}} {{\mathrm{tan}}\left[ {\left\{ {0.5 - p(Q_\rho )} \right\}\pi } \right]} }}{{|{\cal{S}}|}},$$
(6)

where p(Qρ) is the P-value of Qρ, \({\cal{S}}\) is a set that includes a grid of possible values of ρ = (ρ1, ρ2), and \(|{\cal{S}}|\) is the size of the set. In our implementation, we consider the grid {0, 0.5, 1} for both ρ1 and ρ2 such that there are nine combinations. We have shown in the Supplementary Fig. 10 that the GLGC analysis results are not sensitive to the choice of the grid. The P-value of QiMTAR/cMTAR can be accurately approximated by 0.5 − arctan (QiMTAR/cMTAR)/π29.

In addition, the MTAR framework reduces to single-trait analysis when we set a single diagonal element of matrix W2 to 1 (Fig. 1). We use the Cauchy method to combine these single-trait P-values from SKAT and burden tests and construct the cctP test as

$$Q_{{\mathrm{cctP}}} = \frac{{\mathop {\sum}\nolimits_{k = 1}^K {{\mathrm{tan}}} \left[ {\left\{ {0.5 - p_{{\mathrm{skat}},k}} \right\}\pi } \right] + \mathop {\sum}\nolimits_{k = 1}^K {{\mathrm{tan}}} \left[ {\left\{ {0.5 - p_{{\mathrm{burden}},k}} \right\}\pi } \right]}}{{2K}},$$
(7)

where pskat,k and pburden,k are the P-values from the SKAT and burden tests for trait k. The P-value of the cctP test can be approximated by 0.5 − arctan(QcctP)/π.

Finally, the Cauchy method is used to construct MTAR-O test by combining P-values from cMTAR, iMTAR, and cctP as

$$Q_{{\mathrm{MTAR-O}}} = \frac{{{\mathrm{tan}}\left[ {\left\{ {0.5 - p_{{\mathrm{cMTAR}}}} \right\}\pi } \right] + {\mathrm{tan}}\left[ {\left\{ {0.5 - p_{{\mathrm{iMTAR}}}} \right\}\pi } \right] + {\mathrm{tan}}\left[ {\left\{ {0.5 - p_{{\mathrm{cctP}}}} \right\}\pi } \right]}}{3},$$
(8)

where pcMTAR, piMTAR, and pcctP are the P-values of the cMTAR, iMTAR, and cctP tests. The P-value of the MTAR-O test can be approximated by 0.5 − arctan(QMTAR−O)/π.

Summary statistics from the GLGC

The summary statistics for the lipid traits were downloaded from http://csg.sph.umich.edu/abecasis/public/lipids2017/. For each trait k, the web portal contains variant-level genetic effect estimates \(\widehat {\mathbf{\beta }}_k\) for a given gene and their standard errors sek. We obtained Uk and Vk by using \(\widehat {\mathbf{\beta }}_k\) and sek as described in Summary statistics subsection of Methods. As the original genotypes from the study are not publicly available, we estimated the LD matrix R based on the genotypes of the European population from the NHLBI Exome Sequencing Project (ESP)37. To account for possible sample overlap among traits, we used Eq. (3) to estimate covariance among summary statistics across traits.

Gene set and tissue enrichment analysis

Gene set enrichment analysis was conducted using the one-sided hypergeometric test against Reactome Pathways and Gene Ontology Biological Processes, as implemented in the GENE2FUNC from FUMA50, with the genes tested in MTAR used as the background gene set. Enrichment P-values are adjusted for multiplicity using the Benjamini–Hochberg procedure within each set type tested; sets with adjusted P-value less than 0.05 are reported. Tissue enrichment analysis was conducted using TissueEnrich51, which implements the one-sided hypergeometric test for enrichment of user-defined genes relative to lists of tissue-enriched, tissue-enhanced, and group-enhanced genes. Default settings for the definition of tissue-enriched and enhanced genes from both GTEx and HPA RNA-seq datasets were applied. Enrichment P-values are adjusted for multiplicity using the Benjamini–Hochberg procedure within each reference set (GTEx and HPA).

Gene association annotation

We annotated the 41 genes exclusively discovered by MTAR in the GLGC data analysis using two recently developed databases: Open Targets52,53 (Supplementary Data 2) and STOPGAP54 (Supplementary Data 3). Open Targets and STOPGAP both link genes to a trait or disease via annotation of genomic loci detected in GWAS. For each of the 41 genes, the linked diseases are searched and filtered to the three traits: LDL, HDL, and TG, and the variant-disease association P-value < 5 × 10−8 from the two databases. In addition, the lipid association results from the Supplementary Tables 9 and 12 in the paper of previous GLGC data analysis33 were also used to annotate the 41 genes (Supplementary Table 5).

Replication of significant genes in the UK Biobank data

To replicate the associations of 14 genes (11 after removing genes with cumulative minor allele counts less than 10 in the UK Biobank GWAS data) exclusively identified by MTAR tests but without any annotation evidence, we applied the MTAR methods to an independent study with association summary statistics from the UK Biobank GWAS data set. The GWAS summary statistics were released by the Neale Lab with the re-release of UK Biobank genotype imputation (termed imputed-v3). The three related traits LDL direct (mmol/L), HDL direct (mmol/L), and TG (mmol/L) were jointly analyzed in a similar manner as the analysis of GLGC data.

Data simulation

For all simulations, we generated 100 haplotypes of length 1 MB under a calibrated coalescent model to mimic the LD structure and local combination rate of the European population55. These haplotypes were used to form the genotypes of 8500 subjects across three cohorts. To simulate the genotypes for a data set, we randomly selected one thousand 3 KB regions in each haplotype and focused on rare variants with MAF < 0.05.

For each subject i, three traits were generated based on a multi-response regression model

$$\left[ {\begin{array}{*{20}{l}} {Y_{i1}} \hfill \\ {Y_{i2}} \hfill \\ {Y_{i3}} \hfill \end{array}} \right] = \left[ {\begin{array}{*{20}{l}} {\beta _{11}} \hfill & \ldots \hfill & {\beta _{1m}} \hfill \\ {\beta _{21}} \hfill & \ldots \hfill & {\beta _{2m}} \hfill \\ {\beta _{31}} \hfill & \ldots \hfill & {\beta _{3m}} \hfill \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {G_{i1}} \\ \vdots \\ {G_{im}} \end{array}} \right] + 0.1X_{i1} + 0.2X_{i2} + \left[ {\begin{array}{*{20}{l}} {\epsilon _{i1}} \hfill \\ {\epsilon _{i2}} \hfill \\ {\epsilon _{i3}} \hfill \end{array}} \right],\quad \left[ {\begin{array}{*{20}{l}} {\epsilon _{i1}} \hfill \\ {\epsilon _{i2}} \hfill \\ {\epsilon _{i3}} \hfill \end{array}} \right]\\ \qquad\qquad\sim {\cal{N}}\left( {{\mathbf{0}},\left[ {\begin{array}{*{20}{c}} 1 & {0.1} & 0 \\ {0.1} & 1 & { - 0.1} \\ 0 & { - 0.1} & 1 \end{array}} \right]} \right),$$
(9)

where βkj is the genetic effect for trait k at variant j, Gij is the genotype at variant j, Xi1 is a binary covariate simulated from Bernoulli(0.5), Xi2 is a continuous covariate simulated from a standard normal distribution. The covariance matrix of the error term used here is based on the estimated residual correlations among the lipid traits LDL, HDL, and TG in the ESP data37. The reduced model was used when we needed to generate only one or two traits for subject i.

Computation time

We estimated the computation time of MTAR tests by considering different numbers of variants m = 5, 10, 20, 50, 100 and traits K = 3, 6, or 9 (Supplementary Fig. 11). For each scenario, we generated 50 datasets and reported the average computation time. On average, MTAR-O, cMTAR, and iMTAR took less than 0.11, 0.06, and 0.05 s (2.4 GHz Intel Core i5, Produced by Intel Co., Santa Clara, CA) when applied to a data set with 20 variants and 3 traits. The computation time did not change much in the presence of sample overlap; but it increased to 1, 0.51, and 0.49 s when the number of traits was increased to 9. MTAR is scalable for genome-wide analysis. Analyzing the GLGC data (15,378 genes) using MTAR-O, cMTAR, and iMTAR took about 25, 10, and 8 h on a laptop with a single core. After the computation jobs were distributed to multiple cores by chromosome, the analysis was finished within 2 h.

Web resources

SKAT R package v1.3.2.1: https://cran.r-project.org/web/packages/SKAT MultiSKAT R package v1.0: https://github.com/diptavo/MultiSKAT aSPU R package v1.48: https://cran.r-project.org/web/packages/aSPU FUMA v1.3.5: http://fuma.ctglab.nl TissueEnrich v1.8.0: https://tissueenrich.gdcb.iastate.edu Open Targets: https://genetics.opentargets.org STOPGAP: https://github.com/StatGenPRD/STOPGAP.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.