Introduction

To identify disease variants, genetic epidemiological methods perform association tests between single-nucleotide polymorphisms (SNPs) and phenotypes of interest. The number of SNPs typically ranges from hundreds in candidate gene studies to millions in genome-wide association studies. Not only can the genetic data be immense, but investigators most often analyze these SNPs with multiple phenotypes, which further increases the multiple testing problem. The number of phenotypes used in genetic association studies ranges from a few clinically relevant characteristics to tens of thousands that may originate from cellular expression measurements. With multiple SNPs and phenotypes, the total number of statistical tests explodes, resulting in increasingly stringent significance thresholds.

Typically, when multiple SNPs and phenotypes are used in genetic analyses, the overarching hypothesis of interest is whether a genetic variant is related to the specified disease. As the volume of genetic data grows and multiple phenotypes are used, success in identifying these variants will depend to a large extent on handling the data efficiently in the statistical analysis. The immense amount of SNP and phenotype data must translate into increased statistical power to detect disease susceptibility loci.

A common approach to analyzing genetic data is to perform an individual statistical test for each SNP/phenotype combination and then adjust for the total number of tests using a multiple comparison correction, such as the Bonferroni correction,1 or less conservative corrections, such as the Holm2 or Hochberg3 procedure. Such approaches are limited because the number of statistical tests remains unchanged. Therefore, even with a moderate number of SNPs and phenotypes, only association findings strong enough to withstand the multiple comparison correction will achieve genome-wide significance. The total number of association tests quickly becomes large, making it unlikely that true genetic variants will be identified.
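To make the difference between these corrections concrete, the sketch below implements the three standard rejection rules (Bonferroni; Holm's step-down procedure; Hochberg's step-up procedure, which is valid under independence or certain forms of positive dependence) for a list of P-values at family-wise level α. The function names and the four P-values are hypothetical and chosen only for illustration.

```python
# Sketch: Bonferroni, Holm (step-down), and Hochberg (step-up) rejection rules
# at family-wise level alpha. Pure standard library; p-values are hypothetical.

def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: scan p-values from smallest to largest and stop at the first
    p_(j) > alpha / (m - j), where j is the 0-based rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: find the largest 0-based rank j with p_(j) <= alpha / (m - j)
    and reject every hypothesis ranked at or below it."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank in range(m - 1, -1, -1):
        if pvals[order[rank]] <= alpha / (m - rank):
            for i in order[: rank + 1]:
                reject[i] = True
            break
    return reject


example_pvals = [0.01, 0.015, 0.04, 0.045]  # hypothetical
print(sum(bonferroni(example_pvals)), sum(holm(example_pvals)), sum(hochberg(example_pvals)))
# 1 2 4
```

All three procedures still count every test, which is the limitation described above: the corrections only become less conservative, while the number of tests stays the same.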

Other methods, including global haplotype tests,4 regression methods,5, 6 and multimarker tests,7 address multiple testing by combining multiple SNPs into a single test. Global haplotype tests enumerate the possible haplotypes and then evaluate the null hypothesis that no haplotype is associated with the disease. Regression-based approaches analyze associations between a trait and all linear combinations of SNPs. Multimarker methods7 test multiple SNPs simultaneously by combining individual marker scores and their pairwise correlations, without using haplotype structure. Although all of these approaches dramatically reduce multiple testing compared with single-SNP analyses, none of them addresses the fact that multiple phenotypes are usually available and tested for association.

Statistical methods have also been developed to handle multiple phenotypes at the single-SNP level. FBAT-PC,8 implemented in the PBAT program,9 uses multiple phenotypes to construct an overall phenotype that amplifies the trait heritability at each SNP. The FBAT-PC methodology has been successful in identifying SNPs associated with complex diseases in genome-wide association studies, but it has the distinct disadvantage of requiring all phenotypes to have similar distributional characteristics.

Although many tests have been developed to aggregate either the genotype or the phenotype information, few methods can reduce the multiple testing problem by examining marker and phenotype information simultaneously. In this manuscript, we propose a new one-degree-of-freedom omnibus family-based association test (MFBAT) that incorporates both multiple markers and multiple phenotypes into a single test statistic. By combining tests across both dimensions, MFBAT reduces the total number of statistical tests to one and therefore addresses the multiple testing problem more efficiently than methods that aggregate only markers or only phenotypes. The methodology is also flexible and can incorporate phenotypes with differing distributional properties. Although the test was developed with candidate gene studies, cellular expression phenotypes, and genome-wide association analyses in mind, the testing strategy is applicable to any family-based association analysis that uses multiple markers and multiple phenotypes. In addition, MFBAT can be used within the PBAT screening algorithm10, 11 to further reduce multiple comparisons. Through simulation studies, we show that the MFBAT testing strategy outperforms standard methodology in terms of statistical power. We also provide a concrete example of the methodology using five phenotypes and five SNPs from an asthma clinical trial.

Methods

Multi-marker/multi-phenotype: global expansion of FBAT

The FBAT approach compares the observed marker score with the expected marker score computed from Mendelian transmission, conditional on the parental genotypes and the offspring's phenotype.12 If parental information is not available, the expected marker score is computed conditional on the sufficient statistic for the pedigree, which is defined by the available genotypes of siblings and other relatives. Because the genotype is the only random variable in the FBAT approach, and its distribution is determined by Mendelian transmission rather than by any estimated allele frequencies, the FBAT approach is robust against confounding due to population admixture and stratification.
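The expected marker score and its variance for a trio follow directly from the parental genotypes. The sketch below assumes additive coding (allele counts 0, 1, 2) and known parents; the function names are hypothetical. A parent carrying g copies of the allele transmits it with probability g/2, independently of the other parent, so a trio in which both parents are homozygous has zero variance and contributes nothing to the statistic.

```python
# Sketch (additive coding, known parents): the offspring allele count is the sum
# of two independent Bernoulli transmissions with probabilities g1/2 and g2/2.

def expected_offspring_score(g1, g2):
    """E(X | parental genotypes) for an additive 0/1/2 allele count."""
    return g1 / 2 + g2 / 2


def offspring_score_variance(g1, g2):
    """Var(X | parental genotypes); zero when both parents are homozygous."""
    q1, q2 = g1 / 2, g2 / 2
    return q1 * (1 - q1) + q2 * (1 - q2)


def is_informative(g1, g2):
    """A trio contributes to the FBAT statistic only if X can vary given the parents."""
    return offspring_score_variance(g1, g2) > 0


print(expected_offspring_score(1, 1), offspring_score_variance(1, 1))  # 1.0 0.5
```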

In this study, we extend the FBAT methodology to an omnibus association test, MFBAT, that uses information on multiple markers and multiple phenotypes. For simplicity, we assume trios (probands and their parents) and SNP data. The methodology extends easily to families with multiple siblings and missing parental data: if parental data are unavailable, the parental genotypes below can be replaced by the sufficient statistic,5, 12 and the construction of the weights for the overall test described below carries over directly. For a more detailed discussion, we refer to the original paper.12

We assume n parent–offspring trios, in which each offspring has m coded genotypes, k = 1, 2, …, m, and p traits of interest, l = 1, 2, …, p. Let x_ik be the coded genotype of the ith proband at the kth marker, which can be coded in an additive, dominant, or recessive manner. The variables p_ik1 and p_ik2 denote the genotypes of the two parents of the ith proband at the kth marker, y_il denotes the lth trait of the ith proband, and μ_l denotes the offset for that trait. The FBAT statistic FBAT_kl, which tests for association between the kth marker locus and the lth trait, is then given by the following:12
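For trios, the standard FBAT score12 contrasts each offspring's observed marker score with its expected value given the parents, weighted by the coded trait, and FBAT_kl standardizes this score by its variance under the null hypothesis. A sketch in the notation above (the intermediate symbol U_kl is introduced here for convenience):

```latex
U_{kl} = \sum_{i=1}^{n} (y_{il} - \mu_l)\,\bigl[x_{ik} - E(X_{ik} \mid p_{ik1}, p_{ik2})\bigr],
\qquad
\operatorname{Var}(U_{kl}) = \sum_{i=1}^{n} (y_{il} - \mu_l)^{2}\,\operatorname{Var}(X_{ik} \mid p_{ik1}, p_{ik2}),
\qquad
\mathrm{FBAT}_{kl} = \frac{U_{kl}}{\sqrt{\operatorname{Var}(U_{kl})}} .
```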

If all offspring are affected and an additive coding function is used for the genotype, the FBAT statistic and the original TDT statistic13 are equivalent.
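The TDT equivalence can be checked numerically. The sketch below assumes additive coding, known parents, y_i = 1 for every affected offspring, and μ = 0; it computes FBAT for a few hypothetical trios and compares its square with the TDT statistic (b − c)²/(b + c), where b and c count transmissions and nontransmissions of the allele from heterozygous parents. The function names and trio data are hypothetical.

```python
import math


def fbat_trios(y, x, parents, mu=0.0):
    """Standardized FBAT statistic U / sqrt(Var U) for trios (additive coding).

    y       : offspring trait values (1 = affected)
    x       : offspring allele counts (0, 1, 2)
    parents : (g1, g2) parental allele counts
    mu      : trait offset
    """
    u = 0.0
    var_u = 0.0
    for yi, xi, (g1, g2) in zip(y, x, parents):
        t = yi - mu                                # coded trait
        q1, q2 = g1 / 2, g2 / 2                    # transmission probabilities
        expected = q1 + q2                         # E(X | parents)
        var_x = q1 * (1 - q1) + q2 * (1 - q2)      # Var(X | parents)
        u += t * (xi - expected)
        var_u += t * t * var_x
    return u / math.sqrt(var_u)


def tdt_chisq(b, c):
    """TDT: b transmissions vs c nontransmissions from heterozygous parents."""
    return (b - c) ** 2 / (b + c)


# Hypothetical affected trios: (offspring allele count, (parent 1, parent 2)).
trios = [
    (1, (1, 0)),  # heterozygous parent transmitted            -> b += 1
    (2, (1, 1)),  # both heterozygous parents transmitted      -> b += 2
    (0, (1, 0)),  # heterozygous parent did not transmit       -> c += 1
    (1, (1, 1)),  # one transmitted, one did not               -> b += 1, c += 1
    (2, (2, 1)),  # heterozygous parent 2 transmitted          -> b += 1
]
y = [1] * len(trios)
x = [child for child, _ in trios]
parents = [p for _, p in trios]
z = fbat_trios(y, x, parents)
print(z ** 2, tdt_chisq(b=5, c=2))  # both approximately 1.2857 (= 9/7)
```

The agreement is not specific to this example: each transmission from a heterozygous parent adds +1/2 to U, each nontransmission adds −1/2, and each heterozygous parent adds 1/4 to Var(U), so U²/Var(U) = (b − c)²/(b + c).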

Assuming that, under the null hypothesis, each FBAT_kl ∼ N(0,1), an omnibus FBAT statistic that tests all phenotypes and markers of interest simultaneously can be constructed as a linear combination of all FBAT statistics, divided by its standard deviation. The numerator of the MFBAT statistic is therefore given by:
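Written out, with w_kl the weight given to FBAT_kl and F the stacked vector of FBAT statistics (the symbol F is introduced here for convenience), the numerator is the weighted sum below. For fixed weights, its null variance is WᵀVW, where V is the covariance matrix of the FBAT statistics defined in the next paragraph:

```latex
\sum_{k=1}^{m}\sum_{l=1}^{p} w_{kl}\,\mathrm{FBAT}_{kl} = W^{\mathsf T} F,
\qquad
F = (\mathrm{FBAT}_{11}, \ldots, \mathrm{FBAT}_{mp})^{\mathsf T},
\qquad
\operatorname{Var}_{H_0}\!\bigl(W^{\mathsf T} F\bigr) = W^{\mathsf T} V W .
```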

Following the work of O'Brien et al14 and Wei and Johnson,15 the optimal weights w_kl can be constructed as follows. Let V denote the empirically computed covariance matrix of (FBAT_11, …, FBAT_mp), and let v denote the vector of expected FBAT statistics under the alternative hypothesis, that is, v = E(FBAT∣H_a) = [E(FBAT_11∣H_a), …, E(FBAT_mp∣H_a)]. Then the weights for the optimal linear combination of FBAT statistics, that is, the linear combination with the highest statistical power under H_a, are given by
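The optimal weights in this setting take the standard form W = V⁻¹v. One way to see why (a sketch, assuming V is positive definite) is to apply the Cauchy–Schwarz inequality to the squared mean of the standardized combination under H_a, which governs its power:

```latex
W = V^{-1} v,
\qquad
\frac{\bigl(W^{\mathsf T} v\bigr)^{2}}{W^{\mathsf T} V W}
  \le v^{\mathsf T} V^{-1} v
  \quad \text{with equality if and only if } W \propto V^{-1} v .
```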

Specifically, W = [w_11, …, w_mp]ᵀ is the vector of weights calculated from V and v. When the inverse of V is numerically unstable, we use the generalized inverse instead.
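A minimal numerical sketch of the weighting step, assuming NumPy is available. The function name `mfbat` and all inputs are hypothetical, and `np.linalg.pinv` (the Moore–Penrose inverse) is used as one choice of generalized inverse; it coincides with V⁻¹ whenever V is invertible.

```python
import numpy as np


def mfbat(fbat, cov, expected):
    """Omnibus statistic W^T F / sqrt(W^T V W) with W = pinv(V) v.

    fbat     : stacked FBAT_kl statistics (length m * p)
    cov      : their covariance matrix V
    expected : v, the expected FBAT_kl under the alternative
    Returns the statistic and the weight vector.
    """
    f = np.asarray(fbat, dtype=float)
    v_mat = np.asarray(cov, dtype=float)
    w = np.linalg.pinv(v_mat) @ np.asarray(expected, dtype=float)
    stat = float(w @ f / np.sqrt(w @ v_mat @ w))
    return stat, w


# Hypothetical example: two perfectly correlated statistics make V singular.
stat, weights = mfbat([2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
print(stat)  # approximately 2.0: the duplicated statistic is not double-counted
```

Rescaling W by a positive constant leaves the statistic unchanged, so only the direction of the weight vector matters.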

Although it is generally difficult to estimate E(FBAT∣H_a) before computing the test statistic, family-based studies allow this quantity to be estimated, using the conditional mean model, in a way that is statistically independent of the subsequent FBAT statistics. This method is described in detail elsewhere;16 we provide a brief summary here. The standard quantitative genetic model for the mean of the lth phenotype at the kth genetic marker is given by
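For an additive effect, this standard model can be sketched as follows, written for the ith proband, the kth marker, and the lth phenotype (matching the definitions in the next paragraph):

```latex
E(Y_{il}) = \mu_l + a_l\, x_{ik} .
```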

In this equation, μ_l is the overall mean of the lth phenotype and a_l is the additive genetic effect on the lth phenotype. In most cases, a_l can only be estimated from the same genotypic data that are used in the statistical analysis itself. Family-based studies offer a unique situation in which an estimate of a_l can be obtained from the data without biasing the subsequent test statistic. Because the FBAT statistic is computed from the offspring marker scores in the informative families, information from the noninformative families (ie, families in which both parents are homozygous) can be used to calculate the expected value of the FBAT statistic under the alternative hypothesis. Estimation based solely on noninformative families is, however, problematic for several reasons discussed elsewhere.16 To use both informative and noninformative families in estimating a_l without biasing the resulting test statistic, we replace the marker score x_ik in Eq. (4) by its expected value conditional on the parental genotypes of the ith proband at the kth marker, E(X_ik∣p_ik1, p_ik2). The resulting model is called the conditional mean model.
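A minimal sketch of the conditional mean model for trios, assuming additive coding, known parents, and NumPy: a_l is estimated by least squares of y_il on E(X_ik∣p_ik1, p_ik2), so every trio (informative or not) contributes and the alleles the offspring actually received are not used. The helper names, the simulated data (allele frequency 0.3, true effect 0.5), and the plug-in expression for the expected FBAT statistic, â_l Σ_i Var(X_ik∣parents) / sqrt(Σ_i (y_il − μ_l)² Var(X_ik∣parents)), are assumptions of this illustration and not necessarily the exact expressions of reference 16. The plug-in follows from E[(Y_il − μ_l)(X_ik − E(X_ik∣parents)) ∣ parents] = a_l Var(X_ik∣parents) under the additive model.

```python
import numpy as np


def conditional_mean_effect(y, parents):
    """Least-squares (mu_l, a_l) in E(Y_il | parents) = mu_l + a_l * E(X_ik | parents)."""
    parents = np.asarray(parents, dtype=float)
    e_x = parents.sum(axis=1) / 2.0                 # E(X | parents) = g1/2 + g2/2
    design = np.column_stack([np.ones_like(e_x), e_x])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]


def expected_fbat(a_hat, y, parents, mu):
    """Plug-in approximation of E(FBAT_kl | H_a) (an assumption of this sketch)."""
    q = np.asarray(parents, dtype=float) / 2.0
    var_x = (q * (1 - q)).sum(axis=1)               # Var(X | parents); 0 if noninformative
    t = np.asarray(y, dtype=float) - mu
    return a_hat * var_x.sum() / np.sqrt((t ** 2 * var_x).sum())


# Hypothetical simulated trios: allele frequency 0.3, true additive effect 0.5.
rng = np.random.default_rng(1)
n = 5000
parents = rng.binomial(2, 0.3, size=(n, 2))
x = rng.binomial(1, parents[:, 0] / 2) + rng.binomial(1, parents[:, 1] / 2)
y = 0.5 * x + rng.normal(size=n)
mu_hat, a_hat = conditional_mean_effect(y, parents)
print(a_hat, expected_fbat(a_hat, y, parents, y.mean()))
```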

Note that the two models are identical for noninformative families, because the observed marker score x_ik then equals its expected value, that is, x_ik = E(X_ik∣p_ik1, p_ik2). Because the test statistic is based on the offspring genotypes conditional on the parental genotypes, using E(X_ik∣p_ik1, p_ik2) to estimate a_l does not bias subsequent testing, even in informative families. We can therefore use E(X_ik∣p_ik1, p_ik2) in place of x_ik to estimate the FBAT statistic under the alternative hypothesis without biasing the subsequent statistical test. The conditional mean model has been specified similarly by Van Steen et al10 for continuous traits, Jiang et al17 for time-to-onset, and Murphy et al18 for affection status. Therefore, the most powerful combination of FBATs can be constructed as follows:
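Combining the pieces above gives the sketch below, where v̂ is the estimate of E(FBAT∣H_a) obtained from the conditional mean model and V⁻ denotes a generalized inverse of V (equal to V⁻¹ when V is invertible). The hat notation and the V⁻ symbol are introduced here for illustration:

```latex
\widehat{W} = V^{-}\,\widehat{v},
\qquad
\mathrm{MFBAT}
  = \frac{\sum_{k=1}^{m}\sum_{l=1}^{p} \widehat{w}_{kl}\,\mathrm{FBAT}_{kl}}
         {\sqrt{\widehat{W}^{\mathsf T} V\, \widehat{W}}}
  \sim N(0,1) \ \text{approximately, under } H_0 .
```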

Alternatively, an omnibus test statistic could be constructed from the multivariate score test FBAT-GEE,19 but the degrees of freedom of this test increase rapidly with the number of markers and phenotypes (df = mp), which typically makes it less powerful than the one-df MFBAT proposed in this study. Like the approach proposed by Schaid et al,20 MFBAT uses only one degree of freedom, which makes the resulting test statistic more powerful when multiple markers and/or phenotypes are added. There are three fundamental differences between the approach of Schaid et al and the methodology described in this study: (1) we incorporate not only multiple markers but also multiple phenotypes into the test statistic, which the Schaid et al methodology does not; (2) our approach is based on family data, whereas the Schaid et al approach is based on case–control data and does not incorporate quantitative phenotypes; and (3) the weights in our analysis are calculated using the expected offspring genotypes conditional on the parental genotypes, a calculation that cannot be performed with case–control data.

Results

The simulation is designed around the asthma study discussed in the data analysis section of this paper. The markers of interest form a five-SNP haplotype modeled after five SNPs in the ARG1 gene. We generated the parental haplotypes by drawing from a uniform distribution and mapping each draw onto the haplotype frequencies measured in the Childhood Asthma Management Program (CAMP) population,21 so that the probability that a parent carries a given haplotype equals that haplotype's frequency. The haplotypes of the probands were obtained by simulating Mendelian transmission of the parental haplotypes, assuming complete linkage disequilibrium within each haplotype. For the computation of the MFBAT statistics, the genotypes of probands and their parents are assumed to be known.
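The sampling scheme can be sketched in a few lines of standard-library Python. The sketch uses the ARG1 haplotype frequencies listed later in this section; the function and variable names are hypothetical. "Complete linkage disequilibrium" is interpreted here as no recombination within the haplotype, and drawing the two parental haplotypes independently implies Hardy–Weinberg equilibrium; both are assumptions of this illustration.

```python
import random

# ARG1 five-SNP haplotype frequencies listed later in this section (R^2 = 0.7 scenario).
ARG1_HAPLOTYPES = {
    "AATAT": 0.534, "TGATT": 0.303, "TATAT": 0.108,
    "TGATC": 0.028, "AGATT": 0.019, "TGAAT": 0.008,
}


def draw_haplotype(freqs, rng):
    """Inverse-CDF draw: map one uniform number onto the cumulative haplotype frequencies."""
    u = rng.random() * sum(freqs.values())
    cumulative = 0.0
    for hap, f in freqs.items():
        cumulative += f
        if u < cumulative:
            return hap
    return hap  # guard against floating-point round-off at the upper end


def simulate_trio(freqs, rng):
    """Each parent carries two haplotypes and passes one of them, intact, with probability 1/2."""
    mother = (draw_haplotype(freqs, rng), draw_haplotype(freqs, rng))
    father = (draw_haplotype(freqs, rng), draw_haplotype(freqs, rng))
    child = (rng.choice(mother), rng.choice(father))
    return mother, father, child


rng = random.Random(2024)
trios = [simulate_trio(ARG1_HAPLOTYPES, rng) for _ in range(1000)]
```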

We simulate 1000 trios with five phenotypes and five SNPs and compare the power of the proposed testing strategy with that of other existing testing strategies. From the haplotypes of these five SNPs in the CAMP population, haplotypes with frequencies of 0.1, 0.2, and 0.3 are each selected in turn as the disease susceptibility locus, and the genotypic distribution under the alternative hypothesis is generated using E(X_i∣p_i1, p_i2) for the marker score, as described by Lange and Laird.16 The haplotypes for the remaining SNPs are simulated under the null hypothesis, assuming Hardy–Weinberg equilibrium and complete linkage disequilibrium. Five phenotypes are generated: one phenotype is associated with the disease haplotype, and the remaining four are associated with it only through their correlation with the associated phenotype. Three phenotypic correlations were used in the simulations: (1) low, with a correlation of 0.2 between all phenotypes; (2) moderate, with a correlation of 0.38 between each pair of phenotypes, reflecting the average phenotypic correlation among the five asthma phenotype measurements in the CAMP clinical trial; and (3) high, with a correlation of 0.8 between all phenotypes. The phenotypes are generated from a regression equation chosen to produce the specified heritability, h², between phenotype and genotype. The phenotypic vector Y_i for each offspring is a random sample from a multivariate normal distribution, Y_i ∼ N([a_1x_i, …, a_5x_i], V), in which a_l is the additive effect for the lth phenotype, x_i is the individual genotype, and V is the (5 × 5) variance matrix.

We measure the strength of the additive genetic effect on a phenotypic trait by the heritability h²_l,22 the proportion of phenotypic variation explained by genetic variation, that is, h²_l = Var(a_lX_i)/Var(Y_il).22 This expression for h²_l can be solved for a_l,16 which measures the genetic effect size.
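Solving the heritability expression for a_l and drawing the phenotypes can be sketched as follows, assuming NumPy. The sketch assumes that V is the residual (non-genetic) covariance with unit variances and an exchangeable correlation ρ, so that h²_l = a_l² Var(X) / (a_l² Var(X) + 1); for brevity, offspring genotypes are drawn under Hardy–Weinberg equilibrium instead of by Mendelian transmission. The allele frequency, seed, and function names are hypothetical.

```python
import numpy as np


def additive_effect(h2, var_x, resid_var=1.0):
    """Solve h2 = a^2 Var(X) / (a^2 Var(X) + resid_var) for a >= 0."""
    return np.sqrt(h2 * resid_var / ((1.0 - h2) * var_x))


def simulate_phenotypes(x, a, rho, rng):
    """Y_i ~ N(a * x_i, V) with V = (1 - rho) I + rho J (exchangeable residual correlation).
    Entries of `a` are zero for phenotypes associated only through their correlation."""
    a = np.asarray(a, dtype=float)
    p = a.size
    v = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
    return np.outer(x, a) + rng.multivariate_normal(np.zeros(p), v, size=len(x))


maf = 0.2                              # hypothetical allele frequency
var_x = 2 * maf * (1 - maf)            # Var(X) for additive coding under HWE
a1 = additive_effect(0.01, var_x)      # heritability of 1% for phenotype 1
rng = np.random.default_rng(7)
x = rng.binomial(2, maf, size=1000)
y = simulate_phenotypes(x, [a1, 0, 0, 0, 0], rho=0.38, rng=rng)
```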

We compare the power of the proposed testing strategy with that of other analysis approaches: testing each of the five phenotypes at each of the five SNPs separately using the FBAT statistic (single-SNP/single-phenotype test, denoted SSSP); testing each of the five phenotypes separately with the five-SNP haplotype using a family-based haplotype test (denoted hap); testing the five phenotypes at each SNP separately using the FBAT-PC methodology (denoted FBAT-PC); applying MFBAT to each SNP separately while combining all five phenotypes (denoted MFBAT-1S); applying MFBAT to all five SNPs with each phenotype separately (denoted MFBAT-1P); and testing all five SNPs and five phenotypes simultaneously with the MFBAT statistic (denoted MFBAT). All multiple comparison corrections were made using the Bonferroni procedure with an overall α = 0.05. R² measures the degree of linkage disequilibrium between two SNPs: an R² of 0 indicates no linkage disequilibrium, and an R² of 1 indicates perfect linkage disequilibrium. When simulating SNPs with an R² of 0.7, we used the actual structure of the ARG1 SNPs used in the data application. These SNPs have the following haplotype frequencies:

  • Haplotype AATAT=0.534

  • Haplotype TGATT=0.303

  • Haplotype TATAT=0.108

  • Haplotype TGATC=0.028

  • Haplotype AGATT=0.019

  • Haplotype TGAAT=0.008
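Pairwise linkage disequilibrium can be computed directly from haplotype frequencies such as those listed above, using r² = D²/(p_A(1 − p_A)p_B(1 − p_B)) with D = p_AB − p_Ap_B. The sketch below does this for the ARG1 haplotypes; the function name is hypothetical, and because the text does not state how the pairwise values were summarized into "an R² of 0.7", no particular value is claimed here.

```python
from itertools import combinations

HAPLOTYPES = {
    "AATAT": 0.534, "TGATT": 0.303, "TATAT": 0.108,
    "TGATC": 0.028, "AGATT": 0.019, "TGAAT": 0.008,
}


def pairwise_r2(haps, i, j):
    """r^2 between SNP positions i and j (0-based), using the alleles on the first
    listed haplotype as reference alleles (r^2 does not depend on this choice)."""
    total = sum(haps.values())
    first = next(iter(haps))
    ref_i, ref_j = first[i], first[j]
    p_a = sum(f for h, f in haps.items() if h[i] == ref_i) / total
    p_b = sum(f for h, f in haps.items() if h[j] == ref_j) / total
    p_ab = sum(f for h, f in haps.items() if h[i] == ref_i and h[j] == ref_j) / total
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


r2 = {(i + 1, j + 1): round(pairwise_r2(HAPLOTYPES, i, j), 3)
      for i, j in combinations(range(5), 2)}
print(r2)
```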

We assumed disease haplotype frequencies of 0.10 (haplotype TATAT), 0.20 (haplotype TGATT, with its frequency changed to 0.20), and 0.30 (haplotype TGATT). An additive mode of inheritance was assumed in the simulations. The reported power estimates are based on 1000 replicates. The results of the simulations are shown in Table 1. Similarly, when simulating haplotypes with an R² of 0.2, we used the following haplotypes:

  • Haplotype AACAA=0.367

  • Haplotype TACAA=0.129

  • Haplotype TGTAA=0.100

  • Haplotype AGCGT=0.012

  • Haplotype AGTAA=0.066

  • Haplotype AGCAA=0.039

  • Haplotype TGTAT=0.038

  • Haplotype AATAA=0.029

  • Haplotype AACGT=0.026

  • Haplotype TGCGT=0.092

Table 1 Power simulations of the MFBAT methodology compared with the standard approaches using 1000 trios

Because we simulate the data using a five-SNP haplotype, one would expect the statistical tests that use all five SNPs to have the most power. The simulations include strategies that use all five SNPs together (MFBAT, MFBAT-1P, and the haplotype test) and strategies that use one SNP at a time (MFBAT-1S, FBAT-PC, and SSSP). Because the simulations analyze power using the five SNPs both individually and together, we believe they capture the power of the MFBAT approach to detect the disease susceptibility locus (DSL) in two extreme situations: one in which the tested markers are exactly the DSL (ie, when all five markers are used simultaneously) and one in which only part of the true DSL is tested.

In all simulation scenarios, MFBAT or MFBAT-1S had the highest power of the approaches considered, and in many cases these approaches substantially outperformed both the haplotype and SSSP approaches. Because MFBAT-1S and FBAT-PC are methodologically very similar, it is not surprising that they have comparable power in all tested scenarios. When the R² between the SNPs is 0.20, MFBAT consistently has the highest power of the testing strategies, regardless of the haplotype frequency or the phenotypic correlation. In this scenario, MFBAT has between two and three times the power of the haplotype approach and over 10 times the power of SSSP. In fact, when the phenotypic correlation is 0.8 and R² is 0.20, MFBAT is nine times as powerful as the haplotype approach and over 60 times as powerful as SSSP. When the R² between the SNPs is 0.70, the phenotypic correlation is moderate (0.38) or low (0.20), and the minor allele frequency is >0.2, the power of FBAT-PC, MFBAT-1S, and MFBAT does not differ notably. Each of these three approaches has between 2 and 10 times the power of the overall haplotype test and SSSP. When both R² (0.7) and the phenotypic correlation (0.8) are high, MFBAT-1S and FBAT-PC substantially outperform the other approaches, with three times the power of MFBAT and approximately 30 times the power of the haplotype approach or SSSP. In virtually all simulations, SSSP has the lowest power, which is not surprising because it requires correction for many comparisons and does not account for the correlation between tests in any way. When the heritability is increased from 1 to 5%, the overall power increases, but the relative ranking of the statistical tests remains unchanged. In addition, simulations with an R² of 0.5 give results intermediate between those for R² of 0.2 and 0.7 (data not shown).

We generated simulations under the null hypothesis of no genetic effect and calculated the empirical type I error rates of MFBAT in three scenarios: (1) a single phenotype and five SNPs, (2) five phenotypes and a single SNP, and (3) five phenotypes and five SNPs. Simulations of 10 000 replicates were performed at minor allele frequencies of 10 and 30%, with α set at 5.0%. These results were compared with FBAT-PC, for which accurate type I error rates have been established repeatedly. In all cases, the empirical type I error rate was within 0.2 percentage points of 5.0%. This small deviation suggests that the MFBAT scenarios do not deviate from the selected α level in any meaningful or systematic way. We therefore conclude that the type I error is maintained appropriately.
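A deviation of 0.2 percentage points can be put in context with the Monte Carlo standard error of an empirical rejection rate estimated from 10 000 replicates when the true rate is 5%. The sketch below is simple binomial arithmetic; the variable names are illustrative.

```python
import math

alpha = 0.05
replicates = 10_000
# Binomial standard error of an empirical rejection rate whose true value is alpha.
se = math.sqrt(alpha * (1 - alpha) / replicates)
lower, upper = alpha - 1.96 * se, alpha + 1.96 * se
print(f"SE = {se:.4f}; 95% range = ({lower:.4f}, {upper:.4f})")
# SE = 0.0022; 95% range = (0.0457, 0.0543)
```

A deviation of 0.002 is therefore about one standard error, well inside the range expected from simulation noise alone.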

Data analysis: an asthma study (CAMP)

We applied MFBAT to parent/child trios from the CAMP Genetics Ancillary Study. The CAMP study randomized asthmatic children to three different asthma treatments.21 Blood samples for DNA were collected from 696 complete parent/child trios from 640 nuclear families in the CAMP Genetics Ancillary Study. Genotyping was performed on five polymorphic loci in the ARG1 gene.23 An association between the ARG1 gene and bronchodilator response was previously identified and replicated in three independent populations.23 We therefore used this gene together with five phenotypes: bronchodilator response and four measures of pulmonary function. The pulmonary function phenotypes were pre- and post-bronchodilator measures of forced expiratory volume in one second (FEV1) and forced vital capacity (FVC). FEV1 is the amount of air that can be forcibly exhaled from the lungs in the first second; FVC is the total amount of air that can be forcibly exhaled after full inspiration. The following phenotypes were therefore used in the analysis: (1) bronchodilator response, (2) pre-bronchodilator FEV1, (3) post-bronchodilator FEV1, (4) pre-bronchodilator FVC, and (5) post-bronchodilator FVC.

FEV1 is known to be an important asthma phenotype that also depends on the age, height, weight, and sex of the individual. It is standard practice to adjust the FEV1 measurements for these covariates before analyzing them. Therefore, using the conditional mean model approach,16 we regressed the FEV1 and FVC measurements on the recorded values for age, height, weight, and sex and used the residuals in MFBAT.
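The covariate adjustment amounts to an ordinary least-squares fit followed by taking residuals. The sketch below assumes NumPy; the function name, the simulated covariates, and the trait model are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def covariate_residuals(trait, covariates):
    """Residuals from an OLS fit of trait on an intercept plus covariates
    (e.g. columns for age, height, weight, and sex coded 0/1)."""
    trait = np.asarray(trait, dtype=float)
    design = np.column_stack([np.ones(len(trait)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return trait - design @ coef


# Hypothetical data: 200 children, four covariates, one lung-function trait.
rng = np.random.default_rng(3)
covs = np.column_stack([
    rng.uniform(5, 12, 200),     # age (years)
    rng.normal(135, 12, 200),    # height (cm)
    rng.normal(32, 8, 200),      # weight (kg)
    rng.integers(0, 2, 200),     # sex (0/1)
])
fev1 = 0.1 * covs[:, 0] + 0.02 * covs[:, 1] + rng.normal(0, 0.2, 200)
resid = covariate_residuals(fev1, covs)
```

The residuals have mean zero and are uncorrelated with every covariate, so the adjusted traits passed to MFBAT no longer carry the linear effects of age, height, weight, and sex.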

Table 2 lists the results of the analysis strategies compared in the power simulations. For each analysis, we list the statistical test, the SNPs analyzed, the phenotypes used, the observed P-value, and the significance level. The significance level column gives the per-test threshold (ie, α) required after accounting for multiple testing: with one statistical test, the P-value must be <0.05 to be significant, whereas with five statistical tests it must be <0.01.
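The per-test thresholds follow from the number of tests each strategy performs with five SNPs and five phenotypes. The sketch below derives them from the strategy definitions given in the simulation section (the dictionary and function names are hypothetical) and checks the MFBAT P-value reported in this section; it does not reproduce the entries of Table 2.

```python
# Bonferroni per-test levels implied by the number of tests in each strategy.
ALPHA = 0.05
N_SNPS, N_PHENOS = 5, 5
tests_per_strategy = {
    "MFBAT": 1,                  # all SNPs and phenotypes in one statistic
    "MFBAT-1S": N_SNPS,          # one statistic per SNP
    "MFBAT-1P": N_PHENOS,        # one statistic per phenotype
    "FBAT-PC": N_SNPS,           # one statistic per SNP
    "hap": N_PHENOS,             # one haplotype test per phenotype
    "SSSP": N_SNPS * N_PHENOS,   # every SNP/phenotype pair
}
thresholds = {name: ALPHA / k for name, k in tests_per_strategy.items()}


def significant(strategy, p_value):
    """True if p_value falls below the strategy's Bonferroni threshold."""
    return p_value < thresholds[strategy]


print(significant("MFBAT", 9.53e-6))  # True
```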

Table 2 Analysis of ARG1 and five asthmatic phenotypes

All five SNPs in the ARG1 gene were included in the analysis. The data were analyzed in six ways: (1) MFBAT with all phenotypes and SNPs, (2) MFBAT with all phenotypes and each SNP individually, (3) MFBAT with all SNPs and each phenotype individually, (4) the FBAT-PC association test for each SNP individually, (5) a haplotype test for each phenotype individually, and (6) each SNP/phenotype combination individually. The results are summarized in Table 2. The minor allele frequencies of the SNPs ranged from 0.03 to 0.46, the average phenotypic correlation was 0.38, and the average R² was between 0.2 and 0.7. In this example, significant associations are identified with FBAT-PC, MFBAT-1S, and MFBAT; however, the associations found with FBAT-PC and MFBAT-1S are not nearly as striking as the low P-value obtained with MFBAT (P = 9.53 × 10⁻⁶).

The MFBAT analysis, which uses all phenotypes and all SNPs simultaneously, shows the strongest genetic association, with a P-value of 9.5 × 10⁻⁶. When we examined the association P-values for individual SNPs and phenotypes, no specific SNP/phenotype combination appeared to drive the effect. The FBAT-PC and MFBAT-1S results suggest that rs2781659 and rs2781663 are driving the association, and the pre- and post-bronchodilator FVC measures appear to have the strongest association. When examined individually, however, these associations are not as strong as when all SNPs and phenotypes are examined together, likely because the combination of phenotypes is more informative than any single phenotype evaluated on its own.

Based on the simulations summarized in Table 1, MFBAT would be expected to have the greatest power to detect an association, followed by MFBAT-1S and FBAT-PC, with the haplotype and SSSP analyses having the lowest power. In this analysis, MFBAT yielded an overall P-value on the order of 10⁻⁶, indicating a very strong association with the asthma-related phenotypes. Two SNPs were significantly associated with the phenotypes under both MFBAT-1S and FBAT-PC. As in the simulations, these two test statistics gave similar results, and their association P-values track each other closely. Neither the haplotype analysis nor SSSP yielded any significant association after adjustment for multiple comparisons. This analysis illustrates that the MFBAT approach, which uses all of the information simultaneously, produces an association signal that stands out as a candidate to carry forward for further analysis.

Discussion

In this manuscript, we propose a new testing methodology in which a test statistic, MFBAT, is constructed from information across multiple SNPs and phenotypes. The approach was developed to reduce the overall number of statistical tests, a problem that has prevented researchers from establishing genome-wide significance in many studies, most notably genome-wide association analyses. The simulation studies suggest that MFBAT or MFBAT-1S has the most power compared with standard methods, particularly when the correlation among SNPs is low and the phenotypic correlation is high. One advantage of the MFBAT approach is that it can easily incorporate multiple phenotypes with varying characteristics. In a sample of affected parent–offspring trios, the clinical traits of interest are often both quantitative and qualitative. A statistical test that can incorporate any combination of qualitative and quantitative traits therefore offers a distinct advantage over tests that can incorporate only quantitative or only qualitative measures. This is a notable advantage over FBAT-PC, which has power similar to that of MFBAT-1S but is restricted to quantitative phenotypes with similar distributions. To date, we know of no other statistical test for family data that can incorporate this type of phenotypic data so efficiently.

It is important to point out that the MFBAT test identifies an association between a group of genetic markers and a group of phenotypes; no single phenotype or single SNP is tested directly. Once a significant association is found with multiple SNPs and phenotypes, how to proceed depends on which of two philosophies one adopts. The first philosophy seeks to identify the single SNP/phenotype combination that is largely responsible for the effect. There are several ways to do this. First, one can examine the weights applied to each SNP/phenotype combination in generating the overall test statistic. Second, once a significant association is found, individual tests can be performed to determine which phenotypes and genotypes drive the effect, similar to post hoc comparisons following an analysis of variance in classical statistics. The second philosophy argues something completely different: the optimal phenotype is not any single one of the phenotypes used; rather, a combination of the selected phenotypes best describes the clinical condition being analyzed. Similarly, this philosophy holds that no single SNP is likely to drive the association; instead, several SNPs are in linkage disequilibrium with an unknown functional variant responsible for the observed effect. Under this philosophy, the next step would be further statistical and genetic investigation of the entire region in which the association was identified.
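To illustrate the first option (inspecting the weights), the statistic can be decomposed into one additive term per SNP/phenotype pair, w_kl FBAT_kl / sqrt(WᵀVW). The sketch below assumes NumPy and uses hypothetical weights, statistics, and an exchangeable covariance; because the FBAT statistics are correlated, the terms are descriptive and are not independent tests.

```python
import numpy as np


def mfbat_contributions(fbat, cov, weights):
    """Split MFBAT = sum_kl w_kl FBAT_kl / sqrt(W^T V W) into one term per
    SNP/phenotype pair; the terms add up to the overall statistic."""
    f = np.asarray(fbat, dtype=float)
    w = np.asarray(weights, dtype=float)
    scale = np.sqrt(w @ np.asarray(cov, dtype=float) @ w)
    return w * f / scale


# Hypothetical 2 SNPs x 2 phenotypes, stacked as (k, l) = (1,1), (1,2), (2,1), (2,2).
labels = ["SNP1/pheno1", "SNP1/pheno2", "SNP2/pheno1", "SNP2/pheno2"]
fbat = [2.1, 1.8, 0.4, 0.2]
cov = np.full((4, 4), 0.3) + 0.7 * np.eye(4)
weights = [1.0, 0.9, 0.2, 0.1]
parts = mfbat_contributions(fbat, cov, weights)
for name, c in sorted(zip(labels, parts), key=lambda t: -t[1]):
    print(f"{name}: {c:.3f}")
```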

In conclusion, we propose a new statistical test, MFBAT, that can be used to test multiple phenotypes and genetic markers simultaneously, thereby reducing the multiple testing problem. This test has good statistical power, is easy to use, flexible in nature, and is incorporated into the PBAT program.