Introduction

Identification of genetic variants associated with human complex diseases can help investigators better understand the genetic architecture of the diseases of interest. Compared with single-marker analysis, which tests every marker individually and is commonly employed in genome-wide association studies, multiple-marker tests have attracted considerable attention because of their potentially improved statistical power. Statistical methods for multiple-marker analysis typically synthesize single-marker test statistics; examples include Hotelling’s T2 test1,2,3, the summation of squared univariate tests4,5, weighted Fourier transformation6, the variance-components score test7, principal components regression8,9,10, and kernel-machine-based tests11. The performance of these methods has been explored through intensive computer simulations1,12,13. The results showed that when the number of SNPs is relatively large, variance-component-based methods and principal components regression have competitive power.

Principal component analysis (PCA) is a well-known and useful tool for extracting important characteristics from correlated variables. A key issue in developing an effective PCA model is choosing an adequate number of principal components (PCs) to represent the system in an optimal way. Taking advantage of the sizes of the variances, Hocking14 provided a firm rule for retaining PCs in the framework of regression models. Usually, investigators use only a few top principal components and discard the rest. Recently, however, some investigators have shown that this common practice is not always reasonable. In fact, as early as 1982, Jolliffe15 demonstrated a counter-intuitive phenomenon: principal components explaining a small amount of variance can be as important as those explaining a large amount when analyzing non-genetic data. Aschard et al.16 confirmed this phenomenon with genetic data and proposed a so-called multi-step combined principal component (mCPC) strategy. However, the performance of mCPC depends strongly on how the PCs are partitioned.

Without loss of generality, suppose a random vector T follows a multivariate normal distribution with an m × 1 mean vector μ and known m × m covariance matrix V. We want to test the null hypothesis H0: μ = 0. A Chi-squared statistic T^τ V^−1 T can be used for testing H0. However, when m is large, which is fairly common in genome-wide association studies, the Chi-squared test might lose substantial power due to its large degrees of freedom. To reduce the degrees of freedom, PCA is recommended. Based on orthogonal decomposition, we have V = QΛQ^τ, where Λ = diag (λ1, …, λm) with λ1 ≥ λ2 ≥ ⋯ ≥ λm > 0, Q = (q1, …, qm), and qi is the eigenvector corresponding to the eigenvalue λi, i = 1, …, m. Define Zi = (qi^τ T)²/λi for i = 1, …, m. We note that Zi is related to the ith PC. Under H0, Zi ~ χ²(1), a central Chi-squared distribution with 1 degree of freedom, for i = 1, …, m. Under the alternative hypothesis, Zi ~ χ²(1; Ωi), a noncentral Chi-squared distribution with 1 degree of freedom and non-centrality parameter Ωi = (qi^τ μ)²/λi, for i = 1, …, m.
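The PC transformation just described can be sketched in a few lines of Python (a minimal illustration with our own function name; `T` and `V` are assumed to be given as NumPy arrays):

```python
import numpy as np

def pc_statistics(T, V):
    """Decompose V = Q diag(lambda) Q^T and return the per-PC
    statistics Z_i = (q_i^T T)^2 / lambda_i (each chi2(1) under H0)."""
    lam, Q = np.linalg.eigh(V)        # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]     # reorder so lambda_1 >= ... >= lambda_m
    lam, Q = lam[order], Q[:, order]
    Z = (Q.T @ T) ** 2 / lam
    return Z, lam, Q

# toy example: m = 3, compound-symmetry covariance
V = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
T = np.array([1.0, 0.5, -0.2])
Z, lam, Q = pc_statistics(T, V)
```

A useful sanity check on the construction: summing all m statistics recovers the full Chi-squared statistic, Z1 + ⋯ + Zm = T^τ V^−1 T.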

For simplicity, with m = 2 we consider a linear model with a normally distributed phenotype Y depending on two scaled genotypes G1 and G2, each normally distributed with mean 0 and variance 1 and with correlation ρ > 0. The phenotype can be expressed as Y = β0 + G1β1 + G2β2 + ε, where ε is a random error term following the standard normal distribution. For this model, the principal components of the two genotypes are PC1 = (G1 + G2)/√2 and PC2 = (G1 − G2)/√2. After some algebra, we get G1 = (PC1 + PC2)/√2 and G2 = (PC1 − PC2)/√2, so the phenotype can be re-expressed as Y = β0 + ((β1 + β2)/√2) PC1 + ((β1 − β2)/√2) PC2 + ε. This indicates that PC2 may be very important when |β1 − β2| is large, although the variance of PC2 (1 − ρ) is less than that of PC1 (1 + ρ). So we cannot discard any PCs arbitrarily. In order to test H0: μ = 0, Aschard et al.16 proposed the multi-step combined principal component (mCPC) statistic

mCPC (k) = −2 ln [1 − Fk (Z1 + ⋯ + Zk)] − 2 ln [1 − Fm−k (Zk+1 + ⋯ + Zm)],

where Fk (·) is the cumulative distribution function of a central Chi-squared random variable with k degrees of freedom. They used simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits, and showed that their method, which combines the signals across all PCs, can have greater power. However, the theoretical properties of mCPC were not studied in depth in Aschard et al.’s paper16, although they clearly found an effective way to fully utilize all PCs. Another key issue is how to decide the value of k. A commonly used method for selecting k is based on the cumulative contribution rates, ck = (λ1 + ⋯ + λk)/(λ1 + ⋯ + λm) for k = 1, …, m. Let kc = min {k: ck ≥ c} for any c ∈ [0, 1]. Aschard et al.16 followed the traditional way of using mCPC (kc) with k determined by a cumulative contribution rate of 80%.
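As we read it, mCPC (k) Fisher-combines two pooled tail probabilities: one from the first k PCs and one from the remaining m − k. A minimal sketch (function names are ours; `Z` is the vector of per-PC statistics defined above):

```python
import numpy as np
from scipy.stats import chi2

def mcpc(Z, k):
    """mCPC(k) = -2 ln(1 - F_k(Z_1 + ... + Z_k))
                 - 2 ln(1 - F_{m-k}(Z_{k+1} + ... + Z_m)).
    Under H0 the statistic follows a central chi-squared with 4 df."""
    Z = np.asarray(Z)
    m = len(Z)
    p1 = chi2.sf(Z[:k].sum(), df=k)       # upper tail = 1 - F_k(.)
    p2 = chi2.sf(Z[k:].sum(), df=m - k)
    stat = -2.0 * (np.log(p1) + np.log(p2))
    return stat, chi2.sf(stat, df=4)

# null calibration check: i.i.d. chi2(1) draws for the Z_i
rng = np.random.default_rng(1)
pvals = np.array([mcpc(rng.chisquare(1, size=10), k=3)[1]
                  for _ in range(2000)])
```

Since the two blocks of PCs are disjoint and the Zi are independent, the two combined p-values are independent, which is what makes the χ²(4) null distribution exact.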

In this work, we focus on the theoretical power of mCPC and find that the maximum power of mCPC is related to the maximum non-centrality parameter under the alternative hypothesis. We also find that the non-centrality parameter corresponding to the top PC (the first PC, which corresponds to the largest eigenvalue) is greater than 0 under most scenarios, whereas those of the other PCs do not possess this property when only a few of the PC means are non-zero under the alternative and the correlation coefficients among the original variables are relatively large. Herein we propose a method called tCPC. Based on numerical results, tCPC is more powerful than the existing procedures under most of the considered scenarios.

Results

Theoretical Properties of mCPC (k)

For association studies of multiple genetic variants, the above random vector can be written as T = (T1, …, Tm)^τ, where Ti is the statistic used to test for association between the phenotype of interest and the ith genetic variant, i = 1, …, m, and V is the covariance matrix of the random vector T. Through the eigen-decomposition of the covariance matrix, we have V = QΛQ^τ, where Λ = diag (λ1, …, λm) with λ1 ≥ λ2 ≥ ⋯ ≥ λm > 0, Q = (q1, …, qm), and qi is the eigenvector corresponding to the eigenvalue λi, i = 1, …, m. Then we obtain the transformed statistics Zi = (qi^τ T)²/λi, i = 1, …, m. Under H0, Zi follows a central Chi-squared distribution with 1 degree of freedom for i = 1, …, m. Under the alternative hypothesis, Zi ~ χ²(1; Ωi), a noncentral Chi-squared distribution with 1 degree of freedom and non-centrality parameter Ωi = (qi^τ μ)²/λi, i = 1, …, m.

For i = 1, …, m, let Fi^−1 (·) be the inverse function of Fi (·). Note that Fi (Fi^−1 (x)) = x for any given x ∈ [0, 1]. Under H0, both Fk (Z1 + ⋯ + Zk) and Fm−k (Zk+1 + ⋯ + Zm) follow the uniform distribution on [0, 1] and are independent of each other. So both −2 ln [1 − Fk (Z1 + ⋯ + Zk)] and −2 ln [1 − Fm−k (Zk+1 + ⋯ + Zm)] follow Chi-squared distributions with 2 degrees of freedom, and hence mCPC (k) follows a central Chi-squared distribution with 4 degrees of freedom.

According to Sankaran17, the probability density function of a noncentral Chi-squared distribution with d degrees of freedom and non-centrality parameter ξ is

f (x; d, ξ) = Σj≥0 [e^−ξ/2 (ξ/2)^j / j!] fd+2j (x),

where fd+2j (·) is the density of a central Chi-squared distribution with d + 2j degrees of freedom. Denote Ω(1) = Ω1 + ⋯ + Ωk and Ω(2) = Ωk+1 + ⋯ + Ωm, so that Z1 + ⋯ + Zk ~ χ²(k; Ω(1)) and Zk+1 + ⋯ + Zm ~ χ²(m − k; Ω(2)) under the alternative. For any x > 0, the probability density function of U1 = −2 ln [1 − Fk (Z1 + ⋯ + Zk)] is

g1 (x) = e^−x/2 f (t1 (x); k, Ω(1)) / [2 fk (t1 (x))],

where t1 (x) = Fk^−1 (1 − e^−x/2), and the probability density function of U2 = −2 ln [1 − Fm−k (Zk+1 + ⋯ + Zm)] is

g2 (x) = e^−x/2 f (t2 (x); m − k, Ω(2)) / [2 fm−k (t2 (x))]

with t2 (x) = Fm−k^−1 (1 − e^−x/2).

Let C1−α be the 1 − α quantile of a central Chi-squared distribution with 4 degrees of freedom. The power of mCPC (k) under the significance level α is

P (mCPC (k) > C1−α) = ∫∫{x+y > C1−α} g1 (x) g2 (y) dx dy.
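The power integral can be cross-checked without the explicit densities by Monte Carlo (a sketch under assumed values of μ and V, not the paper's settings; names are ours):

```python
import numpy as np
from scipy.stats import chi2

def mcpc_power_mc(mu, V, k, alpha=0.05, reps=5000, seed=0):
    """Monte Carlo power of mCPC(k): draw T ~ N(mu, V), form the
    per-PC statistics, Fisher-combine, compare with the chi2(4) cutoff."""
    rng = np.random.default_rng(seed)
    lam, Q = np.linalg.eigh(V)
    order = np.argsort(lam)[::-1]
    lam, Q = lam[order], Q[:, order]
    L = np.linalg.cholesky(V)
    m = len(mu)
    cut = chi2.ppf(1 - alpha, df=4)
    hits = 0
    for _ in range(reps):
        T = mu + L @ rng.standard_normal(m)
        Z = (Q.T @ T) ** 2 / lam
        stat = -2.0 * (np.log(chi2.sf(Z[:k].sum(), k))
                       + np.log(chi2.sf(Z[k:].sum(), m - k)))
        hits += stat > cut
    return hits / reps

# sanity check: under H0 (zero mean) the rejection rate should be ~ alpha
size0 = mcpc_power_mc(np.zeros(5),
                      0.3 * np.ones((5, 5)) + 0.7 * np.eye(5), k=2)
```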

Based on the above notation, the non-centrality parameter of the distribution of Z1, which corresponds to the first PC, is Ω1 = (q1^τ μ)²/λ1, where μ is the mean vector of T under the alternative hypothesis. Ω1 = 0 if and only if the mean vector μ belongs to the space spanned by the other m − 1 eigenvectors, that is, μ = a2 q2 + ⋯ + am qm, where a2, …, am are m − 1 real numbers. However, this subspace has Lebesgue measure zero in the m-dimensional space. Hence, the non-centrality parameter of the Chi-squared distribution of Z1 is non-zero almost everywhere. Since, in addition, the top PC possesses the largest variation among all PCs and k = 1 is a boundary point of the set {1, …, m − 1}, we propose the following strategy (named tCPC) to combine all PCs:

tCPC = −2 ln [1 − F1 (Z1)] − 2 ln [1 − Fm−1 (Z2 + ⋯ + Zm)].

Under the null hypothesis of no association at any locus, tCPC follows a central Chi-squared distribution with 4 degrees of freedom.
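Read this way, tCPC fixes the mCPC split at k = 1: the top PC's tail probability is Fisher-combined with the pooled tail probability of the remaining m − 1 PCs. A self-contained sketch (our naming, same conventions as above):

```python
import numpy as np
from scipy.stats import chi2

def tcpc(Z):
    """tCPC: Fisher-combine the top PC's tail probability with the
    pooled tail probability of the remaining m - 1 PCs.
    Under H0 the statistic follows a central chi-squared with 4 df."""
    Z = np.asarray(Z)
    m = len(Z)
    p_top = chi2.sf(Z[0], df=1)
    p_rest = chi2.sf(Z[1:].sum(), df=m - 1)
    stat = -2.0 * (np.log(p_top) + np.log(p_rest))
    return stat, chi2.sf(stat, df=4)

# null calibration check: i.i.d. chi2(1) draws for the Z_i
rng = np.random.default_rng(4)
null_rej = np.mean([tcpc(rng.chisquare(1, size=10))[1] < 0.05
                    for _ in range(2000)])
```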

Simulation Settings and Numerical Results

In this subsection, we conduct simulation studies to compare the power of tCPC with some existing approaches: Hotelling’s T2 test (HT)1,2,3, the ordinary PCA test (oPC), the summation of squared univariate test statistics (SSU)4, the sequence kernel association test (SKAT)11, and the multi-step combined principal component test mCPC (k0.8)16.

Consider testing association between m genetic variants (or SNPs) and a complex human disease. Let T = (T1, …, Tm)^τ be a test-statistic vector, where τ denotes the transpose of a vector or matrix. For example, we can construct T using the method in Chatterjee et al.18 to detect genetic association between m SNPs and a binary trait as

Tj = (1/n1) Σ{i ∈ cases} gi,j − (1/n2) Σ{i ∈ controls} gi,j, j = 1, …, m,

where gi,j denotes the genotype of the jth SNP for the ith individual, and n1 and n2 are the sample sizes of the case group and control group, respectively. Under the null hypothesis that these m SNPs are not associated with the disease of interest, T asymptotically follows a multivariate normal distribution N (μm×1, Vm×m) with mean vector μm×1 and covariance matrix Vm×m = (1/n1 + 1/n2) VG, in which VG is the pooled-sample covariance matrix of all SNPs.
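The construction can be sketched as follows (a hedged reconstruction: we take Tj to be the difference in mean genotype between cases and controls, which is consistent with the stated covariance Vm×m = (1/n1 + 1/n2) VG; the exact scaling in Chatterjee et al.18 may differ):

```python
import numpy as np

def case_control_statistic(G_case, G_ctrl):
    """T_j = mean genotype at SNP j in cases minus controls;
    V = (1/n1 + 1/n2) * pooled within-group genotype covariance."""
    n1, n2 = G_case.shape[0], G_ctrl.shape[0]
    T = G_case.mean(axis=0) - G_ctrl.mean(axis=0)
    resid = np.vstack([G_case - G_case.mean(axis=0),
                       G_ctrl - G_ctrl.mean(axis=0)])
    VG = resid.T @ resid / (n1 + n2 - 2)        # pooled covariance
    return T, (1.0 / n1 + 1.0 / n2) * VG

# tiny worked example: one SNP, two cases (genotypes 0, 2), two controls (0, 0)
T, V = case_control_statistic(np.array([[0.0], [2.0]]),
                              np.array([[0.0], [0.0]]))
```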

In order to obtain T and Vm×m, we first generate a latent vector of length 20 from a multivariate normal distribution with a compound-symmetry covariance structure with equal pairwise correlation ρ. This latent vector is then dichotomized to yield a haplotype with predesignated minor allele frequency (MAF). We repeat the above process 100,000 times to form a large population. Without loss of generality, we designate the first SNP as the disease-causal SNP with MAF p, and the other SNPs as noncausal SNPs with MAFs all equal to q. The sizes of the case and control samples are both set to 1,000. The case or control status of a subject is generated from the logistic regression model

logit {P (Yi = 1)} = β0 + β1 gi,1 + ⋯ + β20 gi,20,

with β0 = −1.5, β1 ∈ {ln (1.4), ln (1.6)} denoting the log odds ratio for the disease-causal SNP, and β2 = ⋯ = β20 = 0 denoting the log odds ratios for the non-causal SNPs, where Yi = 1 or 0 represents the disease or healthy status of the ith individual. The nominal significance level is 0.05 throughout the simulations, and the number of replicates is 1,000. All parameter settings and the corresponding results are displayed in Table 1. Table 1 shows that all these tests control the type I error rate correctly. For example, when the pairwise correlation of the 20 SNPs is ρ = 0.50, β1 = 0, and p = q = 0.20, the empirical type I error rates of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.051, 0.052, 0.043, 0.049, 0.051, and 0.045, respectively. The power comparison shows that tCPC performs more robustly than the other methods. For example, when the pairwise correlations of the 20 SNPs are uniformly equal to 0.20 and β1 = ln 1.4, p = q = 0.20, the empirical powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.754, 0.736, 0.772, 0.766, 0.700, and 0.766, respectively. The empirical power of tCPC is slightly lower than that of SSU in this scenario. However, when ρ = 0.50, β1 = ln 1.4, and p = q = 0.20, the empirical powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.724, 0.756, 0.810, 0.816, 0.702, and 0.856, respectively; tCPC clearly performs best among all the considered procedures in this setting.
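The genotype-generation step above can be sketched like this (a minimal version of the described scheme; names are ours, and for brevity we dichotomize the latent normals at the normal quantile of the MAF):

```python
import numpy as np
from scipy.stats import norm

def simulate_haplotypes(n, mafs, rho, rng):
    """n haplotypes over m SNPs: latent compound-symmetry MVN,
    dichotomized so SNP j carries the minor allele with prob mafs[j]."""
    m = len(mafs)
    cov = rho * np.ones((m, m)) + (1.0 - rho) * np.eye(m)
    latent = rng.multivariate_normal(np.zeros(m), cov, size=n)
    return (latent < norm.ppf(mafs)).astype(np.int8)

rng = np.random.default_rng(2)
mafs = np.full(20, 0.20)
hap = simulate_haplotypes(5000, mafs, rho=0.5, rng=rng)
# a genotype is the sum of two independently drawn haplotypes
geno = hap[rng.integers(0, 5000, 1000)] + hap[rng.integers(0, 5000, 1000)]
```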

Table 1 Empirical type I error rates and powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8) and tCPC for constant correlations.

Next we consider a decreasing correlation structure. As a preliminary step, a latent vector of length 20 is generated from a multivariate normal distribution with covariance matrix (ρ^|i−j|)20×20. The other simulation settings are similar to those above and are shown in Table 2. As presented in Table 2, the empirical type I error rates of all the tests are close to the nominal significance level, indicating that they control the type I error rate correctly. For instance, when ρ = 0.50, β1 = 0, and p = q = 0.20, the empirical type I error rates of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.036, 0.032, 0.040, 0.044, 0.038, and 0.042, respectively. For the power comparison, tCPC again performs more robustly than the other methods. For example, when the correlations of the 20 SNPs decrease with distance and ρ = 0.80, β1 = ln 1.4, p = 0.20, and q = 0.20, the empirical powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.770, 0.846, 0.772, 0.762, 0.788, and 0.836, respectively. The empirical power of tCPC is slightly lower than that of oPC (k0.8) in this case. However, when ρ = 0.95, β1 = ln 1.4, p = 0.20, and q = 0.20, the powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC are 0.746, 0.898, 0.858, 0.858, 0.860, and 0.864, respectively, so tCPC remains among the most powerful of the six methods in this scenario. Comparing Tables 1 and 2 comprehensively, we can see that when the extent of linkage disequilibrium among the SNPs is relatively strong, tCPC performs more robustly than the existing statistical methods.

Table 2 Empirical type I error rates and powers of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8) and tCPC for decreasing correlations.

Applications to gene TRAF1-C5 associated with Rheumatoid Arthritis

We apply tCPC and the other five existing tests to detect the association between the gene TRAF1-C5 and rheumatoid arthritis using data from Genetic Analysis Workshop 1619. This gene has previously been reported to be deleterious20. There are 2,062 subjects in this study, including 868 cases and 1,194 controls, and the gene TRAF1-C5 consists of 38 SNPs. The p-values of HT, oPC (k0.8), SSU, SKAT, mCPC (k0.8), and tCPC for detecting association between TRAF1-C5 and rheumatoid arthritis are 5.21 × 10−5, 7.58 × 10−3, 5.95 × 10−4, 6.50 × 10−5, 7.56 × 10−5, and 3.75 × 10−5, respectively. If, following Burton et al.21, we use the p-value threshold of 5 × 10−5 for moderate association at the genome-wide level, only the proposed tCPC detects a moderate-to-strong association signal between the gene TRAF1-C5 and rheumatoid arthritis.

Discussion

Principal component analysis is a common tool for grasping important features of correlated variables and has been applied in genetic association studies. In principal component analysis, a cumulative contribution rate of 80% or 90% is commonly adopted to choose PCs. However, this practice is not always suitable, since PCs with low contribution rates may be much more strongly correlated with the outcome than those with large contribution rates. To overcome this drawback, the mCPC method was developed recently16. In this study, we explored the theoretical power of mCPC in depth and found that the maximum power of mCPC depends on the maximum non-centrality parameter of the Chi-squared distributions of the PCs under the alternative hypothesis. However, this information is difficult to obtain beforehand in practice. In view of this, we propose a novel and robust strategy to combine PCs. We compare the power of the resulting test with mCPC (k0.8) and some other existing procedures, namely Hotelling’s T2 test (HT), oPC (k0.8), SSU, and SKAT, through extensive simulations. All simulation results show that our proposed procedure is more robust than mCPC, HT, oPC (k0.8), SSU, and SKAT. The results of the real data analysis further demonstrate the good performance of our proposed test. We suggest that researchers employ this robust strategy when considering principal component analysis methods in the future.

It should be noted that our proposed procedure is built upon the test by Chatterjee et al.18, which was designed to detect association between markers and a binary trait. It can be easily extended to other application fields. For instance, similar strategies have been used in pleiotropic genetic studies to identify deleterious genetic variants associated with multiple traits16. In addition, our proposed test can also be used to detect association between genetic variants and quantitative traits in the framework of a linear model, and with ordinal traits on the basis of a proportional odds model. If quantitative traits do not follow a normal distribution, one can construct a multivariate nonparametric trend test22 and then employ our proposed strategy to combine the components.

Methods

Maximum Powers of mCPC and ordinary PCA over extensive scenarios

For fixed m, the power of mCPC (k) mainly depends on k, μ, and Vm×m. We set different mean vectors under the alternative hypothesis for different covariance matrices Vm×m. We consider two types of Vm×m: in the first, the m variables are uniformly correlated, so that Vm×m is a symmetric positive-definite matrix with diagonal elements 1 and off-diagonal elements ρ; in the second, the correlations among the m variables decrease with the “physical” distance (SNP location), so that Vm×m = (ρ^|i−j|)m×m. Without loss of generality, ρ is chosen to be 0.8 for strong linkage disequilibrium and 0.2 for weak linkage disequilibrium, and we consider m = 20. Note that a test based on ordinary PCA can also be formed, denoted by oPC (k) = Z1 + ⋯ + Zk, which follows a central Chi-squared distribution with k degrees of freedom under H0. Obviously, the power of oPC (k) is also affected by k when μ and Vm×m are given. To view the powers of mCPC (k) and oPC (k) comprehensively, we set α to 0.05 and calculate the powers of mCPC (k) and oPC (k) by numerical integration in R under Scenarios S1 to S16. All parameter settings for Scenarios S1 to S16 are displayed in Table 3. We calculate the eigenvalues of Vm×m, ci, and Ωi, i = 1, …, 20, for all of Scenarios S1 to S16, and display the results in Tables 4 and 5. The power results of mCPC and ordinary PCA are displayed in Figs 1, 2, 3, 4. Under the same correlation structure and mean vector, the powers of mCPC (k) and oPC (k) are strongly affected by the selection of k. From Tables 3, 4, 5 and Figs 1, 2, 3, 4, we find that the maximum powers of both mCPC (k) and oPC (k) are related to the maximum non-centrality parameter over all PCs as k ranges from 1 to 20. For example, in Scenario S15, the non-centrality parameter of the second PC is the largest among all PCs, and mCPC (2) has the maximum power. In another example, under Scenario S2, the non-centrality parameter of the third PC is the largest, mCPC (4) has the maximum power, and the powers of mCPC (3) and mCPC (4) are close. We also find that the ordinary way of selecting k to construct mCPC (k) is not always desirable. For example, in Scenario S12, k0.8 = 5, but oPC (5) has power as low as 0.115. Figures 1, 2, 3, 4 verify that mCPC is more desirable than ordinary PCA and that selecting k according to the cumulative contribution rate is not robust: one can follow the common practice and choose c = 80% or 90% for oPC (kc), but this results in substantial loss of power in some situations. Furthermore, mCPC (kc) performs more robustly than oPC (kc), since mCPC (kc) has reasonable power over all the considered scenarios. For example, in Scenario S2, where the 20 variables are uniformly correlated with ρ = 0.8, the power of oPC (k0.8) is 0.073, far less than the power of mCPC (k0.8), which is 0.57.
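The contrast between oPC (k) and mCPC (k) can be reproduced in a toy simulation (a sketch under assumed non-centrality values, not the paper's Scenarios; here all signal is placed in the 5th PC, which oPC (2) misses but mCPC (5) pools in):

```python
import numpy as np
from scipy.stats import chi2, ncx2

def opc(Z, k):
    """oPC(k) = sum of the first k per-PC statistics; chi2(k) under H0."""
    s = float(np.asarray(Z)[:k].sum())
    return s, chi2.sf(s, df=k)

def mcpc(Z, k):
    """mCPC(k): Fisher combination of the two pooled tail probabilities."""
    Z = np.asarray(Z)
    m = len(Z)
    stat = -2.0 * (np.log(chi2.sf(Z[:k].sum(), k))
                   + np.log(chi2.sf(Z[k:].sum(), m - k)))
    return stat, chi2.sf(stat, df=4)

# toy alternative: m = 10, only the 5th PC carries signal (nc = 12)
rng = np.random.default_rng(3)
def draw():
    Z = rng.chisquare(1, size=10)
    Z[4] = ncx2.rvs(1, 12.0, random_state=rng)
    return Z

reps = [draw() for _ in range(2000)]
power_opc2 = np.mean([opc(Z, 2)[1] < 0.05 for Z in reps])    # misses the signal
power_mcpc5 = np.mean([mcpc(Z, 5)[1] < 0.05 for Z in reps])  # captures it
```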

Table 3 Parameter settings about means and covariance matrices.
Table 4 Eigenvalues, cumulative contribution rates and non-centrality parameters for Scenarios (S1) to (S8).
Table 5 Eigenvalues, cumulative contribution rates and non-centrality parameters for Scenarios (S9) to (S16).
Figure 1

Powers of mCPC (k) and oPC (k) under significance level α = 0.05 for Scenarios (S1) to (S4).

Figure 2

Powers of mCPC (k) and oPC (k) under significance level α = 0.05 for Scenarios (S5) to (S8).

Figure 3

Powers of mCPC (k) and oPC (k) under significance level α = 0.05 for Scenarios (S9) to (S12).

Figure 4

Powers of mCPC (k) and oPC (k) under significance level α = 0.05 for Scenarios (S13) to (S16).

A novel robust strategy to combine PCs

A further investigation of the maximum powers of mCPC (k) and oPC (k) shows that both are related to the non-centrality parameters of the Chi-squared distributions under the alternative hypothesis. For example, for Scenario S16 in Table 5, the non-centrality parameters of the 20 PCs are 0.01, 0.05, 0.11, 0.19, 0.27, 0.37, 0.45, 0.53, 0.58, 0.61, 0.62, 0.60, 0.55, 0.48, 0.38, 0.30, 0.21, 0.12, 0.06, and 0.01, respectively; the largest, 0.62, belongs to the 11th PC. mCPC (10) attains the maximum power of 0.248, and the power of mCPC (11) is 0.247, which is very close; the difference may be caused by numerical error. The cumulative contribution rate of the top 10 PCs is 62.25%, much less than 80%. It is worth noting that the non-centrality parameters are determined by the means and the covariance matrix, which are hard to know in practice. If some prior information on the means and the covariance matrix were available, the optimal selection of k would become easier to obtain. Aschard et al.16 proposed to use mCPC (k) with k determined by a cumulative contribution rate of 80%.

As shown above, using an 80% cumulative contribution rate might not be a robust strategy and can give very low power in some cases (e.g., Fig. 1). Based on the numerical results in Scenarios S1 to S16, we propose the tCPC method to combine all PCs.

Additional Information

How to cite this article: Li, Z. et al. Power Calculation of Multi-step Combined Principal Components with Applications to Genetic Association Studies. Sci. Rep. 6, 26243; doi: 10.1038/srep26243 (2016).