Optimal Trend Tests for Genetic Association Studies of Heterogeneous Diseases

Lee, Wen-Chung

doi:10.1038/srep27821

Article
Open access
Published: 09 June 2016

Wen-Chung Lee¹

Scientific Reports volume 6, Article number: 27821 (2016)

Abstract

The Cochran-Armitage trend test is a standard procedure in genetic association studies. It is a directed test with high power to detect genetic effects that follow the gene-dosage model. In this paper, the author proposes optimal trend tests for genetic association studies of heterogeneous diseases. Monte-Carlo simulations show that the power gain of the optimal trend tests over the conventional Cochran-Armitage trend test is striking when the genetic effects are heterogeneous. The easy-to-use R 3.1.2 software (R Foundation for Statistical Computing, Vienna, Austria) code is provided. The optimal trend tests are recommended for routine use.

Introduction

Genetic factors contribute to many human diseases, conferring susceptibility or resistance. Unlike simple Mendelian disorders, more common complex diseases may have many genes involved in their pathogenesis^{1,2,3}. The association of candidate genes (or markers across the genome) with a disease can be efficiently evaluated by a case-control design, in which genotype frequencies are compared between diseased cases and unaffected controls. Genetic association studies are an important first step in gene characterization. Candidate genes or markers found to be statistically significant are then subject to further studies to identify causal variants, quantify genetic effects, examine possible gene-environment or gene-gene interactions and so on^{4,5,6,7}; results from different studies can also be pooled for a meta-analysis^{8,9,10}. The Cochran-Armitage trend test^{11,12,13,14,15} has become a standard procedure in this crucial first-step study of complex diseases. It is a directed test most sensitive to genetic effects that follow the gene-dosage model.

However, a disease may comprise more than one disease entity, each with a different etiology, clinical picture and prognosis. Examples of such heterogeneous diseases are Alzheimer’s disease¹⁶, breast tumors¹⁷, B-cell lymphoma¹⁸, acute lymphoblastic leukemia¹⁹, primary thyroid lymphoma²⁰, otosclerosis²¹, rheumatoid arthritis²² and autism spectrum disorder¹. The effect of a gene associated with a heterogeneous disease can be variable, depending on which disease entity one is considering; and if the distinct disease entities themselves, often obscure and subtle, are not recognized and taken into account, the genetic effect associated with the heterogeneous disease at large may vary from person to person.

Genetic heterogeneity can complicate our association study of complex diseases even further. The following hypothetical example should highlight this issue. Consider the disease occurrences in a population of one million people (250,000 people with genotype aa; 500,000 people with genotype Aa; 250,000 people with genotype AA). Assume that the disease under study has two distinct subtypes (which are unknown to researchers). Further assume that both subtypes conform strictly to the gene-dosage model. For Subtype I, the disease risk is 0.0001 for the aa genotype and the risk increases ten-fold per A allele; for Subtype II, the disease risk is 0.0020 for the aa genotype and the risk decreases two-fold per A allele. A simple calculation shows that the majority (73%) of the diseased subjects in this population are of Subtype I (where the risk increases ten-fold per A allele), so the A allele should be regarded as a risk allele rather than a protective one. Yet, ignoring the subtypes, we observe disease risks of 0.0021 (aa genotype), 0.0020 (Aa genotype) and 0.0105 (AA genotype), respectively. This is nothing like a gene-dosage model and moreover, the A allele now appears protective, when comparing the Aa and the aa genotypes. Obviously, applying the standard Cochran-Armitage trend test^{11,12,13,14,15} to this setting will result in power loss.
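The arithmetic of this hypothetical example can be reproduced with a few lines of Python. This is only a sketch: the variable and function names are illustrative, and it assumes that a person develops at most one subtype, so that the overall risk is the sum of the two subtype risks.

```python
# Hypothetical population from the text: genotype counts and subtype risks.
population = {"aa": 250_000, "Aa": 500_000, "AA": 250_000}
n_A = {"aa": 0, "Aa": 1, "AA": 2}  # number of A alleles carried by each genotype


def subtype_risk(baseline, fold_per_allele, genotype):
    """Risk under a strict gene-dosage model (constant fold change per A allele)."""
    return baseline * fold_per_allele ** n_A[genotype]


# Subtype I: risk 0.0001 for aa, ten-fold increase per A allele.
risk_I = {g: subtype_risk(0.0001, 10.0, g) for g in population}
# Subtype II: risk 0.0020 for aa, two-fold decrease per A allele.
risk_II = {g: subtype_risk(0.0020, 0.5, g) for g in population}

# Ignoring subtypes, the observed risk is the sum of the subtype risks
# (assumption: the two subtypes are mutually exclusive).
overall_risk = {g: risk_I[g] + risk_II[g] for g in population}

cases_I = sum(population[g] * risk_I[g] for g in population)
cases_II = sum(population[g] * risk_II[g] for g in population)
share_I = cases_I / (cases_I + cases_II)

for g in population:
    print(g, round(overall_risk[g], 4))
print("Subtype I share of cases:", round(share_I, 2))
```

The overall risks are not monotone in the number of A alleles, which is why a test tuned to a strict gene-dosage trend loses power here.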

In this paper, we propose optimal trend tests for genetic association studies of heterogeneous diseases.

Methods

Notation

For a marker with two alleles a and A, each individual in a case-control study is genotyped with one of three genotypes, aa, Aa and AA (indexed by i = 0, 1, 2, respectively). Assume that the case-control study consists of a total of n = r + s subjects (r cases and s controls). These n subjects can be classified into a 2 × 3 table based on each subject’s genotype and disease status as shown in Table 1.

Table 1 Genotype distribution for case-control studies.

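Let (x₀, x₁, x₂) = (0, c, 1) where the coefficient c can assume any value, and let r_i and s_i denote the numbers of cases and controls with genotype i, with n_i = r_i + s_i. Under the null hypothesis of no genetic association, the following test statistic is distributed asymptotically as a chi-square distribution with one degree of freedom:

$$Z(c) = \frac{n\left[\sum_{i=0}^{2} x_i\,(s\,r_i - r\,s_i)\right]^2}{r\,s\left[n\sum_{i=0}^{2} x_i^2\,n_i - \left(\sum_{i=0}^{2} x_i\,n_i\right)^2\right]}. \tag{1}$$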

The test with a coefficient of 0.5, Z(0.5), is the familiar Cochran-Armitage trend test^{11,12,13,14,15}.
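As a rough illustration, the trend statistic with scores (0, c, 1) can be sketched in Python using the standard textbook form of the Cochran-Armitage chi-square. Here r_i and s_i are the numbers of cases and controls with genotype i and n_i = r_i + s_i. The function names and the genotype counts in the usage lines are illustrative assumptions; they are not the author's R code or the data of any study.

```python
import math


def trend_statistic(cases, controls, c=0.5):
    """Chi-square (1 df) trend statistic with genotype scores (0, c, 1).

    cases, controls: genotype counts ordered (aa, Aa, AA).
    Standard form: n * U^2 / (r * s * V), with
      U = sum_i x_i * (s * r_i - r * s_i)
      V = n * sum_i x_i^2 * n_i - (sum_i x_i * n_i)^2
    """
    x = (0.0, c, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [ri + si for ri, si in zip(cases, controls)]
    u = sum(xi * (s * ri - r * si) for xi, ri, si in zip(x, cases, controls))
    v = (n * sum(xi * xi * ni for xi, ni in zip(x, totals))
         - sum(xi * ni for xi, ni in zip(x, totals)) ** 2)
    return n * u * u / (r * s * v)


def chi2_1df_pvalue(z):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(z / 2.0))


# Made-up genotype counts (aa, Aa, AA), for illustration only.
example_cases = (30, 50, 20)
example_controls = (40, 45, 15)
z_half = trend_statistic(example_cases, example_controls, c=0.5)
print(z_half, chi2_1df_pvalue(z_half))
```

With c = 0.5 the function gives the conventional Cochran-Armitage test; any other value of c gives a trend test with a different middle score.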

Optimal Trend Test

Assume that the non-diseased population is in Hardy-Weinberg equilibrium with an allele frequency (for the A allele) of q. The expected genotype frequencies for the controls are then, respectively,

$$q_0 = (1-q)^2,\quad q_1 = 2q(1-q),\quad q_2 = q^2. \tag{2}$$

Further assume that the genetic effect is heterogeneous; the allele relative risk (relative risk per A allele) is not a constant value but may vary from person to person. Let the expected value of this relative risk be denoted as RR, its coefficient of variation (standard deviation divided by mean; a measure of heterogeneity), as CV_RR. The expected allele frequency for the cases is then

$$p = \frac{q \cdot RR}{1 - q + q \cdot RR}, \tag{3}$$

and its variance, calculated by a Taylor approximation (S1 Exhibit), is then

$$\mathrm{Var}(p) \approx \left[p\,(1-p)\,CV_{RR}\right]^2. \tag{4}$$

This variance is also the Hardy-Weinberg disequilibrium coefficient in the diseased population and therefore, the expected genotype frequencies for the cases are, respectively,

$$p_0 = (1-p)^2 + \delta,\quad p_1 = 2p(1-p) - 2\delta,\quad p_2 = p^2 + \delta, \tag{5}$$

where δ = Var(p).
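The Hardy-Weinberg, gene-dosage case can be checked numerically. The sketch below follows the arithmetic of the worked example in this paper (q = 0.4, RR = 1.25, CV_RR = 0.4); the function names are illustrative and not taken from the author's R program.

```python
def control_frequencies(q):
    """Expected control genotype frequencies (aa, Aa, AA) under Hardy-Weinberg equilibrium."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def case_allele_frequency(q, rr):
    """Expected frequency of the A allele among cases for a mean allele relative risk rr."""
    return q * rr / (1 - q + q * rr)


def case_frequencies(q, rr, cv_rr):
    """Expected case genotype frequencies (aa, Aa, AA) with heterogeneous effects.

    delta is the variance of p from the first-order Taylor approximation,
    [p * (1 - p) * cv_rr]^2; it acts as a Hardy-Weinberg disequilibrium
    coefficient among the cases.
    """
    p = case_allele_frequency(q, rr)
    delta = (p * (1 - p) * cv_rr) ** 2
    return ((1 - p) ** 2 + delta, 2 * p * (1 - p) - 2 * delta, p ** 2 + delta)


q_freqs = control_frequencies(0.4)
p_freqs = case_frequencies(0.4, 1.25, 0.4)
print([round(f, 2) for f in q_freqs])  # [0.36, 0.48, 0.16]
print([round(f, 2) for f in p_freqs])  # [0.31, 0.48, 0.22]
```

Setting cv_rr to 0 makes delta vanish, so the cases are then in Hardy-Weinberg proportions with allele frequency p.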

In the above calculations, we assumed Hardy-Weinberg equilibrium for the non-diseased population and a gene-dosage genetic model (a constant increase or decrease in risk per A allele). We now relax these assumptions. In general, the expected genotype frequencies for the controls are, respectively,

$$q_0 = (1-q)^2 + \Delta,\quad q_1 = 2q(1-q) - 2\Delta,\quad q_2 = q^2 + \Delta, \tag{6}$$

where Δ is the Hardy-Weinberg disequilibrium coefficient in the non-diseased population. The expected genotype relative risks are, respectively,

where γ is a genetic model parameter. γ = 0 corresponds to an autosomal recessive model, γ = 0.5, a gene-dosage model and γ = 1, an autosomal dominant model. As before, we allow the parameter RR to have a coefficient of variation CV_RR and the parameter p (though here it may not be interpreted as the expected allele frequency for the cases) to have a variance as prescribed in Equation (4). Under these conditions, the expected genotype frequencies for the cases (p₀, p₁ and p₂) can be derived from a Taylor expansion. The formulas are rather cumbersome and are relegated to S2 Exhibit.

With the p_i and q_i calculated for i = 0, 1 and 2, simple algebra shows that the following optimal coefficient will maximize the test statistic in Equation (1):

where

for i = 0, 1 and 2, respectively, are the expected genotype frequencies in the total case-control sample. Z(c^optimal) is our proposed optimal trend test.
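One way to see where such a coefficient comes from is to replace the genotype counts in Equation (1) by their expectations and maximize over c. The sketch below does this under two assumptions that are ours rather than the paper's: the pooled frequencies are taken to be t_i = (r p_i + s q_i)/n, and the closed form is the stationary point we obtain with d_i = p_i − q_i, namely c = t₂(d₁t₀ − d₀t₁) / [t₁(d₂t₀ − d₀t₂)]. It may be arranged differently from the author's Equation (8). Function names are illustrative.

```python
def optimal_coefficient(p, q, r, s):
    """Coefficient c that maximizes the large-sample trend statistic.

    p, q: expected genotype frequencies (aa, Aa, AA) for cases and controls.
    r, s: numbers of cases and controls.
    """
    n = r + s
    t = [(r * pi + s * qi) / n for pi, qi in zip(p, q)]  # pooled frequencies (assumed form)
    d = [pi - qi for pi, qi in zip(p, q)]                # case minus control frequencies
    numerator = t[2] * (d[1] * t[0] - d[0] * t[1])
    denominator = t[1] * (d[2] * t[0] - d[0] * t[2])
    return numerator / denominator


def expected_statistic(c, p, q, r, s):
    """Value of the trend statistic when every count equals its expectation."""
    n = r + s
    x = (0.0, c, 1.0)
    t = [(r * pi + s * qi) / n for pi, qi in zip(p, q)]
    diff = sum(xi * (pi - qi) for xi, pi, qi in zip(x, p, q))
    var = (sum(xi * xi * ti for xi, ti in zip(x, t))
           - sum(xi * ti for xi, ti in zip(x, t)) ** 2)
    return (r * s / n) * diff * diff / var


# Parameter values of the paper's worked example (q = 0.4, RR = 1.25, CV_RR = 0.4).
q_allele, rr, cv_rr = 0.4, 1.25, 0.4
p_allele = q_allele * rr / (1 - q_allele + q_allele * rr)
delta = (p_allele * (1 - p_allele) * cv_rr) ** 2
q = ((1 - q_allele) ** 2, 2 * q_allele * (1 - q_allele), q_allele ** 2)
p = ((1 - p_allele) ** 2 + delta,
     2 * p_allele * (1 - p_allele) - 2 * delta,
     p_allele ** 2 + delta)

c_opt = optimal_coefficient(p, q, r=1000, s=1018)
print("coefficient:", round(c_opt, 3))
print("gain over c = 0.5:",
      expected_statistic(c_opt, p, q, 1000, 1018)
      - expected_statistic(0.5, p, q, 1000, 1018))
```

The same maximum is reached by giving each genotype the score d_i / t_i and rescaling so that aa has score 0 and AA has score 1, which is a convenient check on the closed form.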

An Example

We use published case-control data to demonstrate our method. Zhang et al.²³ examined the association between the adenosine diphosphate ribosyltransferase (ADPRT) gene (Val762Ala polymorphism) and lung cancer risk. The data (1000 cases and 1018 controls) are shown in Table 2.

Table 2 Association between the adenosine diphosphate ribosyltransferase (ADPRT) gene (Val762Ala polymorphism) and lung cancer risk (data taken from ref. 23).

For simplicity, we assume Hardy-Weinberg equilibrium for the non-diseased population (with an allele frequency of q = 0.4) and a gene-dosage genetic model for the ADPRT gene (with a weak association of RR = 1.25 and a moderate heterogeneity of CV_RR = 0.4). Using Equations (2)–(5), we then calculate q₀ = (1 − 0.4)² = 0.36, q₁ = 2 × 0.4 × (1 − 0.4) = 0.48, q₂ = 0.4² = 0.16, p = (0.4 × 1.25)/(1 − 0.4 + 0.4 × 1.25) = 0.45, δ = Var(p) = [0.45 × (1 − 0.45) × 0.4]² = 0.0098, p₀ = (1 − 0.45)² + 0.0098 = 0.31, p₁ = 2 × 0.45 × (1 − 0.45) − 2 × 0.0098 = 0.48 and p₂ = 0.45² + 0.0098 = 0.22, respectively (p is shown rounded to two decimals; the results are computed from its unrounded value).

Using Equation (9), we calculate the expected genotype frequencies in the total case-control sample, and using Equation (8), the optimal coefficient for this example. Using Equation (1), we then calculate the test statistic.

From this, we obtain a very small p-value of 0.00095. By comparison, the conventional Cochran-Armitage trend test for this example results in a higher p-value of 0.00164. Zhang et al.²³ used a chi-square test with two degrees of freedom, which resulted in an even higher p-value of 0.00420. Such differences in p-values should not be taken lightly, considering that a severe multiple-testing penalty often has to be made before declaring significance in a genetic association study.

Simulation Study

We perform a simulation study to examine the statistical properties of the optimal trend test. The non-diseased population is assumed to be in Hardy-Weinberg equilibrium (Δ = 0), with an allele frequency of q = 0.4. We assume a gene-dosage genetic model (γ = 0.5) and we consider situations where the A allele is a risk allele (RR = 2, 1.5 and 1.25, respectively) and a protective allele (RR = 0.5, 0.67, 0.8, respectively), in turn. For each scenario, we use a sample-size formula for the Cochran-Armitage trend test¹³ to calculate the respective sample size needed for a case-control study (assuming an equal number of cases and controls) to achieve a power of 0.8 at a significance level of 0.05.

We consider various values of CV_RR: 0.0 (no heterogeneity), 0.1, 0.2, …, 1.0 (profound heterogeneity). For each value of q, RR and CV_RR, we use Equation (8) to calculate the optimal coefficient. We then perform Monte-Carlo simulations (a total of 1,000,000 simulations for each scenario) to calculate the empirical power of the optimal trend test (at the sample sizes described above). For comparison, we also calculate the empirical power of the Cochran-Armitage trend test.
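A scaled-down sketch of such a Monte-Carlo comparison is given below. It assumes that case and control genotype counts are drawn as multinomial samples from the expected genotype frequencies, with controls in Hardy-Weinberg equilibrium. It uses the standard Cochran-Armitage form of the statistic and obtains the optimal coefficient by maximizing the large-sample statistic, which is an assumed stand-in for Equation (8). The function names, the sample sizes (500 cases and 500 controls) and the 1,000 replications are illustrative choices, far fewer than the 1,000,000 replications used in the paper.

```python
import random

CRITICAL_VALUE = 3.841458820694124  # 95th percentile of chi-square with 1 df


def trend_statistic(cases, controls, c):
    """Chi-square (1 df) trend statistic with genotype scores (0, c, 1)."""
    x = (0.0, c, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    u = sum(xi * (s * a - r * b) for xi, a, b in zip(x, cases, controls))
    v = (n * sum(xi * xi * ni for xi, ni in zip(x, totals))
         - sum(xi * ni for xi, ni in zip(x, totals)) ** 2)
    return 0.0 if v == 0 else n * u * u / (r * s * v)


def genotype_frequencies(allele_freq, disequilibrium=0.0):
    """Genotype frequencies (aa, Aa, AA) for an allele frequency and a disequilibrium term."""
    f = allele_freq
    return ((1 - f) ** 2 + disequilibrium,
            2 * f * (1 - f) - 2 * disequilibrium,
            f ** 2 + disequilibrium)


def case_frequencies(q, rr, cv_rr):
    """Expected case genotype frequencies under a heterogeneous gene-dosage effect."""
    p = q * rr / (1 - q + q * rr)
    return genotype_frequencies(p, (p * (1 - p) * cv_rr) ** 2)


def optimal_coefficient(p, q, r, s):
    """Maximizer of the large-sample statistic (assumed stand-in for Equation (8))."""
    n = r + s
    t = [(r * a + s * b) / n for a, b in zip(p, q)]
    d = [a - b for a, b in zip(p, q)]
    return (t[2] * (d[1] * t[0] - d[0] * t[1])) / (t[1] * (d[2] * t[0] - d[0] * t[2]))


def draw_counts(rng, freqs, size):
    """Draw multinomial genotype counts (aa, Aa, AA) for `size` subjects."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=freqs, k=size):
        counts[g] += 1
    return counts


def empirical_power(p, q, r, s, coefficients, replications, seed=1):
    """Rejection rate at the 0.05 level for each coefficient, on shared simulated data."""
    rng = random.Random(seed)
    rejections = [0] * len(coefficients)
    for _ in range(replications):
        cases = draw_counts(rng, p, r)
        controls = draw_counts(rng, q, s)
        for j, c in enumerate(coefficients):
            if trend_statistic(cases, controls, c) > CRITICAL_VALUE:
                rejections[j] += 1
    return [k / replications for k in rejections]


# One illustrative scenario: q = 0.4, RR = 1.25, strong heterogeneity CV_RR = 0.8.
q_freqs = genotype_frequencies(0.4)
p_freqs = case_frequencies(0.4, 1.25, 0.8)
c_opt = optimal_coefficient(p_freqs, q_freqs, 500, 500)
power_ca, power_opt = empirical_power(p_freqs, q_freqs, 500, 500, [0.5, c_opt], 1000)
print("optimal coefficient:", round(c_opt, 3))
print("power, Cochran-Armitage:", power_ca, " power, optimal:", power_opt)
```

Both tests are evaluated on the same simulated data sets, so the difference in rejection rates reflects the choice of coefficient rather than sampling noise between runs.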

Figure 1 presents the results when the A allele is a risk allele (panels A, C and E for the coefficients; panels B, D and F for the empirical powers). When the genetic effect is homogeneous (CV_RR = 0), the optimal coefficients as calculated from Equation (8) are very close to the coefficient of the Cochran-Armitage trend test, namely, 0.5. As a result, the powers of the optimal trend test and the Cochran-Armitage trend test are very similar. As the genetic effect becomes more heterogeneous (larger CV_RR), the optimal coefficient decreases (down to below zero) and the power of the optimal trend test increases (up to ~100%). The rates of the coefficient decrease/power increase are more striking for a weaker genetic effect (RR = 1.25; panels E and F) than for a stronger genetic effect (RR = 2; panels A and B). By comparison, the Cochran-Armitage trend test uses a constant coefficient of 0.5 and its power decreases gradually with greater heterogeneity.

Figure 2 presents the results when the A allele is a protective allele. The findings are similar to those in Fig. 1, where A is a risk allele, except that as the genetic effect becomes more heterogeneous, the optimal coefficient deviates from 0.5 in the other direction, increasing beyond 1.0 rather than decreasing.

We also consider different values of q, Δ and γ, and the results (S3 Exhibit) all show the superiority of the optimal trend test over the conventional Cochran-Armitage trend test.

Discussion

The optimal trend test as proposed in this paper is a directed test that is most sensitive for a particular specified alternative. The optimal coefficient depends on the effect of the study gene (mean RR, variability CV_RR and genetic model γ) and on the underlying population (allele frequency q and Hardy-Weinberg disequilibrium coefficient Δ). This a priori information is to be supplied by researchers, either by a literature search or an educated guess. As shown in this study, the power gain over the conventional Cochran-Armitage trend test is striking when the genetic effects are very heterogeneous.

Sometimes it is difficult to pinpoint exactly one set of RR, CV_RR, γ, q and Δ for calculating the optimal coefficient, but suggesting a list of possible sets of parameter values may be easier. Assuming that a researcher comes up with a total of m sets of parameter values, he/she can input these into our Equation (8) to calculate a total of m optimal coefficients, c₁, …, c_m, and then input these into our Equation (1) for a total of m optimal trend tests. Next, a summary test can be performed based on a weighted sum of these m test statistics:

$$\sum_{j=1}^{m} w_j\, Z(c_j),$$

where w₁, …, w_m are the weights given to reflect the plausibility of each set of parameter values. The multiple testing problem should not concern us here, because we make one and only one summary test. Under the null hypothesis of no genetic association, this weighted sum is distributed asymptotically as a mixture of chi-square variables (detailed in S4 Exhibit). (The test reduces to the optimal trend test in this paper when m = 1.)
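The weighted-sum statistic is simple to compute once the individual trend statistics are available. The sketch below is not the procedure of S4 Exhibit: instead of the asymptotic mixture of chi-square variables, it approximates the null distribution by Monte-Carlo resampling of genotype counts from the pooled genotype frequencies, a substitute technique chosen here only because it needs no further formulas. Function names and the genotype counts are hypothetical.

```python
import random


def trend_statistic(cases, controls, c):
    """Chi-square (1 df) trend statistic with genotype scores (0, c, 1)."""
    x = (0.0, c, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    u = sum(xi * (s * a - r * b) for xi, a, b in zip(x, cases, controls))
    v = (n * sum(xi * xi * ni for xi, ni in zip(x, totals))
         - sum(xi * ni for xi, ni in zip(x, totals)) ** 2)
    return 0.0 if v == 0 else n * u * u / (r * s * v)


def summary_statistic(cases, controls, coefficients, weights):
    """Weighted sum of trend statistics, one per candidate coefficient."""
    return sum(w * trend_statistic(cases, controls, c)
               for c, w in zip(coefficients, weights))


def draw_counts(rng, freqs, size):
    """Draw multinomial genotype counts (aa, Aa, AA) for `size` subjects."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=freqs, k=size):
        counts[g] += 1
    return counts


def monte_carlo_pvalue(cases, controls, coefficients, weights,
                       replications=500, seed=1):
    """Null p-value by resampling from pooled frequencies (substitute for S4 Exhibit)."""
    rng = random.Random(seed)
    r, s = sum(cases), sum(controls)
    pooled = [(a + b) / (r + s) for a, b in zip(cases, controls)]
    observed = summary_statistic(cases, controls, coefficients, weights)
    exceed = 0
    for _ in range(replications):
        simulated = summary_statistic(draw_counts(rng, pooled, r),
                                      draw_counts(rng, pooled, s),
                                      coefficients, weights)
        if simulated >= observed:
            exceed += 1
    return (exceed + 1) / (replications + 1)


# Hypothetical genotype counts (aa, Aa, AA); two candidate coefficients, equal weights.
demo_cases = (30, 50, 20)
demo_controls = (40, 45, 15)
demo_stat = summary_statistic(demo_cases, demo_controls, [0.8, 0.2], [1, 1])
demo_p = monte_carlo_pvalue(demo_cases, demo_controls, [0.8, 0.2], [1, 1])
print(demo_stat, demo_p)
```

Because only one summary statistic is referred to its null distribution, a single p-value results no matter how many coefficients are combined.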

The proposed optimal trend tests (and the summary test) are easy to calculate. S5 Exhibit presents the R 3.1.2 software (R Foundation for Statistical Computing, Vienna, Austria) code and a number of worked examples. The R program also allows for the direct input of the optimal coefficients. For example, if one suspects a gene-dosage model with heterogeneous effects, one can input one coefficient above 0.5, say c₁ = 0.8, another coefficient below 0.5, say c₂ = 0.2, and w₁ = w₂ = 1 into the R program to test with the summary statistic Z(0.8) + Z(0.2). As another example, if one is uncertain about the genetic model, one can input c₁ = 0.5 (gene dosage), c₂ = 1 (autosomal dominant), c₃ = 0 (autosomal recessive) and w₁ = w₂ = w₃ = 1 into the R program to test with the summary statistic Z(0.5) + Z(1) + Z(0).

Additional Information

How to cite this article: Lee, W.-C. Optimal Trend Tests for Genetic Association Studies of Heterogeneous Diseases. Sci. Rep. 6, 27821; doi: 10.1038/srep27821 (2016).

References

1. Stessman, H. A., Bernier, R. & Eichler, E. E. A genotype-first approach to defining the subtypes of a complex disease. Cell 156, 872–877 (2014).
2. Garraway, L. A. & Lander, E. S. Lessons from the cancer genome. Cell 153, 17–37 (2013).
3. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
4. Hunter, D. J. Gene-environment interactions in human diseases. Nat. Rev. Genet. 6, 287–298 (2005).
5. Le Marchand, L. & Wilkens, L. R. Design considerations for genomic association studies: importance of gene-environment interactions. Cancer Epidemiol. Biomarkers Prev. 17, 263–267 (2008).
6. Lewis, C. M. & Knight, J. Introduction to genetic association studies. Cold Spring Harb. Protoc. 2012, 297–306 (2012).
7. Rava, M. et al. Selection of genes for gene-environment interaction studies: a candidate pathway-based strategy using asthma as an example. Environ. Health 12, 56 (2013).
8. Thompson, J. R., Attia, J. & Minelli, C. The meta-analysis of genome-wide association studies. Brief Bioinform. 12, 259–269 (2011).
9. Evangelou, E. & Ioannidis, J. P. A. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389 (2013).
10. Pharoah, P. D. P. et al. GWAS meta-analysis and replication identifies three new susceptibility loci for ovarian cancer. Nat. Genet. 45, 362–370 (2013).
11. Cochran, W. G. Some methods for strengthening the common chi-square tests. Biometrics 10, 417–451 (1954).
12. Armitage, P. Tests for linear trends in proportions and frequencies. Biometrics 11, 375–386 (1955).
13. Slager, S. L. & Schaid, D. Case-control studies of genetic markers: power and sample size approximations for Armitage’s test for trend. Hum. Hered. 52, 149–153 (2001).
14. Freidlin, B., Zheng, G., Li, Z. & Gastwirth, J. L. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum. Hered. 53, 146–152 (2002).
15. Zheng, G. & Gastwirth, J. L. On estimation of the variance in Cochran-Armitage trend tests for genetic association using case-control studies. Stat. Med. 25, 3150–3159 (2006).
16. Corder, E. H. & Woodbury, M. A. Genetic heterogeneity in Alzheimer’s disease: a grade of membership analysis. Genet. Epidemiol. 10, 495–499 (1993).
17. Perou, C. M. et al. Molecular portraits of human breast tumors. Nature 406, 747–752 (2000).
18. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene-expression profiling. Nature 403, 503–511 (2000).
19. Yeoh, E. J. et al. Classification, subtype discovery and prediction of outcome in pediatric acute lymphoblastic leukemia by gene-expression profiling. Cancer Cell 1, 133–143 (2002).
20. Thieblemont, C. et al. Primary thyroid lymphoma is a heterogeneous disease. J. Clin. Endocrinol. Metab. 87, 105–111 (2002).
21. Van der Bogaert, K. et al. Otosclerosis: a genetically heterogeneous disease involving at least three different genes. Bone 30, 624–630 (2002).
22. van der Pouw Kraan, T. C. et al. Rheumatoid arthritis is a heterogeneous disease: evidence for differences in the activation of the STAT-1 pathway between rheumatoid tissues. Arthritis Rheum. 48, 2132–2145 (2003).
23. Zhang, X. et al. Polymorphisms in DNA base excision repair genes ADPRT and XRCC1 and risk of lung cancer. Cancer Res. 65, 722–726 (2005).

Acknowledgements

This paper is partly supported by grants from the Ministry of Science and Technology, Taiwan (NSC 102-2628-B-002-036-MY3) and National Taiwan University, Taiwan (NTU-CESRP-102R7622-8). No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Research Center for Genes, Environment and Human Health and Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
Wen-Chung Lee

Ethics declarations

Competing interests

The author declares no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 03 November 2015
Accepted: 24 May 2016