Fitting Proportional Odds Model to Case-Control data with Incorporating Hardy-Weinberg Equilibrium

Zhang, Wei; Zhang, Zehui; Li, Xinmin; Li, Qizhai

doi:10.1038/srep17286

Article
Open access
Published: 26 November 2015

Scientific Reports volume 5, Article number: 17286 (2015)

Abstract

Genetic association studies have proved to be an efficient tool for revealing the aetiology of many complex human diseases and traits. When the phenotype is binary, the logistic regression model is commonly employed to evaluate the strength of association of genetic variants that predispose to human diseases, because the maximum likelihood estimator of the odds ratio based on case-control data is equivalent to that obtained from the same model when the data are treated as if they had arisen prospectively. This equivalence does not hold for the proportional odds model, and using that model to analyze case-control data directly often results in substantial bias. By including a minor allele frequency parameter in the modified likelihood function, under the condition that the Hardy-Weinberg equilibrium law holds in controls, we obtain a consistent estimator. On this basis, we construct a score test statistic to test whether the genetic variant is associated with the disease. Simulation studies show that the proposed estimator has a smaller mean squared error than the existing methods when the genetic effect size is away from zero, and that the proposed test statistic controls the type I error rate well and is more powerful than the existing procedures. An application to 45 single nucleotide polymorphisms located in the region of the TRAF1-C5 genes, tested for association with a four-level anticyclic citrullinated protein antibody measure from Genetic Analysis Workshop 16, further demonstrates its performance.

Introduction

Retrospective studies are highly popular in genetic epidemiology because of their lower cost and substantially shorter duration compared with a prospective design. The data in a retrospective design are not drawn from the general population; instead, subjects are randomly sampled from each subpopulation, and the numbers of subjects chosen from the individual subpopulations are usually matched. In the last decade, retrospective case-control genetic association studies, especially genome-wide association studies, have been considered a big success in the search for deleterious genetic susceptibilities^1,2,3. By now, more than ten thousand single nucleotide polymorphisms (SNPs) have been identified as being associated with human complex diseases (http://www.genome.gov/gwasstudies). There are two types of phenotypes: continuous and discrete. The majority of discrete phenotypes are binary or ordinal. The logistic regression model is a major tool for analyzing binary phenotypes because the odds ratio estimator from the logistic regression model based on case-control data is equivalent to that from the same model when the data are treated as if they had been sampled in a prospective study^4,5,6. Although the intercept is not identifiable, this does not matter because the intercept is rarely of interest in practice. Compared with using two statuses (case and control) to define medical outcomes, an ordinal description with three or more values might measure the quality of life more accurately for some human complex diseases. For example, there are three levels for the degree of severity of carcinoid heart disease (CHD): without CHD, mild CHD and severe CHD⁷, and four levels for that of liver steatosis: normal liver, light steatosis, moderate steatosis and severe steatosis⁸.

Several procedures have been proposed in the literature to analyze retrospective data with ordinal responses. An ad hoc approach is to use the proportional odds model⁹, treating the retrospective data as if they had been collected prospectively. However, this is not appropriate because the proportional odds model does not belong to the class of multiplicative intercept risk models^10,11, and the resulting maximum likelihood estimator (MLE) of the parameter of interest is not consistent for its true value unless that true value is 0. Therefore, under a discrete choice probability model, Cosslett¹⁰ proposed maximizing a modified likelihood function to obtain the MLE; Wild¹¹ considered fitting the proportional odds model to case-control data from a finite population with known population totals in each response category and obtained the MLE. Comparing the final optimization functions shows that Wild’s MLE is identical to that of Cosslett.

The Hardy-Weinberg equilibrium (HWE) law is a very important principle in population genetics. It is routine to check whether the observed genotypes satisfy the HWE law in the control population before conducting an association test, because deviations from HWE can indicate many problems such as population stratification and genotyping error^12,13,14. In a genome-wide association study, the p-value threshold for the HWE test is 10⁻⁴, to ensure that there is no systematic genotyping error in the sampled individuals. On the other hand, checking whether the HWE law holds in the case population has been used as an association test for fine-mapping of disease loci^15,16. Going further, the HWE law has also been exploited in many association studies. For example, Wang and Shete¹⁷ derived a powerful test by incorporating deviations from HWE in cases for single-marker analysis; Zheng and Ng¹⁸ proposed a powerful two-phase analysis that uses the HWE test to classify the genetic models; Chen et al.¹⁹ considered testing for gene-environment interaction by assuming that HWE holds in the controls. Consider a biallelic SNP locus with two alleles A and a. Denote the allele frequency of A by p. Under the HWE principle, the genotype frequencies of AA, Aa and aa are p², 2p(1 − p) and (1 − p)², respectively.
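The HWE genotype frequencies stated above can be sketched in a few lines of Python. The function name `hwe_genotype_freqs` is a hypothetical helper introduced only for illustration; it simply evaluates p², 2p(1 − p) and (1 − p)² for a given frequency of allele A.

```python
def hwe_genotype_freqs(p):
    """Return (Pr(AA), Pr(Aa), Pr(aa)) under HWE for allele-A frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p  # frequency of allele a
    return (p * p, 2.0 * p * q, q * q)


# Illustrative value only: allele A at frequency 0.3.
freqs = hwe_genotype_freqs(0.3)
print(freqs, sum(freqs))
```

The three frequencies always sum to one, which is a quick sanity check on any genotype table that is claimed to be in equilibrium.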

To the best of our knowledge, most of the existing methods in the literature focus on estimating the parameters. Although test statistics such as the score test or the Wald test derived from the proportional odds model are still valid and have been used in practice, for example in the CHD study⁷ and the liver study⁸, we will show that they might lose power under the alternative, especially when the genetic effect size is large. In this work, by incorporating the HWE principle in the control population, we obtain a new estimator that optimizes the newly modified likelihood function. Using this estimator, we derive a score test statistic, which extensive computer simulations show to be more powerful than the existing methods. Finally, we apply it to 45 SNPs in the region of TRAF1-C5, testing for association with a four-level anticyclic citrullinated protein antibody measure from Genetic Analysis Workshop 16, and find three SNPs that are significantly associated with this measure at the genome-wide significance level of 10⁻⁷.

Results

Simulation Settings

We compare the performance of three estimators: proMLE (the MLE derived from the likelihood function that treats the data as if they had arisen prospectively), modMLE (the MLE derived from the modified likelihood function) and hweMLE (the proposed method). The parameters used in this section are defined in the “Notations” section below. Because J = 4 in the real application analyzed later, we also take J = 4 in the simulations, with Pr(Y = 1) = 0.98, Pr(Y = 2) = 0.01, Pr(Y = 3) = 0.006 and Pr(Y = 4) = 0.004, which results in θ₁ = 3.89, θ₂ = 4.59 and θ₃ = 5.51 under β = 0. We choose β ∈ {ln 1.2, ln 1.4, ln 1.6, ln 1.8} and the minor allele frequency (MAF) p ∈ {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}. Let n₁ ∈ {1, 200, 500, 300, 200} and let n₂, n₃ and n₄ be drawn from a multinomial distribution Mul(n₁, q), so that n₁ = n₂ + n₃ + n₄, where q = (0.5, 0.3, 0.2)^τ is the probability vector proportional to the prevalence rates of the case statuses, (Pr(Y = 2), Pr(Y = 3), Pr(Y = 4))^τ. We choose several values of n₁ to make the power comparable across different genetic effect sizes. We consider two significance levels, 0.05 and 0.001, and use 1,000 and 50,000 replicates, respectively, to calculate the empirical type I error rates and powers at these levels.
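As a check on the simulation settings, the intercepts and the case-mix vector q can be recomputed from the stated prevalences. The sketch below assumes that, under β = 0, each intercept is the logit of the cumulative prevalence, θ_j = logit Pr(Y ≤ j), which is the reading consistent with the proportional odds model in the “Notations” section; the variable names are illustrative.

```python
import math


def logit(x):
    """Log-odds of a probability x in (0, 1)."""
    return math.log(x / (1.0 - x))


# Prevalences Pr(Y = 1), ..., Pr(Y = 4) from the simulation settings.
prevalence = [0.98, 0.01, 0.006, 0.004]

# Under beta = 0, theta_j = logit(Pr(Y <= j)) for j = 1, ..., J - 1.
cumulative = 0.0
thetas = []
for pr in prevalence[:-1]:
    cumulative += pr
    thetas.append(logit(cumulative))

# Case-category mix q: the case prevalences rescaled to sum to one.
case_total = sum(prevalence[1:])
q = [pr / case_total for pr in prevalence[1:]]

print([round(t, 3) for t in thetas])
print([round(x, 3) for x in q])
```

The computed intercepts agree with the quoted θ₁, θ₂ and θ₃ up to rounding in the second decimal place, and q reproduces (0.5, 0.3, 0.2).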

Point Estimate

Figures 1–4 show the boxplots of the above three estimators for β = ln 1.2, β = ln 1.4, β = ln 1.6 and β = ln 1.8, respectively, where we set the same value of n₁ (= 500) and conduct 1,000 replicates. As expected, the proMLE is biased and the proposed hweMLE is approximately unbiased. Interestingly, the proMLE underestimates β in most cases, with median values smaller than the true values, while the modMLE slightly overestimates β, with median values greater than the true values. The absolute bias of the proMLE increases as β increases. For example, when MAF = 0.25, the biases of the proMLE for β = ln 1.2, ln 1.4, ln 1.6 and ln 1.8 are −0.023, −0.051, −0.068 and −0.091, respectively. Judging by the bias of the median in these boxplots, when β is away from zero the proposed hweMLE performs best, followed by modMLE and then proMLE. For instance, when β = ln 1.2 (= 0.182) and MAF = 0.10, the median values of proMLE, modMLE and hweMLE are 0.163, 0.202 and 0.192, respectively, while for β = ln 1.4 (= 0.336) and the same MAF, they are 0.276, 0.340 and 0.338, respectively. Tables S1–S4 in the supplemental material summarize the empirical bias and the square root of the mean squared error (srMSE). These tables indicate that the empirical bias of hweMLE is the smallest among the three estimators and that the srMSE of hweMLE is the smallest under most of the considered scenarios, especially when β is relatively large.
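The two summary measures reported in the supplemental tables, empirical bias and srMSE, can be computed from a list of replicate estimates as in the following sketch. The helper name `bias_and_srmse` and the toy estimates are made up for illustration and are not results from the paper.

```python
import math


def bias_and_srmse(estimates, true_beta):
    """Empirical bias and square root of the mean squared error."""
    n = len(estimates)
    bias = sum(estimates) / n - true_beta
    mse = sum((b - true_beta) ** 2 for b in estimates) / n
    return bias, math.sqrt(mse)


# Made-up replicate estimates of beta, with true beta = ln 1.2.
toy_estimates = [0.15, 0.20, 0.19, 0.17, 0.21]
bias, srmse = bias_and_srmse(toy_estimates, math.log(1.2))
print(bias, srmse)
```

Because the MSE equals the squared bias plus the variance, the srMSE can never be smaller than the absolute bias, so an estimator with small bias but large spread can still rank poorly on srMSE.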

Type I error rate

As shown in the “Methods” section, the observed Fisher information matrix of the modMLE is close to singular because the case-control design imposes an equality among the Δ_j, j = 2, 3,…, J. We therefore compare two test statistics. One is the score test derived from the proportional odds model by treating the data as if they had arisen prospectively, which we denote by proT for convenience; the other is the proposed hweT. Table 1 shows the empirical type I error rates for MAFs ranging from 0.1 to 0.5 at the nominal significance levels of 0.05 and 0.001. We set n = 1,000 and use 1,000 and 50,000 replicates, respectively, to calculate the empirical type I error rates. The results indicate that both proT and hweT control the type I error rate correctly, with empirical values close to the nominal level. For example, when the MAF is 0.20 and the nominal level is 0.05, the empirical type I error rates of proT and hweT are 0.052 and 0.044, respectively, and when the MAF is 0.15 and the nominal level is 0.001, they are 0.00086 and 0.00078, respectively.
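To judge how close an empirical type I error rate is to its nominal level, it helps to know the Monte Carlo standard error implied by the number of replicates. The sketch below is an illustrative calculation, not part of the paper's procedure: it treats each replicate as an independent Bernoulli trial with success probability equal to the nominal level. The function names are hypothetical.

```python
import math


def empirical_rate(p_values, alpha):
    """Fraction of replicates whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_se(alpha, replicates):
    """Binomial standard error of an empirical rate when the true rate is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / replicates)


# Replicate counts from the text: 1,000 at the 0.05 level, 50,000 at the 0.001 level.
se_05 = monte_carlo_se(0.05, 1000)
se_001 = monte_carlo_se(0.001, 50000)

# Distance of the quoted empirical rates from nominal, in standard-error units.
for name, rate, alpha, se in [
    ("proT, level 0.05", 0.052, 0.05, se_05),
    ("hweT, level 0.05", 0.044, 0.05, se_05),
    ("proT, level 0.001", 0.00086, 0.001, se_001),
    ("hweT, level 0.001", 0.00078, 0.001, se_001),
]:
    print(name, round((rate - alpha) / se, 2))
```

With these replicate counts the standard errors are roughly 0.0069 and 0.00014, so all four quoted rates lie within about two standard errors of their nominal levels.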

Table 1 The empirical type I errors of proT and hweT.

Power Comparison

In this part, we explore the power of proT and hweT. For convenience, we assume . To make the power comparable across effect sizes, we use smaller sample sizes for larger β. In detail, we set n = 1,000, 500, 300 and 200 for β = ln 1.2, ln 1.4, ln 1.6 and ln 1.8, respectively, at the nominal significance level of 0.05, and n = 2,400, 1,000, 600 and 400 for β = ln 1.2, ln 1.4, ln 1.6 and ln 1.8, respectively, at the nominal significance level of 0.001. We conduct 1,000 and 50,000 replicates for the significance levels of 0.05 and 0.001, respectively. Figures 5 and 6 show the power results. Both figures indicate that the proposed hweT is more powerful than the proT. In some cases, the power increases by 6 percentage points. For example, when n = 1,000, MAF = 0.35 and β = ln 1.4, the power of hweT at the significance level of 0.001 is 0.582, compared with 0.522 for proT.

Application to Four-level Anticyclic Citrullinated Protein Antibody data

The TRAF1-C5 region of the human genome has been shown to be associated with rheumatoid arthritis (RA) by both genome-wide association studies^20,21 and a candidate gene approach²². Anticyclic citrullinated protein (anti-CCP) antibodies are frequently found in the blood of individuals with RA²³. It is therefore reasonable to assume that there are associations between the anti-CCP measure and the SNPs in the region of TRAF1-C5. To test this hypothesis, we apply the hweT and the proposed hweMLE procedures to the data from the Genetic Analysis Workshop 16 (GAW16)²⁴. These data consist of 2,062 subjects. Based on the anti-CCP measure, the subjects can be divided into four categories: without RA, below 20; low or weak, 20–39; moderate, 40–59; high or strong, > 60. The numbers of subjects in these four categories are 1,195, 103, 66 and 698, respectively. There are 45 SNPs in the region of TRAF1-C5 on chromosome 9. The SNP IDs, positions, point estimates and p-values of the existing and the proposed procedures are summarized in Table 2. Before conducting the association analysis, we use the HWD coefficient²⁵, denoted by D, to test whether the HWE law holds in the controls. When the HWE law holds in the controls, D = 0. The HWE test is given by , where , n_c0, n_c1 and n_c2 are the counts of the subjects possessing the genotype 0, 1 and 2, respectively and n_c = n_c0 + n_c1 + n_c2. Under D = 0, the HWE test statistic follows the standard normal distribution. The results in Table 2 show that the HWE law holds in the controls for these 45 SNPs at the significance level of 0.05. We then apply the proposed hweT and the proT to test for associations between these 45 SNPs and the anti-CCP measure. We find that the association between these SNPs and the anti-CCP measure is always more significant under the proposed hweT than under the proT. For example, three SNPs, rs1953126, rs881375 and rs3761847, have p-values less than 10⁻⁷ using either the proT or the hweT. However, the hweT identifies another five SNPs, rs10760130, rs10985073, rs2900180, rs7037673 and rs1468673, with p-values less than 0.0001, while only three SNPs, rs1953126, rs881375 and rs2900180, can be identified using the proT. In addition, we use Fisher's combination method to combine the p-values over these 45 SNPs as T_com = −2 Σ_{i=1}^{45} ln(p_i), where p_i is the p-value of the ith SNP. The values of T_com for the proT and hweT are 408.9 and 512.2, respectively. Based on 1,000 bootstrap replicates, we calculate the p-values of T_com for the proT and hweT; both are less than 0.001. This indicates that the TRAF1-C5 region is associated with the anti-CCP measure and that the hweT detects association signals more easily than the proT.
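The two auxiliary statistics used in this application can be sketched as follows. The HWE check below assumes the standard large-sample form of the Hardy-Weinberg disequilibrium test: p̂ is the allele frequency obtained by allele counting in controls, D̂ is the observed frequency of genotype 2 minus p̂², and Z = √n_c · D̂ / [p̂(1 − p̂)]. This form is an assumption made for illustration and may differ in detail from the expression the authors used. The combined statistic follows Fisher's method, minus twice the sum of the log p-values. All function names and the genotype counts are hypothetical.

```python
import math


def hwd_z_test(n_c0, n_c1, n_c2):
    """Large-sample z-test of D = 0 from control genotype counts (assumed form)."""
    n_c = n_c0 + n_c1 + n_c2
    p_hat = (2 * n_c2 + n_c1) / (2 * n_c)   # allele frequency by allele counting
    if not 0.0 < p_hat < 1.0:
        raise ValueError("test is undefined for a monomorphic marker")
    d_hat = n_c2 / n_c - p_hat ** 2          # estimated HWD coefficient
    z = math.sqrt(n_c) * d_hat / (p_hat * (1.0 - p_hat))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


def fisher_combined(p_values):
    """Fisher's combined statistic: -2 times the sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


# Made-up control counts that sit exactly at HWE proportions for p = 0.5.
z, p_value = hwd_z_test(100, 200, 100)
print(z, p_value)

# Made-up p-values for three markers.
print(fisher_combined([0.04, 0.20, 0.60]))
```

For independent tests, Fisher's statistic is chi-square distributed with twice as many degrees of freedom as there are p-values. SNPs in one region are typically correlated through linkage disequilibrium, which is a plausible reason for calibrating T_com by bootstrap as the text describes.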

Table 2 The point estimates of β and p-values for 45 SNPs in the region of TRAF1-C5 for the association with the 4-level anti-CCP measure.

Discussion

When logistic regression is used to handle a binary response in genetic association studies, it has been shown that the odds ratio estimate based on the MLE is equivalent to that from the same model when the data are treated as if they had arisen prospectively^4,5,6. However, this equivalence does not hold for the proportional odds model. Cosslett¹⁰ and Wild¹¹ proposed obtaining a consistent estimator by optimizing a modified likelihood function. In this work, by incorporating the HWE principle in the retrospective likelihood function, we extend Cosslett’s procedure and obtain a consistent and asymptotically unbiased estimator. Based on this estimator, we construct a score test statistic. Numerical results show that the MLE from the prospective proportional odds model is substantially biased, that the proposed estimator is consistent, and that the proposed score test statistic is more powerful than the one constructed from a prospective likelihood function.

The HWE principle is very important in genetic association studies. It is often considered a cornerstone for further statistical inference. Departures from HWE often result from inbreeding, population migration and genotyping errors. Researchers have suggested that deviation from HWE among cases can provide additional evidence for associations between genetic variants and human diseases^17,19. As shown in the results, incorporating HWE into the proportional odds model can improve both the efficiency of the estimate of the genetic effect and the power to identify deleterious genetic variants. We also explore the performance of the proposed procedure when HWE is violated. The simulation results, available in the supplementary material, indicate that the proposed procedures work well when HWE is violated slightly. In fact, if the HWE law is violated, we can estimate the parameters of interest by assuming that the genotype frequencies in the control group satisfy Pr(G = 0|Y = 1) = p₀, Pr(G = 1|Y = 1) = p₁, Pr(G = 2|Y = 1) = p₂. There is then one additional parameter to be estimated in the proposed modified likelihood function, so the number of parameters is larger than under the assumption of HWE. Hence, the biases of the estimators tend to be bigger than those obtained under the assumption that the HWE law holds.

The sandwich variance estimate is a common tool for estimating the variance of quasi-likelihood estimates from generalized estimating equations (GEE)²⁶. However, Kauermann and Carroll²⁷ proved that the sandwich variance estimator has a downward bias of order O(n⁻¹) for the quasi-likelihood estimates from GEE, where n is the total sample size, because it is derived from a first-order Taylor expansion of the estimating equation. In our case, using the sandwich variance estimator to construct the test statistic may therefore result in an inflated type I error rate. Hence, we use the sum over individual observations of the first derivatives of the likelihood function to estimate the variance of the MLE. It should be noted that, by the law of large numbers, this variance estimate is consistent.

Methods

Notations

Consider a biallelic SNP whose genotype at a marker locus is coded as 0, 1 or 2, the value being the number of copies of a certain candidate allele. Let Y be an ordinal response variable with J ordered levels and G be a random variable taking the genotype values of the subjects at the SNP locus. Without loss of generality, let Y = 1 denote the status of a healthy individual and Y = j denote the status of a diseased subject, j = 2, 3,…, J. Then the standard proportional odds model⁹ is

$$\ln\frac{\Pr(Y \le j \mid G = g)}{\Pr(Y > j \mid G = g)} = \theta_j - g\beta, \qquad j = 1, 2, \ldots, J - 1, \tag{1}$$

where β is the parameter of interest, which is called the log-odds ratio when J = 2, and θ_j, j = 1, 2,…, J − 1, are the intercepts. Denote ϕ(x) = 1/(1 + exp(−x)) for x ∈ ℝ. Using (1), we have

$$\Pr(Y = j \mid G = g) = \phi(\theta_j - g\beta) - \phi(\theta_{j-1} - g\beta), \qquad j = 1, 2, \ldots, J,$$

with the conventions θ₀ = −∞ and θ_J = +∞.

Let g_ij, i = 1, 2,…, n_j, be the genotypes of the n_j subjects who are randomly sampled from the jth subpopulation for j = 1, 2,…, J. Denote n = n₁ + n₂ + … + n_J as the total sample size.
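A minimal sketch of the category probabilities implied by the proportional odds model is given below. It assumes the form Pr(Y = j | G = g) = ϕ(θ_j − gβ) − ϕ(θ_{j−1} − gβ), with ϕ taken as 0 below the first intercept and 1 above the last, which matches the terms that appear in l_h,ij in the “Consistent Estimate” subsection. The function names are illustrative.

```python
import math


def phi(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(g, beta, thetas):
    """Pr(Y = j | G = g) for j = 1, ..., J under the proportional odds model."""
    # Cumulative probabilities Pr(Y <= j | G = g), padded with 0 and 1.
    cum = [0.0] + [phi(t - g * beta) for t in thetas] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Intercepts matching the simulation prevalences (0.98, 0.01, 0.006, 0.004).
thetas = [math.log(0.98 / 0.02), math.log(0.99 / 0.01), math.log(0.996 / 0.004)]

print(category_probs(0, math.log(1.4), thetas))  # non-carriers
print(category_probs(2, math.log(1.4), thetas))  # two copies of the candidate allele
```

With a positive β, each extra copy of the candidate allele lowers Pr(Y = 1 | G = g) and shifts probability mass toward the more severe categories by the same odds factor at every cut point, which is what "proportional odds" refers to.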

Consistent Estimate

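If we treat the data as if they had been collected in a prospective study, the prospective likelihood function is

$$L_p(\beta, \theta) = \prod_{j=1}^{J} \prod_{i=1}^{n_j} \left[\phi(\theta_j - g_{ij}\beta) - \phi(\theta_{j-1} - g_{ij}\beta)\right],$$

where θ = (θ₁, θ₂,…,θ_J−1)^τ and τ denotes the transpose of a vector or a matrix. The corresponding log-likelihood function is

$$l_p(\beta, \theta) = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \ln\left[\phi(\theta_j - g_{ij}\beta) - \phi(\theta_{j-1} - g_{ij}\beta)\right].$$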

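As shown by Cosslett¹⁰, using the above model to analyze retrospective data directly often leads to a biased estimate of β when β ≠ 0, and the bias increases as β increases. Cosslett¹⁰ therefore proposed optimizing the following modified log-likelihood function to estimate β:

$$l_m(\beta, \theta, \Delta) = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left\{ \ln \Delta_j + \ln\left[\phi(\theta_j - g_{ij}\beta) - \phi(\theta_{j-1} - g_{ij}\beta)\right] - \ln \sum_{k=1}^{J} \Delta_k \left[\phi(\theta_k - g_{ij}\beta) - \phi(\theta_{k-1} - g_{ij}\beta)\right] \right\},$$

where Δ = (Δ₂, Δ₃,…, Δ_J)^τ and Δ₁ = n₁/n.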
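The prospective and modified log-likelihoods can be sketched side by side. The code below is an illustration under stated assumptions rather than the authors' implementation: the modified form is taken to be the choice-based-sampling expression in which each observation contributes ln Δ_j + ln Pr(Y = j | g) − ln Σ_k Δ_k Pr(Y = k | g), which is the form consistent with the per-observation term l_h,ij given later in this section. Here `deltas` holds the full vector (Δ₁,…, Δ_J), and all names and toy data are hypothetical.

```python
import math


def phi(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(g, beta, thetas):
    cum = [0.0] + [phi(t - g * beta) for t in thetas] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def prospective_loglik(beta, thetas, genotypes_by_category):
    """Sum of ln Pr(Y = j | g_ij) over all sampled subjects."""
    total = 0.0
    for j, genotypes in enumerate(genotypes_by_category):
        for g in genotypes:
            total += math.log(category_probs(g, beta, thetas)[j])
    return total


def modified_loglik(beta, thetas, deltas, genotypes_by_category):
    """Assumed modified form: weights deltas re-balance the sampled categories."""
    total = 0.0
    for j, genotypes in enumerate(genotypes_by_category):
        for g in genotypes:
            probs = category_probs(g, beta, thetas)
            denom = sum(d * pr for d, pr in zip(deltas, probs))
            total += math.log(deltas[j]) + math.log(probs[j]) - math.log(denom)
    return total


# Toy data: genotypes of subjects sampled from categories Y = 1, ..., 4.
data = [[0, 1, 0, 2], [1, 1], [2], [1, 2]]
thetas = [1.0, 2.0, 3.0]
beta = 0.3

print(prospective_loglik(beta, thetas, data))
print(modified_loglik(beta, thetas, [0.25] * 4, data))
print(modified_loglik(beta, thetas, [0.5, 2.0, 2.0, 2.0], data))
```

When all weights are equal the modified function reduces to the prospective one, so any difference between the two estimators comes entirely from unequal weighting of the sampled categories.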

Under a case-control design in which all case groups are randomly sampled from the case population, the structure among the different case groups in the sample is the same as that in the general case population. Each case group should therefore have the same degree of importance, which yields Δ₂ = Δ₃ = … = Δ_J. Once this equality is taken into consideration, the score test statistic cannot be constructed using l_m(β, θ, Δ) because the observed Fisher information matrix of (β, θ^τ, Δ^τ)^τ is close to singular. In what follows, we therefore derive an MLE that incorporates both this equality and the HWE law. Suppose that the HWE principle holds in the control population with minor allele frequency p. Thus Pr(G = 0|Y = 1) = (1 − p)², Pr(G = 1|Y = 1) = 2p(1 − p), Pr(G = 2|Y = 1) = p². From the Supplemental Material, we set , , where , w = [(1 − p)²/ϕ(θ₁) + 2p(1 − p)/ϕ(θ₁ − β) + p²/ϕ(θ₁ − 2β)]. Then, the modified likelihood function is rewritten as

and the log-likelihood function is

where m₀, m₁ and m₂ are the numbers of subjects with genotypes 0, 1 and 2, respectively, in the sample. We estimate the parameters in two steps. We first estimate the parameter p using the observations in controls and denote the estimator by p̂. By the law of large numbers, p̂ converges to p almost surely. We then maximize l_h(β, θ, p) with respect to β and θ under p = p̂ to obtain the estimates of β and θ through. Denote the estimator of (β, θ^τ)^τ by (β̂, θ̂^τ)^τ. Then, from the theorem in the Supplemental Material, β̂ is consistent for the true value of β and asymptotically follows a standard normal distribution, where , is the (1, 1)^th element of the matrix I⁻¹(β, θ),

and l_h,ij = ln(Δ_j) + ln[ϕ(θ_j − g_ijβ) − ϕ(θ_j−1 − g_ijβ)] − ln{Δ₁ϕ(θ₁ − g_ijβ) + Δ₂[1 − ϕ(θ₁ − g_ijβ)]} for i = 1, 2,…, n_j and j = 1, 2,…, J.
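The pieces of this subsection that are stated explicitly can be sketched in code. Three assumptions are made here and are not taken from the text: the allele frequency in controls is estimated by allele counting; the weights Δ₁ and Δ₂ are passed in as plain inputs; and categories are indexed from 1 as in the text. One possible reading of w, offered as an interpretation only: by Bayes' rule Pr(G = g) = Pr(G = g | Y = 1) Pr(Y = 1) / Pr(Y = 1 | G = g), and summing over g gives Pr(Y = 1) = 1/w. All function names are hypothetical.

```python
import math


def phi(x):
    return 1.0 / (1.0 + math.exp(-x))


def hwe_weight(beta, theta1, p):
    """w = (1-p)^2/phi(theta1) + 2p(1-p)/phi(theta1-beta) + p^2/phi(theta1-2beta)."""
    return ((1.0 - p) ** 2 / phi(theta1)
            + 2.0 * p * (1.0 - p) / phi(theta1 - beta)
            + p ** 2 / phi(theta1 - 2.0 * beta))


def control_allele_freq(n_c0, n_c1, n_c2):
    """Assumed step-one estimator of p: allele counting in controls."""
    return (2 * n_c2 + n_c1) / (2 * (n_c0 + n_c1 + n_c2))


def l_h_ij(j, g, beta, thetas, delta1, delta2):
    """Per-observation term l_h,ij for category j (1-based) and genotype g."""
    cum = [0.0] + [phi(t - g * beta) for t in thetas] + [1.0]
    prob_j = cum[j] - cum[j - 1]
    delta_j = delta1 if j == 1 else delta2   # all case groups share delta2
    denom = delta1 * cum[1] + delta2 * (1.0 - cum[1])
    return math.log(delta_j) + math.log(prob_j) - math.log(denom)


theta1 = math.log(0.98 / 0.02)
p_hat = control_allele_freq(49, 42, 9)        # made-up control genotype counts
print(p_hat, hwe_weight(math.log(1.4), theta1, p_hat))
print(l_h_ij(2, 1, 0.3, [1.0, 2.0, 3.0], 0.5, 2.0))
```

At β = 0 the weight w no longer depends on p and equals 1/ϕ(θ₁), which gives a simple numerical check on an implementation.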

Test Statistic

In genetic association studies, investigators are mainly concerned with whether the genetic variant is associated with the disease. One can construct the Wald test statistic based on the asymptotic normality of β̂. Another commonly employed test statistic is the score test statistic. Denote A_β = n₁A_β1/A_β2, where , and the MLE of θ under β = 0 by . Then, the score function is

and the score test statistic (denote it by hweT) is

where is defined as above. Under the null hypothesis, hweT asymptotically follows the standard normal distribution.

Additional Information

How to cite this article: Zhang, W. et al. Fitting Proportional Odds Model to Case-Control data with Incorporating Hardy-Weinberg Equilibrium. Sci. Rep. 5, 17286; doi: 10.1038/srep17286 (2015).

Change history

21 December 2015
The version of this Article previously published omitted Wei Zhang and Zehui Zhang as equally contributing authors. This has now been corrected in both the PDF and HTML version of the paper.

References

1. Wellcome Trust Case Control Consortium (WTCCC). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 447, 661–678 (2007).
2. Yue, W. H. et al. Genome-wide association study identifies a susceptibility locus for schizophrenia in Han Chinese at 11p11.2. Nature Genet. 43, 1228–1231 (2011).
3. Levine, D. M. et al. A genome-wide association study identifies new susceptibility loci for esophageal adenocarcinoma and Barrett’s esophagus. Nature Genet. 45, 1487–1493 (2013).
4. Prentice, R. L. & Pyke, R. Logistic disease incidence models and case-control studies. Biometrika. 66, 403–411 (1979).
5. Farewell, V. T. Some results on the estimation of logistic models based on retrospective data. Biometrika. 66, 403–411 (1979).
6. Weinberg, C. R. & Wacholder, S. Prospective analysis of case-control data under general multiplicative-intercept risk models. Biometrika. 80, 461–465 (1993).
7. Korse, C. M., Taal, B. G., De Groot, C. A., Bakker, R. H. & Bonfrer, J. M. Chromogranin-A and N-terminal pro-brain natriuretic peptide: an excellent pair of biomarkers for diagnostics in patients with neuroendocrine tumor. J. Clin. Oncol. 27, 4293–4299 (2009).
8. Bedogni, G., Kahn, H. S., Bellentani, S. & Tiribelli, C. A simple index of lipid overaccumulation is a good marker of liver steatosis. BMC Neurosci. 10, 98 (2010).
9. McCullagh, P. Regression models for ordinal data (with discussion). J. R. Stat. Soc. Ser. B-Stat. Methodol. 42, 109–142 (1980).
10. Cosslett, S. Maximum likelihood estimators for choice-based samples. Econometrica. 49, 1289–1316 (1981).
11. Wild, C. J. Fitting prospective regression models to case-control data. Biometrika. 78, 705–717 (1991).
12. Wigginton, J. E., Cutler, D. J. & Abecasis, G. R. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 76, 887–893 (2005).
13. Hosking, L. et al. Detection of genotyping errors by HW equilibrium testing. Eur. J. Hum. Genet. 12, 395–399 (2004).
14. Schaid, D. J., Batzler, A. J., Jenkins, G. D. & Hilderbrandt, M. A. Exact tests of Hardy-Weinberg equilibrium and homogeneity of disequilibrium across strata. Am. J. Hum. Genet. 79, 1071–1080 (2006).
15. Nielsen, D., Ehm, M. G. & Weir, B. Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus. Am. J. Hum. Genet. 63, 1531–1540 (1999).
16. Leal, S. M. Detection of genotyping error of pseudo-SNPs via deviations from Hardy-Weinberg equilibrium. Genet. Epidemiol. 29, 204–214 (2003).
17. Wang, J. & Shete, S. A test for genetic association that incorporates information about deviation from Hardy-Weinberg proportions in cases. Am. J. Hum. Genet. 83, 53–63 (2008).
18. Zheng, G. & Ng, H. K. Genetic model selection in two-phase analysis for case-control association studies. Biostatistics. 9, 391–399 (2008).
19. Chen, J., Kang, G., Vanderweele, T., Zhang, C. & Mukherjee, B. Efficient designs of gene-environment interaction studies: implications of Hardy-Weinberg equilibrium and gene-environment independence. Stat. Med. 31, 2516–2530 (2012).
20. Plenge, R. M. et al. TRAF1-C5 as a risk locus for rheumatoid arthritis-a genomewide study. N. Engl. J. Med. 357, 1199–1209 (2007).
21. Liang, X. et al. Identifying rheumatoid arthritis susceptibility genes using high-dimensional methods. BMC Proc. 3, S79 (2009).
22. Kurreeman, F. A. et al. A candidate gene approach identifies the TRAF1/C5 region as a risk factor for rheumatoid arthritis. PLoS Med. 4, e278 (2007).
23. Huizinga, T. W. et al. Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins. Arthritis Rheumatol. 52, 3433–3438 (2005).
24. Amos, C. I. et al. Data for Genetic Analysis Workshop 16 Problem 1, association analysis of rheumatoid arthritis data. BMC Proc. 3, S2 (2009).
25. Weir, B. S. In Genetic data analysis II: Methods for Discrete Population Genetic Data, Ch. 3, 91–139 (Sinauer Associates Inc, 1996).
26. Liang, K. Y. & Zeger, S. L. Longitudinal Data Analysis Using Generalized Linear Models. Biometrika. 73, 13–22 (1986).
27. Kauermann, G. & Carroll, R. J. A note on the efficiency of sandwich covariance matrix estimation. J. Am. Stat. Assoc. 96, 1387–1396 (2001).

Acknowledgements

Q. Li was supported in part by the National Science Foundation of China, Grant Nos. 11371353 and 61134013, and the Breakthrough Project of Strategic Priority Program of the Chinese Academy of Sciences, Grant No. XDB13040600. X. Li was supported partially by the Shandong Provincial Natural Science Foundation of China, Grant No. ZR2014AM019.

Author information

Zhang Wei and Zhang Zehui contributed equally to this work.

Authors and Affiliations

Key Laboratory of Systems Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Wei Zhang & Qizhai Li
Central China Normal University, Wuhan, China
Zehui Zhang
Qingdao University, Qingdao, China
Xinmin Li

Contributions

W.Z. and Z.Z. contributed to the design of the study and performed the analysis; X.L. prepared all the figures; Q.L. conceived the idea and drafted the manuscript; all authors participated in the data interpretation, read and approved the final manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 14 July 2015
Accepted: 28 October 2015
Published: 26 November 2015
DOI: https://doi.org/10.1038/srep17286
