Two-stage designs to identify the effects of SNP combinations on complex diseases

Kang, Guolian; Yue, Weihua; Zhang, Jifeng; Huebner, Marianne; Zhang, Handi; Ruan, Yan; Lu, Tianlan; Ling, Yansu; Zuo, Yijun; Zhang, Dai

doi:10.1007/s10038-008-0307-x

Download PDF

Original Article
Published: 01 August 2008

Two-stage designs to identify the effects of SNP combinations on complex diseases

Guolian Kang¹^nAff3,
Weihua Yue²,
Jifeng Zhang¹,
Marianne Huebner⁴,
Handi Zhang²,
Yan Ruan²,
Tianlan Lu²,
Yansu Ling²,
Yijun Zuo⁴ &
…
Dai Zhang²

Journal of Human Genetics volume 53, pages 739–746 (2008)Cite this article

645 Accesses
10 Citations
Metrics details

Abstract

The genetic basis of complex diseases is expected to be highly heterogeneous, with many disease genes, where each gene by itself has only a small effect. Based on the nonlinear contributions of disease genes across the genome to complex diseases, we introduce the concept of single nucleotide polymorphism (SNP) synergistic blocks. A two-stage approach is applied to detect the genetic association of synergistic blocks with a disease. In the first stage, synergistic blocks associated with a complex disease are identified by clustering SNP patterns and choosing blocks within a cluster that minimize a diversity criterion. In the second stage, a logistic regression model is given for a synergistic block. Using simulated case–control data, we demonstrate that our method has reasonable power to identify gene–gene interactions. To further evaluate the performance of our method, we apply our method to 17 loci of four candidate genes for paranoid schizophrenia in a Chinese population. Five synergistic blocks are found to be associated with schizophrenia, three of which are negatively associated (odds ratio, OR < 0.3, P < 0.05), while the others are positively associated (OR > 2.0, P < 0.05). The mathematical models of these five synergistic blocks are presented. The results suggest that there may be interactive effects for schizophrenia among variants of the genes neuregulin 1 (NRG1, 8p22-p11), G72 (13q34), the regulator of G-protein signaling-4 (RGS4, 1q21-q22) and frizzled 3 (FZD3, 8p21). Using synergistic blocks, we can reduce the dimensionality in a multi-locus association analysis, and evaluate the sizes of interactive effects among multiple disease genes on complex phenotypes.

Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

Article 06 June 2022

A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics

Article Open access 28 February 2023

Gene selection by incorporating genetic networks into case-control association studies

Article 19 December 2022

Introduction

Genetic association analyses need to address genetic and phenotypic heterogeneities for complex diseases. Investigating associations between marker genotypes and disease phenotypes for only one locus at a time without considering combinations of (unlinked) loci may capture only a small proportion of the total combined effect of all disease loci. Thus new methods are needed that allow the joint analysis of multiple loci for association analysis (Hoh and Ott 2003; Kang et al. 2008). For a complex disease, individual loci with small effects may not be sufficient to identify a genetic association with a clinical syndrome (Ritchie et al. 2001). However, the combined effect of multiple loci with minor or modest effect sizes might confer additive or multiplicative genetic contributions (Ritchie et al. 2001). Genotypic combinations at some loci that contain several alleles may be specific to certain classes of cases. Multiple loci with minor or modest effect sizes and that are located in different or the same chromosomes could form a joint functional unit for disease susceptibility. The problem is to identify these loci and quantify the interaction among these loci. Several algorithms are available to examine SNP combinations for complex diseases (Ritchie et al. 2001, 2003; Goodman et al. 2006; Onay et al. 2006; Hahn et al. 2003). These methods include dimension reduction, which combines all possible SNP–SNP interactions and chooses the set of SNPs that minimizes the classification error of cases and controls (Ritchie et al. 2003; Hahn et al. 2003). Some analyses use classification scoring functions to identify subsets of SNPs likely associated with disease risk (Goodman et al. 2006). Multivariate logistic regression and bootstrap analyses can be used to select SNP–SNP interactions via stepwise regression (Onay et al. 2006).

We use a two-stage approach that identifies multiple SNP patterns and evaluates their risk with a disease. The method allows the inclusion of any number of main effects together with the highest interaction term. At the first stage, synergistic blocks associated with a complex disease are identified. At the second stage, a logistic regression model is chosen to evaluate the interactive effects among different loci in a synergistic block. Logistic regression models for SNPs in synergistic blocks have better statistical power and more statistical significance than using a full model that includes all SNPs and their interactions, because of the reduced number of parameters.

This method is applied to a case–control schizophrenia study from a Chinese Han population to detect the effects of 17 loci of four candidate genes, the regulator of G-protein signaling-4 (RGS4, 1q21-q22), frizzled 3 (FZD3, 8p21), neuregulin 1 (NRG1, 8p22-p11), and G72 (13q34), on the susceptibility to this disease, since these have been reported in other studies to be possible candidate genes for schizophrenia (Yue et al. 2006, 2007; Zhang et al. 2004; Harrison and Owen 2003).

Harrison and Weinberger proposed that schizophrenia might be a genetic disorder of the synapses. There may be a putative common effect of schizophrenia susceptibility genes on the plasticity and functioning of synapses and other neurodevelopmental processes. We hypothesized that the NRG1, G72, RGS4 and FZD3 may play a common role in the pathogenesis of neurodevelopment and plasticity in schizophrenia.

Methods

Identifying SNP combination patterns associated with the phenotype

We assume n marker loci with two alleles each which are in Hardy–Weinberg equilibrium (HWE). The genotypes for one single nucleotide polymorphism (SNP) marker are coded by 0, 1, or 2. Denote a combination pattern of n SNPs (an SNP pattern) as an n-dimension genotype vector $ G = (a_{1} a_{2} \ldots a_{n} ), $ where n is the number of marker loci, and a _i is the genotype of the ith SNP position. Denote π ^A(G) and π ^U(G) as the frequencies of the SNP pattern G in affected cases and unaffected controls, respectively. Then, the odds ratio of SNP pattern G (OR_G) is the ratio of the odds of SNP pattern G in the cases to that of SNP pattern G in the controls, i.e.,

$$ {\text{OR}}_{G} = {{\frac{{\pi ^{A} (G)}}{{1 - \pi ^{A} (G)}}} \mathord{\left/ {\vphantom {{\frac{{\pi ^{A} (G)}}{{1 - \pi ^{A} (G)}}} {\frac{{\pi ^{U} (G)}}{{1 - \pi ^{U} (G)}}}}} \right. \kern-\nulldelimiterspace} {\frac{{\pi ^{U} (G)}}{{1 - \pi ^{U} (G)}}}}, $$

when OR_G > 1, the SNP pattern G can be positively associated with the disease; when 0 < OR_G < 1, the SNP pattern G can be negatively associated with it; when OR_G is very close to 1, the SNP pattern G cannot be associated with the disease.

The hypotheses used to test whether SNP pattern G is associated with a disease are defined as follows:

$$ H_{0} :\ln OR_{G} = 0,\quad H_{1} :\ln OR_{G} \ne 0. $$

When the null hypothesis H ₀ is rejected, then we can conclude that there is evidence for the association of G with the disease.

Let N ^A and N ^U be the numbers of cases and controls; N ^A(G) and N ^U(G) denote the number of subjects with the SNP pattern G among the cases and controls, respectively. Then, the log transform of the sample odds ratio, $ \displaystyle \ln \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G} $ is given by

$$\displaystyle \ln \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G} = \ln \left[ {{{\frac{{N^{A} (G)}}{{N^{A} - N^{A} (G)}}} \mathord{\left/ {\vphantom {{\frac{{N^{A} (G)}}{{N^{A} - N^{A} (G)}}} {\frac{{N^{U} (G)}}{{N^{U} - N^{U} (G)}}}}} \right. \kern-\nulldelimiterspace} {\frac{{N^{U} (G)}}{{N^{U} - N^{U} (G)}}}}} \right]. $$

This random variable has a large-sample approximate normal distribution with a mean of ln OR_G and a standard deviation, referred to as the asymptotic standard error (ASE) (Agresti 1996), of

$$ \displaystyle {\text{ASE}}(\ln \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G} ) = \sqrt {\frac{1}{{N^{A} (G)}} + \frac{1}{{N^{A} - N^{A} (G)}} + \frac{1}{{N^{U} (G)}} + \frac{1}{{N^{U} - N^{U} (G)}}} . $$

Therefore, the statistic

$$ Z(G) = \frac{{\ln {\displaystyle \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G}} }}{{{\text{ASE}}(\ln {\displaystyle \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G}})}} $$

can be used to test the above hypothesis. If $ \left| {Z(G)} \right| > u_{{\alpha /2}} , $ (where u _α/2 is the α/2 quantile of the standard normal distribution) then H ₀ is rejected.

When the number of candidate SNPs is very large, a genetic algorithm (GA) (Goldberg 1989) can be applied to elucidate the associated SNP patterns quickly. In a GA, we examine every SNP pattern G as a candidate solution to the problem of associated SNP patterns. The fitness of an SNP pattern is defined as

$$ {\text{fitness}}_{G} = \left| {Z(G)} \right| = \left| {\frac{{\ln {\displaystyle \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G}} }}{{{\text{ASE}}(\ln{\displaystyle \mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G} })}}} \right|. $$

The sample odds ratio $\displaystyle \left( {\mathop {{\text{OR}}}^{ \wedge }\!\!\ _{G} } \right) $ of SNP pattern G is 0 (or ∞) if N ^A(G) = 0 (or N ^U(G) = 0). The slightly amended estimator can be expressed as (Agresti 1996)

$$ {\text{fitness}}_{G}^{A} = \left| {\frac{{\ln \left[ {{{\frac{{N^{A} (G) + 0.5}}{{N^{A} - N^{A} (G) + 0.5}}} \mathord{\left/ {\vphantom {{\frac{{N^{A} (G) + 0.5}}{{N^{A} - N^{A} (G) + 0.5}}} {\frac{{N^{U} (G) + 0.5}}{{N^{U} - N^{U} (G) + 0.5}}}}} \right. \kern-\nulldelimiterspace} {\frac{{N^{U} (G) + 0.5}}{{N^{U} - N^{U} (G) + 0.5}}}}} \right]}}{{\sqrt {\frac{1}{{N^{A} (G) + 0.5}} + \frac{1}{{N^{A} - N^{A} (G) + 0.5}} + \frac{1}{{N^{U} (G) + 0.5}} + \frac{1}{{N^{U} - N^{U} (G) + 0.5}}} }}} \right|. $$

The parameters in the genetic algorithm are population sizes, crossover probabilities and mutation probabilities (Goldberg 1989). By assigning different values to the parameters, we can get different SNP patterns that are significantly associated with the disease.

Patterns associated with clustering

Cluster analysis is used to group similar SNP patterns associated with the disease. An SNP synergistic block is the loci in the SNP pattern cluster.

The clustering of SNP patterns is based on a similarity measure (Duda and Schafer 2001). A matched vector of the associated SNP pattern G may be denoted by $ Q^{G} = [q_{1} ,q_{2} , \ldots ,q_{N} ]^{\prime}, $ where q _i = 1 if sample i has an associated SNP pattern G; otherwise, q _i = 0, 1 ≤ i ≤ N, N = N ^A + N ^U. The similarity distance d(G ₁, G ₂) between the associated SNP patterns G ₁ and G ₂ can be defined as

$$ d(G_{1} ,G_{2} ) = 1 - \left[ {\frac{{(Q^{{G_{1} }} )^{T} Q^{{G_{2} }} }}{{\left\| {Q^{{G_{1} }} } \right\|_{2}^{{1/2}} \left\| {Q^{{G_{2} }} } \right\|_{2}^{{1/2}} }}} \right]^{2} , $$

(where $ \left\| a \right\|_{2} $ represents the 2-norm of vector a), which is in fact the cosine measure between points. This distance measure is used to investigate the clustering of SNP patterns.

Grouping loci/SNPs into synergistic blocks

Based on the previous step, we cluster the SNP patterns into several SNP pattern clusters that include similar associated SNP patterns. Let R denote one SNP pattern cluster and N _R denote the number of associated SNP patterns in R. The set of loci considered in R is referred to as γ(R). Let $ G^{i} = \left( {a_{1}^{i} a_{2}^{i} \ldots a_{n}^{i} } \right) $ denote the ith associated SNP pattern in R, where 1 ≤ i ≤ N _R. Then, the difference between the ith and the jth SNP pattern can be defined as $ G^{i} - G^{j} = \left( {a_{1}^{i} - a_{1}^{j} a_{2}^{i} - a_{2}^{j} \cdots a_{n}^{i} - a_{n}^{j} } \right), $ where, for each locus $ k \in \{ 1,2, \ldots ,n\} , $ we get $ a_{k}^{i} - a_{k}^{j} = \left\{ \begin{gathered} 1,\; a_{k}^{i} \ne a_{k}^{j} ; \hfill \\ 0,\; a_{k}^{i} = a_{k}^{j} . \hfill \\ \end{gathered} \right. $ The diversity of loci considered between the ith and jth SNP patterns is expressed as

$$ D_{{ij}} (W) = \left( {G^{i} - G^{j} } \right)W\left( {G^{i} - G^{j} } \right)^{\prime } , $$

where W = (w _ij) is a weight matrix that can take any of several forms, depending on the intended use of the ancillary information. The sum of D _ij(W) can be used to describe the diversity of the loci considered in a SNP pattern cluster. In particular, the diversity of a block B can be defined as

$$ D_{B} = \frac{1}{{2N_{R} }}\sum\limits_{{i = 1}}^{{N_{R} }} {\sum\limits_{{j = 1}}^{{N_{R} }} {D_{{ij}} (W_{B} ) = \frac{1}{{2N_{R} }}\sum\limits_{{i = 1}}^{{N_{R} }} {\sum\limits_{{j = 1}}^{{N_{R} }} {\left( {G^{i} - G^{j} } \right)W_{B} \left( {G^{i} - G^{j} } \right)^{\prime } } } ,} } $$

where W _B = (w _ij) with $ w_{{ij}} = \left\{ \begin{gathered} 1,\; i = j{\text{ and }}i \in \gamma (B); \hfill \\ 0,\; {\text{otherwise,}} \hfill \\ \end{gathered} \right. $ where γ(B) is the set of loci considered in block B.

Similarly, the diversity of all loci in γ(R) is defined as

$$ D_{R} = \frac{1}{{2N_{R} }}\sum\limits_{{i = 1}}^{{N_{R} }} {\sum\limits_{{j = 1}}^{{N_{R} }} {D_{{ij}} (W_{R} ) = \frac{1}{{2N_{R} }}\sum\limits_{{i = 1}}^{{N_{R} }} {\sum\limits_{{j = 1}}^{{N_{R} }} {\left( {G^{i} - G^{j} } \right)W_{R} \left( {G^{i} - G^{j} } \right)^{\prime } } } ,} } $$

where W _R = (w _ij) with $ w_{{ij}} = \left\{ \begin{gathered} 1,\quad i = j{\text{ and }}i \in \gamma (R); \hfill \\ 0,\quad {\text{otherwise}} .\hfill \\ \end{gathered} \right. $

Therefore, we choose one subset B of γ(R) as a synergistic block of this SNP pattern cluster. There are three conditions for choosing B:

(1)
B ∈ R _g.
(2)
Only choose significant SNP patterns whose fitness satisfy Fitness(G _B) > u _α/2.
(3)
Choose the value of B that minimizes $ \frac{{D_{B} }}{{D_{R} }} + \frac{1}{{Adapt(G_{B} )}} $, where R _g represents all possible subsets of γ(R), and G _B is one of the SNP patterns corresponding to block B. The G _B that satisfies (2) and (3) is then an associated SNP pattern for this synergistic block B.

Suppose there are v loci in a synergistic block, i.e., v covariates. The multivariate logistic regression model (Agresti 1996) that includes all main effects and the highest order interaction term is

$$ \log {\text{it}}(p) = \beta _{0} + \beta _{1} {\text{SNP}}_{1} + \cdots + \beta _{v} {\text{SNP}}_{v} + \beta _{{v + 1}} {\text{SNP}}_{1} \times {\text{SNP}}_{2} \times \cdot \cdot \cdot \times {\text{SNP}}_{v} , $$

where SNP_i ∈ {0,1}, 1 ≤ i ≤ v, and SNP_i = 1 when the genotype of the ith locus is the same as that of the same locus in the synergistic block and 0 otherwise. Furthermore, $ p = \Pr ({\text{Affected}}|({\text{SNP}}_{1} , \ldots ,{\text{SNP}}_{v} ) \in \{ 0,1\} ^{v} ). $ In this model, we only consider the main effects and the interactions of all loci in a synergistic block.

Results

Application of the synergistic block algorithm to simulated data

Two data sets containing 100 replicates of 200 cases and 200 controls for ten unlinked biallelic loci were simulated using two two-locus interaction models as examples. The first and second loci out of ten were chosen as the disease loci with interaction effects. This number of replicates was selected to provide method validation and to enable exhaustive computational searches of all possible fourth-order SNP combinations to be performed. Hardy–Weinberg equilibrium was assumed. For the two-locus interaction disease models, the interaction effect was simulated using penetrance functions via two models. Model 1: P(Disease|AAbb) = 0.02, P(Disease|AaBb) = 0.2, P(Disease|aaBB) = 0.02, and P(Disease|others) = 0; Model 2: P(Disease|AAbb) = 0.2, P(Disease|AaBb) = 0.2, P(Disease|aaBB) = 0.2, and P(Disease|others) = 0, where A, a, B and b represent the alleles for the disease loci (Frankel and Schork 1996), with a population allele frequency of 0.3 in all cases. For the other SNPs, their population allele frequencies are drawn from the uniform distribution [0.1, 0.9]. Here, we chose the SNP combinations from among all of the fourth-order ones with fitnesses >3 as clusters when searching for a synergistic block.

The results are presented in Table 1. It includes the percentage of the time that the synergistic block associated with the disease was identified within 100 replicates, the average frequency in cases and controls, the average fitness with its standard deviation, and the average odds ratio for the SNP combination corresponding to the synergistic block. From these results, we know that for this interaction disease model, the synergistic block method has reasonable power to identify high-order gene–gene interactions.

Table 1 Simulation results for 200 cases and 200 controls in two disease models*

Full size table

We also evaluated the type I error rate by simulating 100 data sets under the assumption that there is no interaction effect between unlinked loci on the disease. If there is also no main genetic effect on the disease, then we should not find any SNP synergistic block; if there is one locus with a main genetic effect on the disease, we will find this locus 70 times out of 100, and there is no multi-locus synergistic block (data not shown).

Application of the synergistic block algorithm to schizophrenia data

Four candidate genes, RGS4, FZD3, NRG1, G72, and seventeen SNPs that were genotyped and analyzed in this study are listed in Table 2. The SNPs SNP8NRG221533 and SNP8NRG243177 (NRG1) and rs2323019 and rs352203 (FZD3) are in linkage disequilibrium (Table 3). It is generally speculated that these four genes functionally converge to act upon schizophrenia by influencing synaptic plasticity and cortical microcircuitry (Harrison and Weinberger 2005). Seventeen SNPs were genotyped across these four candidate genes in the Chinese Han population, which included 120 schizophrenia cases and 225 healthy controls. Prior reports have suggested that variations in the genes may be associated with increased risk of paranoid schizophrenia (Yue et al. 2006, 2007; Zhang et al. 2004; Harrison and Owen 2003).

Table 2 Genes and SNP ID numbers

Full size table

Table 3 The linkage disequilibrium information for four genes

Full size table

Using a genetic algorithm, we identified 652 significantly associated SNP patterns using 120 cases and 225 controls. The P values for the permutation tests were at most 0.01 after adjusting for multiple testing using Benjamini–Hochberg’s algorithm (Benjamini and Hochberg 1995).

These SNP patterns were then clustered, producing five (two positively and three negatively) associated SNP pattern clusters. Five synergistic blocks (Table 4) were identified. Logistic regression models were used to quantify the size of the effect. Synergistic block 1, including the polymorphisms NRG1 (rs3735774), NRG1 (rs 2919390), and RGS4 (rs12753561), is associated with schizophrenia (OR 6.74, P < 0.001). The interaction between these three SNPs is statistically significant (P = 0.0014). Synergistic block 2, including the polymorphisms RGS4 (rs12753561), FZD3 (rs2241802) and FZD3 (rs2323019), is associated with schizophrenia (OR 2.0948, P < 0.001). The interaction between these three SNPs is statistically significant (p = 0.0221). Synergistic block 3, including the polymorphisms NRG1 (SNP8NRG221533) NRG1 (rs3735774) and NRG1 (rs6988339), is associated with schizophrenia (OR 0.2014, P < 0.001). The interaction between these three SNPs is statistically significant (P = 0.0017). Similar results were obtained for synergistic blocks 4 and 5.

Table 4 The synergistic blocks extracted from the 17 SNPs of NRG1, G72, RGS4 and FZD3

Full size table

Information on the models is shown in Table 5. These results indicate that the interactions of alleles at different loci located on different or the same chromosomes may significantly influence complex human diseases.

Table 5 SNP interaction effects for logistic regression models

Full size table

Discussion

In the present study, we identify synergistic blocks as being a genetic factor in complex disease, and use a two-stage approach to detect the effects of synergistic blocks on paranoid schizophrenia. Cluster analysis and logistic regression models are used to identify genetic associations of synergistic blocks with the phenotype. The approach is applied to detect the individual and interactive effects of four candidate genes, NRG1, G72, RGS4, and FZD3, on paranoid schizophrenia. The results suggest associations between these four genes and schizophrenia and intergenic interaction effects among these four genes on schizophrenia. However, because of the potentially data-driven nature of these conclusions and the limited multi-locus interaction model used in the simulation part, additional studies are required to confirm the validity of the present method in future studies.

Screening individual and interactive effects of disease genes in complex diseases is a feasible approach using the synergistic block-detecting method. These results further support previous findings about the interactive effects among NRG1, G72, RGS4 and FZD3, especially via glutamatergic transmission mediated by the ErbB3 and N-methyl-D-aspartate (NMDA) receptors (Harrison and Weinberger 2005), the Wnt pathway, or other processes associated with neurodevelopment and plasticity.

In this study, we found that the target SNPs interacted, and we introduced the concept of synergistic blocks. Since there are several disease loci in every synergistic block, we can address the dimensionality problem in multi-locus association analysis, and evaluate the sizes of the interactive effects between loci and their contributions to the disease. For genes that may play a role via similar neuropathological mechanisms, such as the effect of NRG1, G72, RGS4 and FZD3 on schizophrenia via synaptic function or other neurodevelopmental processes, synergistic blocks can be used to identify groups of loci that are specific to the disease and to quantify the interaction between genes or loci.

The synergistic block method described in this paper considered only ten SNPs in the simulation part and 17 SNPs in the real data set. Further numerical investigations will be needed when the number of SNPs to be examined is large, for examples 100 SNPs or 1000 SNPs.

Since genome-wide association studies (Risch and Merikangas 1996; Kang and Zuo 2007) are a priority, there is also the potential for synergistic blocks to be useful on a larger scale. However, we cannot use the present method to investigate genome-wide SNP data directly because of the large number of SNP combinations. One possible strategy is to break up large analyses into roughly independent modules of hundreds of tests (or SNPs) each (Seaman and Müller-Myhsok 2005). If we then detect a synergistic block for each group of SNPs, this synergistic block can be examined by our logistic model. As long as the correlation between the modules of SNPs is reasonably low, little power will be sacrificed by approximating in this way because the synergistic block has accounted for the correlation within the blocks.

Electronic database information

deCODE genetics, http://www.decode.com/nrg1/markers for SNPs and microsatellite markers in NRG1.

GenBank, http://www.ncbi.nlm.nih.gov/SNP/ for NRG1, G72, RGS4, FZD3.

References

Agresti A (1996) An introduction to categorical data analysis. Wiley, New York
Google Scholar
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 57:289–300
Google Scholar
Duda RO, Schafer DW (2001) Pattern classification. Wiley, New York
Frankel WN, Schork NJ (1996) Who’s afraid of epistasis? Nat Genet 14:371–373
Article CAS Google Scholar
Goldberg DE (eds) (1989) Genetic algorithms in search, optimization, and machine learning, vol 77. Addison-Wesley, New York
Goodman JE, Mechanic LE, Luke BT, Ambs S, Chanock S, Harris CC (2006) Exploring SNP–SNP interactions and colon cancer risk using polymorphism interaction analysis. Int J Cancer 118:1790–1797
Article CAS Google Scholar
Hahn LW, Ritchie MD, Moore JH (2003) Multifactor dimensionality reduction software for detecting gene–gene and gene–environment interactions. Bioinformatics 19:376–382
Article CAS Google Scholar
Harrison PJ, Owen MJ (2003) Genes for schizophrenia? recent findings and their pathophysiological implications. Lancet 361:317–319
Google Scholar
Harrison PJ, Weinberger DR (2005) Schizophrenia genes, gene expression, and neuropathology: on the matter of their convergence. Mol Psychiatry 10:40–68
Article CAS Google Scholar
Hoh J, Ott J (2003) Mathematical multi-locus approaches to localizing complex human trait genes. Nat Rev Genet 4:701–709
Article CAS Google Scholar
Kang GL, Yue WH, Zhang JF, Cui YH, Zuo YJ, Zhang D (2008) An entropy-based approach for testing genetic epistasis underlying complex diseases. J Theor Biol 250:362–374
Article CAS Google Scholar
Kang GL, Zuo YJ (2007) Entropy-based joint analysis for two-stage genome-wide association studies. J Hum Genet 52:747–756
Article Google Scholar
Onay VU, Briollais L, Knight JA, Shi E, Wang Y, Wells S et al (2006) SNP–SNP interactions in breast cancer susceptibility. BMC Cancer 6:114
Article Google Scholar
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
Article CAS Google Scholar
Ritchie MD, Hahn LW, Moore JH (2003) Power of multifactor dimensionality reduction for detecting gene–gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol 24:150–157
Article Google Scholar
Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69:138–147
Article CAS Google Scholar
Seaman SR, Müller-Myhsok B (2005) Rapid simulation of p-values for product methods and multiple-testing adjustment in association studies. Am J Hum Genet 76:399–408
Article CAS Google Scholar
Yue WH, Kang GL, Zhang YB, Qu M, Tang FL, Han YH, Ruan Y, Lu TL, Zhang JF, Zhang D (2007) Association of DAOA polymorphisms with schizophrenia and clinical symptoms or therapeutic effects. Neurosci Lett 416:96–100
Article CAS Google Scholar
Yue WH, Liu ZH, Kang GL, Yan J, Tang FL, Ruan Y, Zhang JF, Zhang D (2006) Association of G72/G30 polymorphisms with early-onset and male schizophrenia. Neuroreport 17:1899–1902
Article CAS Google Scholar
Zhang YB, Yu X, Yuan YB, Ling YS, Ruan Y, Si TM, Lu TL, Wu SP, Gong XH, Zhu ZJ, Yang JZ, Wang F, Zhang D (2004) Positive association of the human Frizzled 3 (FZD3) gene haplotype with schizophrenia in Chinese Han population. Am J Med Genet 129B:16–19
Article Google Scholar

Download references

Acknowledgments

We would like to thank the anonymous referees for very helpful comments on the early draft. This work was supported in part by grants from the National Natural Science Foundation of China (No. 30530290, 30400149, 60334040), the National High Technology Research and Development Program of China (No. 2006AA02Z195, 2007AA02Z423), The National Basic Research Program of China (No. 2007CB512301), The National Science Foundation of America (No. DMS 0234078), and the Strategic Partnership Grant of the Michigan Foundation.

Author information

Guolian Kang
Present address: Department of Biostatistics, The University of Alabama at Birmingham, Birmingham, AL, 35294, USA

Authors and Affiliations

Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100080, China
Guolian Kang & Jifeng Zhang
Key Laboratory for Mental Health, Ministry of Health, Institute of Mental Health, Peking University, 51 Hua Yuan Bei Road, Beijing, 100083, China
Weihua Yue, Handi Zhang, Yan Ruan, Tianlan Lu, Yansu Ling & Dai Zhang
Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48824, USA
Marianne Huebner & Yijun Zuo

Authors

Guolian Kang
View author publications
You can also search for this author in PubMed Google Scholar
Weihua Yue
View author publications
You can also search for this author in PubMed Google Scholar
Jifeng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Marianne Huebner
View author publications
You can also search for this author in PubMed Google Scholar
Handi Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yan Ruan
View author publications
You can also search for this author in PubMed Google Scholar
Tianlan Lu
View author publications
You can also search for this author in PubMed Google Scholar
Yansu Ling
View author publications
You can also search for this author in PubMed Google Scholar
Yijun Zuo
View author publications
You can also search for this author in PubMed Google Scholar
Dai Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dai Zhang.

Additional information

Guolian Kang and Weihua Yue contributed equally to this work.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kang, G., Yue, W., Zhang, J. et al. Two-stage designs to identify the effects of SNP combinations on complex diseases. J Hum Genet 53, 739–746 (2008). https://doi.org/10.1007/s10038-008-0307-x

Download citation

Received: 31 October 2007
Accepted: 21 May 2008
Published: 01 August 2008
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10038-008-0307-x

Keywords

This article is cited by

An Information Theoretical Study of the Epistasis Between the CNR1 1359 G/A Polymorphism and the Taq1A and Taq1B DRD2 Polymorphisms: Assessing the Susceptibility to Cannabis Addiction in a Turkish Population
- Aysun Baransel Isir
- Cesur Baransel
- Muradiye Nacak
Journal of Molecular Neuroscience (2016)

Two-stage designs to identify the effects of SNP combinations on complex diseases

Abstract

Similar content being viewed by others

Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics

Gene selection by incorporating genetic networks into case-control association studies

Introduction

Methods

Identifying SNP combination patterns associated with the phenotype

Patterns associated with clustering

Grouping loci/SNPs into synergistic blocks

Results

Application of the synergistic block algorithm to simulated data

Application of the synergistic block algorithm to schizophrenia data

Discussion

Electronic database information

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

This article is cited by

An Information Theoretical Study of the Epistasis Between the CNR1 1359 G/A Polymorphism and the Taq1A and Taq1B DRD2 Polymorphisms: Assessing the Susceptibility to Cannabis Addiction in a Turkish Population

Search

Quick links

Abstract

Similar content being viewed by others

Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics

Gene selection by incorporating genetic networks into case-control association studies

Introduction

Methods

Identifying SNP combination patterns associated with the phenotype

Patterns associated with clustering

Grouping loci/SNPs into synergistic blocks

Results

Application of the synergistic block algorithm to simulated data

Application of the synergistic block algorithm to schizophrenia data

Discussion

Electronic database information

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

This article is cited by

An Information Theoretical Study of the Epistasis Between the CNR1 1359 G/A Polymorphism and the Taq1A and Taq1B DRD2 Polymorphisms: Assessing the Susceptibility to Cannabis Addiction in a Turkish Population

Search

Quick links