Abstract
Detection of epistatic interaction between loci has been postulated to provide a more in-depth understanding of the complex biological and biochemical pathways underlying human diseases. Studying the interaction between two loci is the natural progression following traditional and well-established single-locus analysis. However, the added costs and time required for the computation involved have thus far deterred researchers from pursuing a genome-wide analysis of epistasis. In this paper, we propose a method allowing such analysis to be conducted very rapidly. The method, dubbed EPIBLASTER, is applicable to case–control studies and consists of a two-step process. In the first step, the difference in Pearson's correlation coefficients between controls and cases is computed across all possible SNP pairs as an indication of significant interaction warranting further analysis. For the subset of interactions deemed potentially significant, a second-stage analysis is performed using the likelihood ratio test from the logistic regression to obtain the P-value for the estimated coefficients of the individual effects and the interaction term. The algorithm is implemented using the parallel computational capability of commercially available graphical processing units to greatly reduce the computation time involved. In the current setup and example data sets (211 cases, 222 controls, 299 468 SNPs; and 601 cases, 825 controls, 291 095 SNPs), this coefficient evaluation stage can be completed in roughly 1 day. Our method allows for exhaustive and rapid detection of significant SNP pair interactions without requiring significant marginal effects of the single loci involved in the pair.
Introduction
The effects of genes on phenotypes and diseases have long been suggested to involve a complex form of interaction resulting from inter-inhibitory and excitatory effects; any attempt to explain these effects simply as additive effects of the individual genes is an overly simplistic model that ultimately provides an incorrect view of the genetic influence on the phenotype.
The study of interactions between polymorphic loci can stem from both a biological and a statistical genetics perspective. The first approach establishes a model based on a priori knowledge of how the genes function and interact. The latter, being a ‘biologically blind’ approach, helps to draw inferences about previously unknown interdependencies between genes. The ultimate objective, as in all black-box studies, is to merge the conclusions drawn from both approaches; however, as the observations cannot be measured at a level finer than the eventual system output, the former approach is more likely to be refined by first having a solid statistical finding as its basis.
As our effort primarily focuses on drawing statistical inference on epistatic interactions between genes, a new method is proposed to improve our capability to search for and sift out significant interactions. This paper discusses the performance of our method in its current implementation. Results from a simulated subset of SNPs and from two real genome-wide data sets, recorded in panic disorder and multiple sclerosis studies, are presented, followed by a discussion of some properties of the approach.
Materials and methods
Overview of the two-stage search strategy
The strategy consists of a two-stage approach. First, a filtering stage uses the difference of Pearson's correlation coefficients to perform an exhaustive search for two-locus interactions with multiplicative effects^{1} across all possible pairwise SNP combinations. This is followed by logistic regression analysis on the subset of pairs deemed significant in the first stage.
Data representation
Each SNP is represented as integer values ranging from 0 to 2 based on the count of a chosen reference nucleotide of the selected SNP for an allele dosage model, or as 0 or 1 depending on the genotype for a dominance or recessivity coding. In the current study, the allele dosage model is applied. An overall matrix is generated to store the information of all SNPs as column vectors and the recorded values for individual subjects along the rows. Column vectors are then analyzed in pairs and the correlation coefficients are tabulated for cases and controls separately. Correlation coefficients are calculated from a 3 × 3 ordered genotype matrix, the genotypes being encoded 0, 1, 2. The difference between the correlation coefficients in cases and controls is then computed and used as an indication of the SNP pair contributing significantly to the classification between cases and controls (equation (1)).
Correlation coefficients (Pearson's) difference between case-only and control-only samples for each SNP–SNP pair. Note that no assumptions, such as Hardy–Weinberg equilibrium (HWE), are needed here.
Difference of correlation coefficients=Δ
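The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R/GPU implementation: the function names `pearson` and `correlation_difference`, the toy genotypes, and the sign convention (cases minus controls) are all hypothetical choices made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a monomorphic SNP has an undefined correlation,
        # which this sketch treats as 0.
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_difference(snp_a, snp_b, is_case):
    """Difference of correlation coefficients (cases minus controls).

    snp_a, snp_b: allele-dosage codes (0, 1 or 2), one entry per subject.
    is_case: True for cases, False for controls, in the same subject order.
    """
    cases = [i for i, c in enumerate(is_case) if c]
    controls = [i for i, c in enumerate(is_case) if not c]
    r_cases = pearson([snp_a[i] for i in cases], [snp_b[i] for i in cases])
    r_controls = pearson([snp_a[i] for i in controls],
                         [snp_b[i] for i in controls])
    return r_cases - r_controls

# Toy data: the two SNPs are perfectly correlated in the three cases
# and perfectly anti-correlated in the three controls.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [0, 1, 2, 2, 1, 0]
is_case = [True, True, True, False, False, False]
delta = correlation_difference(snp_a, snp_b, is_case)
```

Because the subsequent test is two-sided, the choice of sign for the difference does not affect which pairs are retained.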
The variance of each of these correlation coefficients is, as shown by Wellek and Ziegler,^{2} equal to 1/(n−1), where n is the number of cases or controls, respectively. As the cases and controls constitute independent samples, the total variance V_{tot} is the sum of the two single variances. As a consequence, and following both Gretton et al^{3} and Wellek and Ziegler,^{2} we can conclude that, under the null hypothesis, T = Δ/V_{tot}^{1/2} ∼ N(0, 1).
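As a hedged illustration of this test statistic, the sketch below divides a correlation difference by the square root of the total variance 1/(n_cases − 1) + 1/(n_controls − 1) and converts the result into a two-sided P-value under the standard normal distribution. The function name `delta_z_test` and the correlation values 0.25 and −0.05 are hypothetical; only the sample sizes (211 cases, 222 controls) are taken from the panic disorder data set.

```python
import math

def delta_z_test(r_cases, r_controls, n_cases, n_controls):
    """Return (T, two-sided P-value) for a difference of correlations."""
    delta = r_cases - r_controls
    # Total variance: sum of the variances of the two independent samples.
    v_tot = 1.0 / (n_cases - 1) + 1.0 / (n_controls - 1)
    t = delta / math.sqrt(v_tot)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_value

# Hypothetical correlations with the panic disorder sample sizes.
t, p_value = delta_z_test(0.25, -0.05, n_cases=211, n_controls=222)

# With no difference between the groups the statistic is zero.
t_null, p_null = delta_z_test(0.1, 0.1, n_cases=211, n_controls=222)
```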
The first stage, based on the difference of correlations, searches for significant interaction terms. The second stage then fits a full-rank logistic model (equation (2)), including the intercept and additive marginal effects, to the subset of locus pairs deemed significant in the first stage; a statistical test is then conducted of whether the coefficient of the interaction term is significantly different from zero.
Full-rank logistic regression model:
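One common way to write a logistic model with an intercept, two additive marginal effects and an interaction term is sketched below. The symbols are assumptions introduced for illustration rather than the paper's own notation: Y denotes case status, x_1 and x_2 the coded genotypes of the two SNPs, and β_0 to β_3 the coefficients; the second-stage test concerns the interaction coefficient β_3.

```latex
\operatorname{logit} P(Y = 1 \mid x_1, x_2)
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2,
\qquad H_0 : \beta_3 = 0
```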
Hardware and software setup
The hardware used in the experimental setup consists of two pairs of commercially available NVIDIA GTX295 GPUs (Santa Clara, CA, USA) hosted by an Intel Core i7 920 central processing unit (CPU) running at 2.67 GHz (Santa Clara, CA, USA) with 12 GB of DDR3 RAM (Corsair Inc., Fremont, CA, USA). The software is implemented in R (version 2.9.2; R Development Core Team^{4}) with the ‘gputools’ package, beta version 0.14 (Buckner et al^{5}), installed (http://cran.r-project.org/web/packages/gputools). Its function ‘gpuCor’ allows correlation coefficients to be computed for all possible pairwise combinations of the column vectors using Compute Unified Device Architecture (CUDA)-enabled NVIDIA graphics cards. The graphics card uses its parallel computational capability to process independent evaluations faster than conventional CPU-based computation. As the correlation coefficient of each SNP pair can be computed independently, the method takes full advantage of the inherently parallel computation performed on graphics cards. The overall time performance depends on the sample size and the desired marker coverage. A total evaluation of (number of SNPs choose 2) interactions is typically accomplished within 24 h for an entire data set (2000 individuals, consisting of 1000 cases and 1000 controls, with 500 000 SNPs) with the available GPU resources and the given results retention criteria. Limitations on speed can originate from local main memory storage, memory transfer speed and the number of onboard GPU cores. Some data partitioning is thus required to take advantage of all available GPU resources and render the method most efficient. The data set for the study is first partitioned into blocks of 2000 SNPs each, which can be handled by the memory on the graphics card. Hence, for a genome-wide data set of 500K SNPs, 250 partitions are required.
The process goes through the entire data set and calculates the correlation coefficients in blocks of 2000 SNPs. The very first correlation analysis is performed on the first partition against itself, a ‘partition-based autocorrelation’, resulting in 1 999 000 unique correlations. The process then increases the index of the second partition by one and computes the correlations between two distinct sets of 2000 SNPs, a ‘partition-based cross-correlation’, to yield 4 million unique results. This process of increasing the nested loop index is repeated until it reaches the last partition, at which point the top-level loop index is increased by one. The process can be summarized in the following steps:
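The partitioning arithmetic can be checked with a short Python sketch. The helper name `partition_plan` is hypothetical; the block size of 2000 SNPs and the coverage of 500 000 SNPs are the figures quoted above. The sketch counts the partitions, the partition pairs that have to be visited, and confirms that visiting every partition pair once covers each SNP pair exactly once.

```python
import math

def partition_plan(n_snps, block_size):
    """Number of blocks and of block pairs (a block paired with itself counts)."""
    n_blocks = math.ceil(n_snps / block_size)
    n_block_pairs = n_blocks * (n_blocks + 1) // 2
    return n_blocks, n_block_pairs

n_snps, block_size = 500_000, 2000
n_blocks, n_block_pairs = partition_plan(n_snps, block_size)

# Unique SNP pairs inside one block, and between two distinct blocks.
pairs_within_block = math.comb(block_size, 2)
pairs_between_blocks = block_size * block_size

# Summing over all block pairs recovers the genome-wide number of SNP pairs.
total_from_blocks = (n_blocks * pairs_within_block
                     + math.comb(n_blocks, 2) * pairs_between_blocks)
total_pairs = math.comb(n_snps, 2)
```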

1) Partition the data set into blocks of 2000 SNPs. Note that this number may increase or decrease depending on the number of individuals studied.
2) Set up a two-level nested loop to apply the partition-based correlation to all possible SNP pairs, for cases and controls separately.
3) Compute the difference of correlation coefficients between cases and controls after each partition-based autocorrelation or cross-correlation is complete.
4) Compute the P-value of each difference, given that the differences follow a Gaussian distribution (refer to the Results section).
5) Retain only SNP pairs that show a P-value below a selected threshold.
6) Repeat steps 3–5 across all partition pairs.
7) Proceed to stage 2 by performing a logistic regression on the selected pairs.
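Steps 1–6 can be sketched on the CPU in plain Python as follows. This is only an illustration under stated assumptions, not the GPU implementation: the names `pearson` and `stage1_scan` are hypothetical, the P-value uses the theoretical variance 1/(n_cases − 1) + 1/(n_controls − 1) instead of the per-partition empirical standard deviation, and the toy data contain one planted pair (SNPs 0 and 5) that is perfectly correlated in cases and anti-correlated in controls.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: monomorphic SNPs are given correlation 0
    return sxy / math.sqrt(sxx * syy)

def stage1_scan(genotypes, is_case, block_size, p_threshold):
    """genotypes: list of SNP columns, each a list of 0/1/2 codes per subject."""
    case_idx = [i for i, c in enumerate(is_case) if c]
    ctrl_idx = [i for i, c in enumerate(is_case) if not c]
    cases = [[col[i] for i in case_idx] for col in genotypes]
    ctrls = [[col[i] for i in ctrl_idx] for col in genotypes]
    sd = math.sqrt(1.0 / (len(case_idx) - 1) + 1.0 / (len(ctrl_idx) - 1))
    n_snps = len(genotypes)
    starts = list(range(0, n_snps, block_size))    # step 1: partition
    retained, n_evaluated = [], 0
    for bi, a in enumerate(starts):                # step 2: top-level loop
        for b in starts[bi:]:                      # step 2: nested loop
            for i in range(a, min(a + block_size, n_snps)):
                # Within a block, only the upper triangle holds unique pairs.
                first_j = i + 1 if a == b else b
                for j in range(first_j, min(b + block_size, n_snps)):
                    # Step 3: difference of correlation coefficients.
                    delta = (pearson(cases[i], cases[j])
                             - pearson(ctrls[i], ctrls[j]))
                    # Step 4: two-sided Gaussian P-value.
                    p = math.erfc(abs(delta / sd) / math.sqrt(2.0))
                    n_evaluated += 1
                    if p < p_threshold:            # step 5: retain
                        retained.append((i, j, delta, p))
    return retained, n_evaluated

# Toy data: 7 SNPs, 30 cases followed by 30 controls.
rng = random.Random(0)
n_per_group = 30
is_case = [True] * n_per_group + [False] * n_per_group
genotypes = [[rng.choice((0, 1, 2)) for _ in is_case] for _ in range(7)]
genotypes[0] = [i % 3 for i in range(2 * n_per_group)]
genotypes[5] = [g if c else 2 - g for g, c in zip(genotypes[0], is_case)]

retained, n_evaluated = stage1_scan(genotypes, is_case,
                                    block_size=3, p_threshold=1e-5)
```

The pairs in `retained` would then be passed on to the logistic regression of step 7.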
Results
Simulated data
A simulated data set is generated consisting of 2000 SNPs and 5000 controls and 5000 cases. The simulated data set is created without any specific model, so there is no a priori knowledge of which particular pair will be significant. The purpose is to demonstrate that the P-value based on the difference in correlation coefficients is a valid approximation of the P-value of the logistic regression interaction term. The differences of correlation coefficients are noted to follow a Gaussian distribution within each partition set, as shown in the histogram in Figure 1. This observation has been examined in greater detail by Gretton et al,^{3} who state that when samples are indeed drawn from two different distributions, the distribution of the discrepancy of the chosen function (in this study, the difference of estimated mean correlation coefficients) will converge to a Gaussian distribution. An additional proof that the difference of correlation coefficients follows a Gaussian distribution can be found in Wellek and Ziegler,^{2} who have also shown that the variance of any single difference under the null hypothesis, and thus also of the distribution of the sum of all differences, is the sum of the reciprocals of the number of cases and controls. Equal numbers of cases and controls are not needed for this Gaussian behavior.
In practice, to test for the significance of each pair, a Z-score is computed for each difference within the partition set. This Z-score is based on the mean and standard deviation of all the differences within the partition set, which closely approximate the overall mean and standard deviation provided that the partition size is large enough, typically resulting in a few million pairs per partition set. A high absolute Z-score is taken as an indication that the interaction term of the two SNPs in question is valuable enough to be passed on to the second stage. This filtered subset is then subjected to a second, mathematically more intensive evaluation using the likelihood ratio test on the logistic regression model.
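A minimal Python sketch of this per-partition standardization is given below. The function names `z_scores` and `select_pairs`, the default cutoff and the toy differences (199 small values plus one planted outlier) are assumptions made for illustration only.

```python
import statistics

def z_scores(differences):
    """Standardize differences by their own mean and standard deviation."""
    mu = statistics.fmean(differences)
    sigma = statistics.stdev(differences)
    return [(d - mu) / sigma for d in differences]

def select_pairs(pairs, differences, z_cutoff=4.5):
    """Keep the pairs whose absolute Z-score reaches the cutoff."""
    zs = z_scores(differences)
    return [(pair, z) for pair, z in zip(pairs, zs) if abs(z) >= z_cutoff]

# Toy partition set: small differences and one strong outlier.
differences = [0.01 * ((i % 11) - 5) for i in range(199)] + [1.5]
pairs = [(i, i + 1) for i in range(len(differences))]  # hypothetical SNP indices
selected = select_pairs(pairs, differences)
```

With only a handful of differences the outlier itself inflates the standard deviation, which is one reason the partition set needs to be large for this approximation to work.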
In Figure 2, the P-values of the interaction product term in a general linear fit are plotted against the corresponding correlation coefficient differences between cases and controls. To help delineate any logarithmic trend, the P-values are shown as negative logarithmic values. As shown in Figure 2, there is a strong relationship between the two variables, ranging from a parabolic function in the region centered around the origin to a linear relationship in the region of higher values. The region of most interest to the study is that of higher numerical values, where the P-values are smallest and the differences largest. As the differences closely follow a Gaussian distribution (Figure 1), a Z-score threshold can be used to estimate the retention rate. The statistic is then evaluated using the fact that the Z-score follows a Student's t-distribution with a sufficiently large number of degrees of freedom. A plot comparing the P-values obtained from the approximation and from the validation step is shown in Figure 3 and demonstrates a high R^{2} value of 99.9%.
To address the limited physical disk space and to retain only those interactions that show strong significance, a Z-score of 4.5 was chosen as the cutoff threshold, which corresponds to a retention probability of 6.8 × 10^{−6}. Thus, for a partition-based autocorrelation generating ∼2 million (2000 choose 2) correlation coefficient differences, only the top 14 interaction pairs are expected to be retained. Overall, for a marker coverage of 500K SNPs, we expect the top ∼8.5 × 10^{5} pairs out of a possible ∼1.25 × 10^{11} to be retained from the first stage.
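The retention arithmetic can be reproduced with the short Python sketch below, assuming a two-sided standard normal tail for the Z-score cutoff. The function name `two_sided_tail` is hypothetical; the cutoff, block size and marker coverage are the figures quoted above.

```python
import math

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))

retention_rate = two_sided_tail(4.5)

# Expected retained pairs for one partition-based autocorrelation.
expected_per_block = math.comb(2000, 2) * retention_rate

# Expected retained pairs for a genome-wide coverage of 500K SNPs.
genome_wide_pairs = math.comb(500_000, 2)
expected_genome_wide = genome_wide_pairs * retention_rate
```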
Real data
Real genetic data have been obtained from two separate published studies. The first data set originated from a panic disorder study^{6} with a total of 299 468 SNPs, in which 211 cases and 222 controls were retained after standard quality control measures. Computing the difference of correlation coefficients across all pairs and choosing a P-value threshold of 1.0 × 10^{−5} resulted in the retention of 373 153 SNP pairs. Similarly, a second, larger data set from a multiple sclerosis study^{7} with a total of 291 095 SNPs in 601 cases and 825 controls was investigated. Using the same P-value threshold of 1.0 × 10^{−5}, the 407 660 most significant SNP pairs were retained after the first stage.
To verify that no significant pairs have been left out by the difference-of-correlation-coefficients stage of our method, a comparison with the P-values of the interaction term in a normal linear regression of all possible SNP pairs must be made. To perform this brute-force approach in a time-efficient manner, we used a newly released software tool, FastEpistasis (http://www.vital-it.ch/software/FastEpistasis),^{8} which is an extension of the PLINK epistasis module capable of distributing the work in parallel across multiple CPU cores. It is important to point out that this method does not work on the difference of odds ratios, as the PLINK option bearing the same name does. The program is meant to be executed on quantitative phenotypes, but the difference in P-values, which are the relevant measure for this comparison, has been noted to be negligible on several sample SNP pairs (see also Table 1, comparing the FastEpistasis column with the logistic regression interaction term P-value column, and the simulation studies in Supplementary Figure 1). The P-values computed by FastEpistasis are regarded as the ‘true’ values for comparison with the approximate method described in stage 1 of EPIBLASTER.
The SNP pairs with P-values below 1 × 10^{−6} (tested against the null) from FastEpistasis are matched with the results obtained from the first stage of EPIBLASTER. In the panic disorder analysis, FastEpistasis produced 37 336 SNP pairs, of which 36 056 are also found in the subset retained by EPIBLASTER stage 1 (96.5%). The unmatched pairs are examples in which EPIBLASTER stage 1 underestimates the significance and the hard threshold prevents the pair from being included. These unmatched pairs are all situated around the P-value threshold region and are of lesser significance than the others. The plot of the matching pairs is shown in Figure 4; for ease of visualization, it is drawn as a smoothed color density of the actual scatter plot. The top 10 most significant pairs from the FastEpistasis approach are listed in greater detail in Table 1, along with their annotations in Table 2, and are marked with a dark circle in Figure 4. For EPIBLASTER stage 1 to capture all top 10 pairs of the ‘true’ approach (FastEpistasis), a P-value threshold of 1.26 × 10^{−8} must be applied, which results in the top 387 pairs of EPIBLASTER stage 1 being passed on to stage 2. In other words, EPIBLASTER would have produced an additional 377 pairs to be tested in order to capture the very top 10 true results. In Figure 5, the top 100 SNP pairs of the panic disorder study are marked; capturing them would have required a retention threshold for EPIBLASTER stage 1 of 1.67 × 10^{−7}, passing on ∼5194 pairs to stage 2 (listed in greater detail in Supplementary Table 1).
In the multiple sclerosis analysis, FastEpistasis yielded 42 731 pairs with an interaction term P-value below 1 × 10^{−6}, of which 42 524 pairs (99.5%) are also retained by EPIBLASTER stage 1. The matching pairs, with the respective P-values computed using the FastEpistasis method versus the approximate EPIBLASTER stage 1 method, are plotted in Figure 6. The top 10 pairs are marked in Figure 6 and listed in Table 3, along with the SNP annotations in Table 4. For EPIBLASTER to capture the top 10 pairs, 48 of its most significant SNP pairs would have had to be carried over to stage 2, where the P-values from logistic regression are computed. To capture the top 100 pairs (see Figure 7; listed in greater detail in Supplementary Table 2), EPIBLASTER would have required the top 19 242 pairs obtained from stage 1 to be passed on to stage 2.
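The matching of the two result sets amounts to a set intersection over unordered SNP pairs, which can be sketched in Python as follows. The function name `matched_fraction` and the SNP identifiers are hypothetical and serve only to illustrate the bookkeeping; the final line applies the same ratio to the multiple sclerosis counts reported above.

```python
def matched_fraction(reference_pairs, candidate_pairs):
    """Count reference pairs also present among the candidates.

    Pairs are treated as unordered, so (a, b) and (b, a) are the same pair.
    """
    reference = {tuple(sorted(p)) for p in reference_pairs}
    candidates = {tuple(sorted(p)) for p in candidate_pairs}
    matched = reference & candidates
    return len(matched), len(matched) / len(reference)

# Hypothetical identifiers standing in for brute-force and stage 1 results.
brute_force = [("snpA", "snpB"), ("snpC", "snpD"),
               ("snpE", "snpF"), ("snpG", "snpH")]
stage1 = [("snpB", "snpA"), ("snpD", "snpC"),
          ("snpF", "snpE"), ("snpX", "snpY")]
n_matched, fraction = matched_fraction(brute_force, stage1)

# The same ratio for the reported multiple sclerosis counts.
ms_fraction = 42_524 / 42_731
```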
Discussion
Although the search is conducted across all possible pairwise SNP interactions, the main interest is to delineate interactions between unlinked loci that influence the illness. In the first stage, the difference of Pearson's correlation coefficients, computed for each SNP pair, is taken between controls and cases across all possible interactions. In addition, this step can incorporate replication of significant associations across two or more independent studies, using a meta-analysis weighted by the number of subjects, during the actual run. In the current experimental setup for a genome-wide analysis of epistasis, this first stage involving the difference of correlation coefficients can be completed within roughly 24 h on commercially available GPU setups, compared with roughly a year on a single-core CPU. For the subset of interactions deemed significant in the rapid filtering stage, a second-stage analysis is performed using the likelihood ratio test on the logistic regression to obtain the P-value for the estimated coefficients corresponding to the intercept, the individual effects of the single loci and the interaction term. These logistic regressions are computed in R using the ‘Anova’ test on the ‘glm’ fit with the ‘binomial’ family option. For a retention rate of 6.8 × 10^{−6}, that is, an expected 8.5 × 10^{5} pairs, this requires ∼2.5 days on a single-core system with the hardware specifications listed in the Materials and methods section. This is impractical; however, if we limit ourselves to the top significant pairs below a more stringent threshold, for example, 1.0 × 10^{−8}, the expected number drops to ∼600–700 pairs, which require around 150 s (four computations per second) to validate. It should be noted that dedicated software, such as INTERSNP (http://intersnp.meb.uni-bonn.de),^{9} is considerably faster for this second pass than pure R.
The quoted figure of 8.5 × 10^{5} interaction pairs should be achievable within 1–2 h using INTERSNP. A complete genome-wide association analysis with INTERSNP on a single core would take on the order of a year. FastEpistasis would have taken ∼70 days on a single core. Note that INTERSNP is quoted here for a full logistic regression, whereas FastEpistasis uses a linear regression. The performance of both INTERSNP (which is about two orders of magnitude faster than plain R using the glm() function) and FastEpistasis can easily be improved using multicore systems and clusters.
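To make the second-stage test concrete, the following pure-Python sketch fits a logistic model with and without the interaction term by Newton-Raphson and compares the two fits with a likelihood ratio test on one degree of freedom. It is a stand-in under stated assumptions, not the R ‘glm’ workflow or INTERSNP: the function names (`solve`, `fit_logistic`, `interaction_lrt`) are hypothetical, there is no step-size control or convergence check, and the toy data are constructed so that the odds of being a case double with each unit of the product of the two genotypes.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_logistic(rows, y, n_iter=25):
    """Newton-Raphson fit; returns (coefficients, maximized log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def interaction_lrt(snp1, snp2, y):
    """Return (interaction coefficient, LRT statistic, P-value on 1 df)."""
    full = [[1.0, a, b, a * b] for a, b in zip(snp1, snp2)]
    reduced = [row[:3] for row in full]  # intercept and marginal effects only
    beta_full, ll_full = fit_logistic(full, y)
    _, ll_reduced = fit_logistic(reduced, y)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    # Survival function of a chi-square variable with one degree of freedom.
    return beta_full[3], stat, math.erfc(math.sqrt(stat / 2.0))

# Toy data: 20 controls per genotype cell; case odds equal 2 ** (g1 * g2).
snp1, snp2, y = [], [], []
for g1 in (0, 1, 2):
    for g2 in (0, 1, 2):
        for label, count in ((0, 20), (1, 20 * 2 ** (g1 * g2))):
            snp1 += [g1] * count
            snp2 += [g2] * count
            y += [label] * count

beta3, stat, p_value = interaction_lrt(snp1, snp2, y)
```

In this constructed example the fitted interaction coefficient should be close to ln 2, since the toy counts follow the interaction-only model exactly.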
Including more SNP pairs in the second stage is of course feasible. We have found a threshold of 6.8 × 10^{−6} practical. Relaxing this by, for example, one order of magnitude will incur only a slight increase in runtime for stage 1 and a linear increase for stage 2. If the threshold for entry into stage 2 is relaxed too much, however, hardware specifics such as disk speed become an issue for the performance of the program.
The reasoning behind the two-stage approach is threefold. First, the computations involved in the first stage are much less extensive than those needed to estimate significance in logistic regression. Second, a readily available R package, ‘gputools’, allows the estimation of correlation coefficients to be performed on the graphics card, which greatly reduces time and cost. Third, contrary to common multistage practice, in which a single-locus test is performed initially, followed by higher-order testing on loci that showed single-locus significance, interacting loci are not required to first show significant marginal effects, thus rendering this method a truly exhaustive search across all two-way interactions. The results from the MS and panic disorder analyses provide a preliminary basis for this statement. A PLINK method to test for univariate SNP significance is used to indicate which SNPs would be kept under the more traditional requirement of significant main effects. First, Supplementary Tables S.2 and S.3 show that the vast majority of significant interaction pairs would not have been captured had one prefiltered on univariate significance. Furthermore, in Supplementary Figures S.2–S.5, univariate P-values are plotted against the interaction pairs captured by EPIBLASTER. The lack of trends supports the claim that the method indeed conducts the search without bias toward the marginal effects at the two loci. Strong overestimation of the significance of a pair in the stage 1 filtering step can occur when the SNPs are very rare. Severe underestimation of significance using this approximation (false negatives) has also been noticed, though rarely, and was traced to a small subset of SNP pairs that are in high linkage disequilibrium, which are not the main focus of this method.
For computational ease, no lower bound on physical distance between SNPs or on LD between SNPs is imposed.
We also noted no inflation of the test statistic in our data sets; however, in certain cases it might be advisable to include MDS or PCA components in the analysis; for example, by working on residuals of the SNP genotypes on these components.
Overall, a comparison of the P-values obtained from FastEpistasis with the approximate P-values computed by EPIBLASTER stage 1 shows that, although discrepancies in P-values do exist, the adopted method manages to capture all of the significant pairs, and the occurrence of significant pairs being omitted is practically nil when the threshold P-value is chosen to be far enough from the Bonferroni-corrected global significance. In any case, the computational load of the second-stage analysis is negligible.
The concept of combining case-only and control-only analyses into a unified test has been suggested in previous studies analyzing pairwise SNPs. Hoh and Ott^{10} initially proposed taking the ratios of the chi-squares of the 3 × 3 contingency tables between cases and controls as a measure of significance. Zhao et al^{11} and Zaykin et al^{12} have also proposed examining gene interactions through a defined linkage disequilibrium created by the interaction between two unlinked loci. Significance is evaluated by analyzing the difference of the LD values between case-only and control-only populations. Hardy–Weinberg equilibrium must hold for this measure of interaction and its test statistics to be valid. Zhao et al have further suggested that the method exhibits greater power than conventional linear regression, as it does not treat the interaction as a residual term and allows for implicit nonlinear interaction, and that it is computationally faster than the traditional four degrees of freedom logistic regression model, rendering it more suitable for GWAS. The method proposed in this paper searches in the first stage only for the effects of the interaction term, using the difference of the correlation coefficients as an indication of significance, and then adopts the more conventional logistic regression to substantiate the findings on a subset of pairs. As the difference is based on two separate groups, population stratification can have an effect on the power of the method. However, considering the number of pairs retained in our examples, the actual inflation is very low. In the multiple sclerosis analysis, 423 680 pairs are expected to be below the 1 × 10^{−5} threshold; the observed number of pairs captured is 407 660. The method can indeed be simplified to a case-only study by assuming that the correlation coefficient of the controls is zero for all pairs.
This approach would further speed up the computation by a factor of 2, at the expense of potentially losing both power and precision. Moreover, the approximation applies not only to the dosage coding (0, 1, 2) but also to other codings such as dominance, recessivity and heterozygosity. In general, a P-value cutoff of less than 1 × 10^{−5} should be sufficient to capture all the results with P < 1 × 10^{−8} in the logistic regression and is, with all caution, suggested as a cutoff to be used in a first analysis, making EPIBLASTER truly exhaustive within this setting.
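The expected number of pairs under the null hypothesis quoted for the multiple sclerosis data can be checked with a few lines of Python, assuming uniformly distributed P-values so that the expected count is the number of SNP pairs multiplied by the threshold. The function name `expected_pairs` is hypothetical; the SNP count, threshold and observed count are taken from the text.

```python
import math

def expected_pairs(n_snps, p_threshold):
    """Expected number of SNP pairs below the threshold under the null."""
    return math.comb(n_snps, 2) * p_threshold

expected_ms = expected_pairs(291_095, 1.0e-5)
observed_ms = 407_660
observed_to_expected = observed_ms / expected_ms
```

A ratio close to 1 is consistent with the statement that the inflation in this data set is very low.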
With respect to the results presented for MS and panic disorder, we note that, although no pair exceeds a Bonferroni-corrected threshold for significance at a corrected P-value of 0.05, the marginal effects in the top 10 pairs show no tendency at all to deviate from a uniform distribution. This means that prefiltering pairs of SNPs on marginal P-values for subsequent epistasis analysis may be a less promising strategy than sometimes considered, although more analyses and larger sample sizes will be needed for a better-founded statement on this issue.
In the editing phase of this article, it came to our attention that Hu et al^{13} have also developed a strategy involving GPUs to enhance the genome-wide search for significant SNP pair interactions, quoting a total runtime of 27 h to scan through the Wellcome Trust Case Control Consortium's bipolar disorder data consisting of 500K SNPs. The algorithm proposed by Hu et al helps to consolidate the improved time performance obtained by using the inherently parallel nature of GPUs to search for significance in all possible SNP pairs. Their method is distinct from ours in that it uses a difference of odds ratios between cases and controls to pick significant SNP pair candidates.
We would like to point out that with EPIBLASTER it is possible to perform genome-wide analysis of epistasis on very small-scale and inexpensive hardware, reducing the need for large clusters for this kind of application.
Future work is planned to incorporate the logistic regression and other more novel definitions of gene–gene interactions onto the graphical processing units. EPIBLASTER is available at http://www.mpipsykl.mpg.de/epiblaster.
References
1. Marchini J, Donnelly P, Cardon LR: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet 2005; 37: 413–417.
2. Wellek S, Ziegler A: A genotype-based approach to assessing the association between single nucleotide polymorphisms. Hum Hered 2009; 67: 128–139.
3. Gretton A, Borgwardt K, Rasch B, Schölkopf B, Smola A: A kernel method for the two-sample-problem. NIPS 2006; 513–520.
4. R Development Core Team: R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2009. ISBN 3-900051-07-0, http://www.R-project.org.
5. Buckner J, Wilson J, Seligman M, Athey B, Watson S, Meng F: The gputools package enables GPU computing in R. Bioinformatics 2010; 26: 134–135.
6. Erhardt A, Czibere L, Roeske D et al: TMEM132D, a new candidate for anxiety phenotypes: evidence from human and mouse studies. Mol Psychiatry 2010, epub ahead of print 6 April 2010; doi:10.1038/mp.2010.41.
7. Nischwitz S, Cepok S, Kroner A et al: Evidence for VAV2 and ZNF433 as susceptibility genes for multiple sclerosis. J Neuroimmunol 2010; 227: 162–166.
8. Schüpbach T, Xenarios I, Bergmann S, Kapur K: FastEpistasis: a high performance computing solution for quantitative trait epistasis. Bioinformatics 2010; 26: 1468–1469.
9. Herold C, Steffens M, Brockschmidt F, Baur MP, Becker T: INTERSNP: genome-wide interaction analysis guided by a priori information. Bioinformatics 2009; 25: 3275–3281.
10. Hoh J, Ott J: Mathematical multi-locus approaches to localizing complex human trait genes. Nat Rev Genet 2003; 4: 701–709.
11. Zhao J, Xiong M: Test for interaction between two unlinked loci. Am J Hum Genet 2006; 79: 831–845.
12. Zaykin DV, Meng Z, Ehm MG: Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet 2006; 78: 737–746.
13. Hu X, Liu Q, Zhang Z et al: SHEsisEpi, a GPU-enhanced genome-wide SNP-SNP interaction scanning algorithm, efficiently reveals the risk genetic epistasis in bipolar disorder. Cell Res 2010; 20: 854–857.
Acknowledgements
This work was funded in part by the Max Planck Society. Support through the German ministry for Education and Research (BMBF) through the NGFN (Moods—01GS08145 to BMM) and the project ControlMS within the German Competence Network Multiple Sclerosis (KKNMS) is gratefully acknowledged. KT is supported by MEXT Kakenhi 21680025 and the FIRST program.
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website
Kam-Thong, T., Czamara, D., Tsuda, K. et al. EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units. Eur J Hum Genet 19, 465–471 (2011). https://doi.org/10.1038/ejhg.2010.196