Introduction

A better knowledge of epistasis would greatly contribute to our understanding of complex trait variation and the evolutionary dynamics of biological systems (Carlborg and Haley, 2004; de Visser and Elena, 2007; Phillips, 2008). With advances in molecular genetics and genomics, efforts have been made to develop methods and tools for detecting statistical epistasis in different population structures (Malmberg and Mauricio, 2005; Xu and Jia, 2007; Zhang and Liu, 2007; Aylor and Zeng, 2008; Pattin et al., 2008). Research has explored a wide range of topics including statistical modelling (Alvarez-Castro and Carlborg, 2007; Moore et al., 2007; Sung and Wijsman, 2007), model parameterization (Zeng et al., 2005; Wang and Zeng, 2006; Alvarez-Castro et al., 2008), search algorithms (Carlborg et al., 2005; Ritchie and Motsinger, 2005; Kooperberg and Leblanc, 2008; Mechanic et al., 2008), multiple testing (Jannink and Jansen, 2001; Sen and Churchill, 2001; Storey et al., 2005; Stich et al., 2007) and computing efficiency (Ljungberg et al., 2004; Bush et al., 2006). A number of relevant software tools have also been made publicly available (Broman et al., 2003; Hahn et al., 2003; Yandell et al., 2007; Yang et al., 2007; Le Rouzic and Alvarez-Castro, 2008).

There has been, however, a long-standing controversy concerning the importance of non-additive effects including epistasis. Much of the genetic variance within populations seems to be additive, although epistasis can also contribute to the additive variance (Hill et al., 2008). Further, despite many reports of statistical epistasis, there has only been limited success in identifying its functional basis. This has led to scepticism about the function epistasis has in populations, particularly given the difficulty in interpreting and replicating the biological consequences of interactions at just the pair-wise level (Phillips, 2008). In such a situation, one may ask how many of our findings are likely to be false positives. Unfortunately, the answer to that question is not immediately available because investigations on the false positive issue in detecting epistatic loci are surprisingly limited (Storey et al., 2005; Stich et al., 2007; Yang et al., 2007). This could well be because most attention in method development has been focused on increasing the capacity of modelling and the power of detection.

The Bayesian approach (and similar approaches using Markov Chain Monte Carlo algorithms) has been widely adopted to detect epistasis since its first application in 2001 (Sen and Churchill, 2001; Yi and Xu, 2002; Yi et al., 2003; Sung and Wijsman, 2007; Xu and Jia, 2007; Yandell et al., 2007; Yang et al., 2007). Compared with other approaches (for example, regression), the Bayesian approach has some advantages including flexibility in model selection and limited requirement of multiple testing (Yi et al., 2003). However, it also has some disadvantages in computational efficiency (Yi and Shriner, 2008) and repeatability of mapping results.

High-throughput analysis of many sets of data is required to understand the global gene interaction patterns in different species before the epistasis controversy can be really settled (Phillips, 2008). Limited by the daunting computing demand to cover high-dimensional search space and handle multiple tests, the majority of studies have investigated only a small set of phenotypic traits selected on the basis of personal interest. One way to increase the throughput is to use Grid computing resources (Seaton et al., 2006). The other is to develop fast and effective search algorithms (Ljungberg et al., 2004; Pattin et al., 2008). Using pre-identified loci as prior information to detect epistasis in the one-dimensional scale was suggested to reduce the search dimension and, hence, boost speed (Jannink and Jansen, 2001; Carlborg and Haley, 2004; Evans et al., 2006; Kooperberg and Leblanc, 2008). Nevertheless, relying on the pre-identified signals was questionable because that approach would miss out those loci with strong epistatic interactions, but undetectable main effects (Xu and Jia, 2007; Phillips, 2008). The challenge is how to effectively incorporate the one-dimensional scans driven by pre-identified loci into a higher-dimensional search.

This study aimed to address two questions, how to control false positives and how to effectively use the information of pre-identified loci in the mapping of epistatic quantitative trait loci (QTL), using large-scale simulations. As the issue of balancing false positive rate (FPR) and power is entangled with multiple testing methods and search algorithms, we adopted the following measures to keep the problem tractable: (a) focusing on pair-wise interactions in an F2 population structure; (b) using the regression approach (Haley and Knott, 1992) to allow easy replication and interpretation; (c) using the conventional epistasis partition given by Jana (1971); (d) using exhaustive search to explore the entire search space and (e) using the nested test framework (Sen and Churchill, 2001) with modification to derive thresholds and perform multiple tests. The objectives are (1) to provide profiles of FPR and power for detecting different forms and levels of epistasis and (2) to find a solution to effectively combine one-dimensional scans with a two-dimensional scan in mapping epistatic loci. Ultimately, we hope to deliver a new application running on the Grid infrastructure to allow high-throughput epistasis analyses and produce results with good quality and known FPR for functional studies.

Materials and methods

Genetic models and epistasis parameterization

The regression method (Haley and Knott, 1992) was extended to map epistatic pairs of QTL (Carlborg and Andersson, 2002; Carapuco et al., 2005). Considering only one pair of loci denoted as L1 and L2, the genetic models have the following simplified forms:

Model 1: y=μ+L1+L2+L1L2+e (two loci with epistasis)

Model 2: y=μ+L1+L2+e (two loci without epistasis)

Model 3: y=μ+L1+e (single locus model)

Model 4: y=μ+e (null model)

where y is the trait of interest, μ is the model constant and e is the random error term. Additive and dominance genetic effects were modelled for each locus (a1 and d1 for L1; a2 and d2 for L2). The interaction between L1 and L2 (denoted as L1L2) was partitioned as a1 × a2, a1 × d2, d1 × a2 and d1 × d2 genetic components following Jana (1971).
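As a rough illustration of how the four nested models differ, the sketch below builds the regressors for one individual. It assumes that genotypes are known exactly and scored as the number of alleles (0, 1 or 2) inherited from one founder line, with the common F2 coding a = -1, 0, 1 and d = 0, 1, 0. This is a simplification: the Haley and Knott (1992) regression uses genotype probabilities conditional on marker data, and the function names here are hypothetical.

```python
# Minimal sketch: regressors for Models 1-4 for one individual.
# Assumption (not from the paper): genotypes are known exactly and given as
# the count (0, 1 or 2) of alleles from one founder line.

def additive_dominance(genotype):
    """Return (a, d) codes: a = -1, 0, 1 and d = 0, 1, 0 for genotypes 0, 1, 2."""
    a = genotype - 1
    d = 1 if genotype == 1 else 0
    return a, d

def design_row(g1, g2, model):
    """Regressors (including the constant) for one individual under Model 1-4."""
    a1, d1 = additive_dominance(g1)
    a2, d2 = additive_dominance(g2)
    row = [1]                                        # model constant (mu)
    if model <= 3:
        row += [a1, d1]                              # locus L1
    if model <= 2:
        row += [a2, d2]                              # locus L2
    if model == 1:
        row += [a1 * a2, a1 * d2, d1 * a2, d1 * d2]  # L1L2 interaction terms
    return row

# Number of genetic parameters per model (constant excluded).
n_params = {m: len(design_row(1, 1, m)) - 1 for m in (1, 2, 3, 4)}
```

Under this coding, Model 1 carries eight genetic parameters, Model 2 four, Model 3 two and Model 4 none, so comparing two models tests the parameters by which they differ.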

Search algorithm

The search algorithm was developed to make effective use of pre-identified QTL with significant marginal effects (marginal-effect QTL) in detecting epistasis. The marginal-effect QTL were detected through a forward selection approach (ignoring the possibility of epistasis) and tuned by retesting each QTL iteratively, while fitting the remaining QTL as cofactors, until the number of QTL and their positions became stable (Wei et al., 2007). The search algorithm is composed of two separate paths for the identification of pairs of loci with epistatic interactions (Figure 1):

  • 1D_path—a one-dimensional genome scan for each pre-identified marginal-effect QTL searching for interactions with all other genomic positions. There may be several independent one-dimensional genome scans if more than one marginal-effect QTL has been identified.

  • 2D_path—a full two-dimensional genome scan searching for epistatic interactions for loci at all combinations of two positions in the genome irrespective of the positions of pre-identified marginal-effect QTL.

Figure 1

Flowchart of the search algorithm for detecting epistatic QTL in two separate paths (1D_path to the left and 2D_path to the right).

Both paths used an exhaustive search at 1 centiMorgan (cM) intervals to find the best pair of loci (that is with the minimum residual variance or maximum aggregate genetic variance) under Model 1 for each pair of chromosomes and test them for epistasis. Genome-wide thresholds were derived in advance and a specific nested test framework was used for each path to identify epistatic pairs (see below).
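To give a feel for why the one-dimensional scans are much cheaper than the full two-dimensional scan, the sketch below counts the positions each path evaluates on the simulated genome used in this study (20 chromosomes of 100 cM, scanned at 1 cM). The count assumes positions 0 to 100 cM inclusive on each chromosome and considers only pairs of loci on different chromosomes; both are simplifying assumptions, and the function names are hypothetical.

```python
# Back-of-envelope count of model fits per scan.
# Assumptions: 101 positions per chromosome (0, 1, ..., 100 cM) and only
# pairs of loci located on different chromosomes are counted.
from math import comb

N_CHROMOSOMES = 20
POSITIONS_PER_CHROMOSOME = 101

def pairs_in_2d_scan(n_chr=N_CHROMOSOMES, n_pos=POSITIONS_PER_CHROMOSOME):
    """Locus pairs evaluated in a full two-dimensional scan."""
    return comb(n_chr, 2) * n_pos ** 2

def positions_in_1d_scan(n_chr=N_CHROMOSOMES, n_pos=POSITIONS_PER_CHROMOSOME):
    """Partner positions evaluated for one fixed marginal-effect QTL."""
    return (n_chr - 1) * n_pos

ratio = pairs_in_2d_scan() / positions_in_1d_scan()
```

Under these assumptions the two-dimensional scan evaluates 1,938,190 pairs against 1,919 positions for each one-dimensional scan, roughly a thousand-fold difference.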

Modified nested test framework

Sen and Churchill (2001) suggested a nested test framework that included an overall test (Model 1 vs Model 4, 8 degrees of freedom) for the aggregate effect of a pair of loci (that is the sum of the effects of each locus and their interaction) and an interaction test (Model 1 vs Model 2, 4 degrees of freedom) for the interaction component. The key property of this framework is that the interaction test is nested in the overall test; only pairs that passed the overall test would proceed to the interaction test and the two loci in Model 2 must be the same as those in Model 1. It was suggested to use permutation (Churchill and Doerge, 1994) to derive a genome-wide threshold for the overall test, but a nominal, tabulated, threshold for the interaction test (Sen and Churchill, 2001; Sugiyama et al., 2001).

Using the F ratio test statistics for model comparison, the nested test framework was adopted here to determine the significance of epistasis. An epistatic pair is declared if both tests are significant. Some modification was made to apply the framework to the 1D_path in which a marginal-effect QTL was known in advance and fixed in Model 1 when searching for epistasis. The overall test needs to ensure that the aggregate effect of a pair involving the marginal-effect QTL explains significantly more phenotypic variance than the marginal-effect QTL alone. In this case, the overall test is comparing Model 1 against Model 3, where L1 represents the marginal-effect QTL in both models; this test thus takes only 6 degrees of freedom. The interaction test is still comparing Model 1 against Model 2 with L1 representing the marginal-effect QTL in both models and this test takes 4 degrees of freedom as in the 2D_path.
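The two nested tests can be sketched as follows, using the standard F ratio for comparing a full and a reduced regression model from their residual sums of squares. The function names, the example values and the residual degrees of freedom (540 individuals minus the nine parameters of Model 1, with no other fixed effects) are assumptions for illustration only.

```python
# Minimal sketch of the nested test framework with F ratios.
# All numeric inputs in the example are hypothetical.

def f_ratio(rss_reduced, rss_full, df_diff, df_resid):
    """F ratio for a full model against a nested reduced model.

    rss_reduced, rss_full: residual sums of squares of the two models
    df_diff: number of extra parameters in the full model
    df_resid: residual degrees of freedom of the full model
    """
    return ((rss_reduced - rss_full) / df_diff) / (rss_full / df_resid)

def is_epistatic_pair(f_overall, f_interaction, thr_overall, thr_interaction):
    """A pair is declared epistatic only if both nested tests are significant."""
    return f_overall > thr_overall and f_interaction > thr_interaction

# Degrees of freedom of the tests described in the text.
DF_OVERALL_2D = 8    # Model 1 vs Model 4
DF_OVERALL_1D = 6    # Model 1 vs Model 3 (marginal-effect QTL fixed as L1)
DF_INTERACTION = 4   # Model 1 vs Model 2 (both paths)

# Hypothetical example: 540 individuals, Model 1 has 9 parameters.
example_f = f_ratio(rss_reduced=600.0, rss_full=531.0,
                    df_diff=DF_OVERALL_2D, df_resid=540 - 9)
```

The interaction test is only meaningful for pairs that pass the overall test, which is why `is_epistatic_pair` requires both comparisons to exceed their thresholds.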

Deriving genome-wide thresholds

Permutation (1000 replicates) was used to derive genome-wide thresholds in advance. The DIRECT algorithm (DIviding RECTangle), a fast global optimization algorithm that finds the optima through systematically dividing the search space into smaller rectangles, was earlier adopted in QTL mapping studies (Ljungberg et al., 2004). The DIRECT algorithm was applied here to perform fast two-dimensional scans in permutations to derive thresholds for the 2D_path. Exhaustive one-dimensional genome scans were performed on permuted data to derive thresholds for each pre-identified marginal-effect QTL in the 1D_path. The procedures are outlined below:

  • The 2D_path: For each permuted dataset a DIRECT search was performed to identify the best pair of loci under Model 1. This was tested against Model 4 to calculate and store the F ratio for the overall test. Model 2 was then fitted with the same positions of L1 and L2 and used to calculate and store the F ratio for the interaction test. The genome-wide thresholds for both the overall and interaction tests were derived separately from the corresponding lists of stored F ratios.

  • The 1D_path: Thresholds were derived for each pre-identified marginal-effect QTL. For each round of permutation, the original QTL probabilities were fitted for the identified QTL; the one-dimensional genome scan was then performed with permuted genotypes to find the L2 giving the minimum residual variance of Model 1; the overall test was performed by comparing Model 1 against Model 3 with the QTL fitted to calculate and store an F ratio. The QTL and L2 were then fitted to Model 2 to calculate and store the F ratio for the interaction test. The rest of the procedure was identical to that of the 2D_path.
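The permutation procedure outlined above can be sketched as below. The sketch shuffles phenotypes relative to genotypes, which is one common way to permute; the 1D_path variant that keeps the marginal-effect QTL fixed and permutes the remaining genotypes is not shown. The `scan_best_pair` callable stands in for the DIRECT or exhaustive scan and is hypothetical, as is the toy scan used to make the example runnable.

```python
# Minimal sketch of deriving genome-wide thresholds by permutation.
# scan_best_pair is a placeholder for the real genome scan: it must return
# (F_overall, F_interaction) for the best pair found under Model 1.
import math
import random

def permutation_thresholds(phenotypes, scan_best_pair,
                           n_perm=1000, alpha=0.05, seed=0):
    """Return (overall, interaction) thresholds at genome-wide level alpha."""
    rng = random.Random(seed)
    y = list(phenotypes)
    overall, interaction = [], []
    for _ in range(n_perm):
        rng.shuffle(y)                       # break genotype-phenotype link
        f_overall, f_interaction = scan_best_pair(y)
        overall.append(f_overall)            # both F ratios come from the
        interaction.append(f_interaction)    # same best pair of loci
    k = max(0, math.ceil((1 - alpha) * n_perm) - 1)
    return sorted(overall)[k], sorted(interaction)[k]

def toy_scan(y):
    """Toy stand-in for a genome scan (NOT a real test statistic)."""
    return float(y[0]), float(y[-1])

thresholds = permutation_thresholds(range(1, 101), toy_scan, n_perm=200, seed=1)
```

Storing the interaction F ratio for the same pair that maximized the overall fit, rather than the maximum interaction F ratio across the genome, mirrors the nesting of the two tests.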

Simulation design

An F2 population was simulated in which the two founder lines contributed 30 individuals each (50% male). Fifteen founder sires from each line were each mated to a different founder dam from the other line (8 progeny per dam) producing 240 F1 individuals. Thirty F1 sires were randomly chosen and each mated to 3 F1 dams (6 progeny per dam) forming an F2 population with 540 individuals. A genome of 20 chromosomes was simulated with each chromosome carrying 11 microsatellite markers (four alleles with equal allele frequency per marker) evenly spaced at 10 cM with a length of 100 cM giving a total genome length of 2000 cM. The probability of crossovers between markers was generated using Haldane's mapping function.
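The figures in this design can be checked with a few lines of arithmetic. The sketch below also evaluates Haldane's mapping function, r = 0.5(1 − exp(−2d)) with d in Morgans, for the 10 cM spacing between adjacent markers; the variable and function names are illustrative only.

```python
# Arithmetic check of the simulated pedigree and genome, plus Haldane's
# mapping function for converting map distance to recombination fraction.
import math

def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in centiMorgans."""
    d = distance_cm / 100.0          # convert cM to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

n_founder_dams = 15 * 2              # 15 dams from each of two founder lines
n_f1 = n_founder_dams * 8            # 8 progeny per founder dam
n_f2 = 30 * 3 * 6                    # 30 F1 sires x 3 dams x 6 progeny
genome_length_cm = 20 * 100          # 20 chromosomes of 100 cM
markers_per_chromosome = 100 // 10 + 1
r_adjacent_markers = haldane_recombination(10)
```

With this function, adjacent markers 10 cM apart recombine with probability of about 0.091, slightly below the 0.10 that would apply if map distance and recombination fraction were identical.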

The total phenotypic variance (Vartot) consisted of four components: additive genetic polygenic variance, non-epistatic marginal-effect QTL variance, epistatic QTL variance and random error variance. To mimic the polygenic effects, 10 bi-allelic loci were simulated, each at a recombination rate of 0.5 (that is, not linked with any of the simulated markers) and an allele frequency of 0.5 in the founder lines, with additive genetic effects only and each accounting for 1% of the Vartot, giving a total polygenic heritability of 10%. All QTL were simulated as bi-allelic and fixed for alternative alleles in the founder lines. Each non-epistatic QTL was assigned additive genetic effects only, accounting for 3% of the Vartot. For each epistatic pair of QTL, the eight genetic effects were simulated and the total genetic variance of the pair (Varpair) and the epistatic variance (Varepi) were calculated using the equations given in Jana (1971). Three groups of simulation scenarios were defined based on the epistatic heritability (hepi2=Varepi/Vartot), rather than the whole-pair heritability (hpair2=Varpair/Vartot). Each scenario included polygenic effects, except when explicitly stated, and was tested with at least 550 replicates.

The first group was non-epistatic: six simulation scenarios were designed to test how the algorithm controlled FPR under different genetic backgrounds in which epistasis was not present. These scenarios were noGenetic (neither polygenic effects nor marginal-effect QTL); 0QTL_noEpi (polygenic effects only, no QTL); 1QTL_noEpi (one marginal-effect QTL); 2QTL_noEpi (two marginal-effect QTL); 5QTL_noEpi (five marginal-effect QTL) and 8QTL_noEpi (eight marginal-effect QTL). The total QTL variance was the sum over the marginal-effect QTL, each with a variance of 3% of the Vartot, for example, 24% in the 8QTL_noEpi scenario. Each marginal-effect QTL was simulated at either 85 or 15 cM on one of chromosomes 1, 2, 4, 6, 8, 17, 18 and 20 as appropriate.

The second group was the core of the study in which 20 scenarios were defined to test the algorithm in different combinations of epistasis forms and effect sizes to produce profiles of FPR and power in a simplified condition allowing at most one QTL per chromosome. Each of the 20 scenarios had one non-epistatic marginal-effect QTL positioned at 85 cM on chromosome 1 and one epistatic QTL pair in which the first locus of the epistatic pair was positioned at 15 cM on chromosome 2 and the second at 85 cM on chromosome 3. The epistatic QTL pair was simulated to have one of the six forms of epistasis: complementary, duplicate, dominant, recessive, inhibitory (Jana, 1971, 1972; Carlborg et al., 2000) and interaction only (that is additive × additive, without main effects), with an epistatic heritability (hepi2) of either 2.5, 5 or 7.5% (Table 1).

Table 1 Overview and names of the core simulation scenarios

The third group of scenarios was used to validate the algorithm in two more complicated conditions: (a) the non-epistatic marginal-effect QTL was placed at 85 cM on chromosome 2 in which the first locus of the epistatic pair was simulated at 15 cM (linked loci); (b) in addition to the simplified condition, a second epistatic pair was simulated (with the same form and effect size of epistasis as the first pair) at 85 cM on chromosome 19 and 15 cM on chromosome 20 (two pair). Each epistatic pair was simulated to have 5% hepi2 with one of the six forms of epistasis giving six scenarios for either the linked-loci or the two-pair condition.

Analysis of simulation results

Power and FPR

For non-epistatic QTL, a QTL was counted as a true positive if mapped to the chromosome in which it was simulated, otherwise it was considered as a false positive. For epistasis, an epistatic pair was regarded as a true positive if both loci each mapped to the chromosome in which they were simulated, otherwise it was considered as a false positive. FPR was calculated per scenario as the percentage of the total number of false positives out of the number of simulation replicates. Power was calculated per non-epistatic QTL or per epistatic pair as the percentage of the total number of detected true positives out of the number of simulation replicates. An average power was given when multiple non-epistatic QTL were present. In the two-pair epistatic scenarios, power was reported for each pair.
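The definitions above for epistatic pairs can be sketched as follows. Each replicate is represented by the list of detected pairs, with each pair given as the two chromosomes to which its loci were mapped. Counting each chromosome pair at most once per replicate is an assumption of this sketch, and the function name and example data are hypothetical.

```python
# Minimal sketch: FPR and power for one simulated epistatic pair.
# Assumption: a chromosome pair detected more than once in a replicate
# (for example by both search paths) is counted a single time.

def fpr_and_power(replicates, simulated_chromosomes):
    """Return (FPR %, power %) over a list of replicates.

    replicates: list of lists of detected pairs, each pair (chr_a, chr_b)
    simulated_chromosomes: (chr_a, chr_b) of the simulated epistatic pair
    """
    target = tuple(sorted(simulated_chromosomes))
    true_positives = 0
    false_positives = 0
    for detected in replicates:
        unique_pairs = {tuple(sorted(pair)) for pair in detected}
        for pair in unique_pairs:
            if pair == target:
                true_positives += 1      # both loci on the simulated chromosomes
            else:
                false_positives += 1     # at least one locus elsewhere
    n = len(replicates)
    return 100.0 * false_positives / n, 100.0 * true_positives / n

# Hypothetical results from four replicates; the pair was simulated on
# chromosomes 2 and 3 as in the core scenarios.
example = [[(2, 3)], [(3, 2), (5, 7)], [], [(1, 4)]]
fpr, power = fpr_and_power(example, (2, 3))
```

Because FPR is expressed per replicate rather than per test, a replicate with several false pairs contributes more than one false positive to the numerator.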

Accuracy

The accuracy of locating each simulated non-epistatic QTL or epistatic pair was assessed in two ways: (a) exact (%), the number of occurrences of locating the true positive QTL in the simulated 10 cM marker interval (or of locating both loci of the true positive epistatic pair in the corresponding simulated marker intervals) as a percentage of the number of simulation replicates and (b) precision (cM), the average distance between the detected position and the simulated position for each locus of a true positive epistatic pair. An average accuracy was given when multiple non-epistatic QTL were present. The exact (%) accuracy was reported for each pair in the two-pair epistatic scenarios.
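The two accuracy measures for an epistatic pair can be sketched as below. Marker intervals are taken as half-open 10 cM bins (a position of exactly 20 cM falls in the 20 to 30 cM interval), which is an assumption, as are the function names and the example positions.

```python
# Minimal sketch: exact (%) and precision (cM) for true positive pairs.
# Assumption: marker intervals are half-open bins [0,10), [10,20), ...

def marker_interval(position_cm, spacing_cm=10):
    """Index of the marker interval containing a position."""
    return int(position_cm // spacing_cm)

def pair_accuracy(detected_positions, simulated_positions, n_replicates):
    """Return (exact %, (precision locus 1, precision locus 2)).

    detected_positions: list of (pos1, pos2) in cM for true positive pairs
    simulated_positions: (pos1, pos2) in cM of the simulated pair
    """
    n_exact = sum(
        all(marker_interval(p) == marker_interval(s)
            for p, s in zip(pair, simulated_positions))
        for pair in detected_positions
    )
    exact_pct = 100.0 * n_exact / n_replicates
    if not detected_positions:
        return exact_pct, (None, None)
    precision = tuple(
        sum(abs(pair[i] - simulated_positions[i]) for pair in detected_positions)
        / len(detected_positions)
        for i in range(2)
    )
    return exact_pct, precision

# Hypothetical example: pair simulated at 15 cM and 85 cM, two true positives
# found in four replicates.
exact_pct, precision = pair_accuracy([(14, 88), (22, 85)], (15, 85), 4)
```

Note that exact (%) is expressed relative to all replicates, so it can never exceed the power, whereas precision is averaged over true positive pairs only.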

Overlap between 2D_path and 1D_path

Overlaps (that is, epistatic pairs detected in both paths) existed because scans conducted in the 1D_path were always covered in a full two-dimensional scan, but tested using different thresholds in the two cases. A QTL zone was defined as a chromosome region around each marginal-effect QTL in which any locus within the zone was regarded as the QTL because it shares similar information. Overlaps can be largely avoided by excluding any QTL zones searched in the 1D_path from the 2D_path search space. The QTL zone length (for example, 10 cM on either side of a QTL) determines the amount of overlap in the 2D_path as well as the overall FPR and power after combining the 1D_path with the 2D_path without QTL zones. Different QTL zone lengths of 10, 20, 30 and 40 cM were compared by re-analysing the simulation results from the 2D_path to determine the most effective length.
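One way to picture the combination of the two paths is the filter sketched below: pairs from the 2D_path are dropped when either locus falls inside a QTL zone, and the remainder is merged with the 1D_path results. Loci are represented as (chromosome, position in cM); the zone is taken as a fixed distance on either side of each marginal-effect QTL. The function names and example data are hypothetical.

```python
# Minimal sketch: skip QTL zones in the 2D_path results, then combine paths.

def in_qtl_zone(locus, qtls, half_width_cm=10):
    """True if a locus lies within half_width_cm of any marginal-effect QTL."""
    chrom, pos = locus
    return any(chrom == q_chrom and abs(pos - q_pos) <= half_width_cm
               for q_chrom, q_pos in qtls)

def combine_paths(pairs_1d, pairs_2d, qtls, half_width_cm=10):
    """1D_path pairs plus 2D_path pairs that avoid all QTL zones."""
    kept_2d = [pair for pair in pairs_2d
               if not any(in_qtl_zone(locus, qtls, half_width_cm)
                          for locus in pair)]
    return list(pairs_1d) + kept_2d

# Hypothetical example with one marginal-effect QTL at 85 cM on chromosome 1.
qtls = [(1, 85)]
pairs_1d = [((1, 85), (3, 40))]
pairs_2d = [((1, 80), (3, 41)),    # overlaps the 1D_path result, skipped
            ((2, 15), (3, 85))]    # independent of the QTL, kept
final_pairs = combine_paths(pairs_1d, pairs_2d, qtls)
```

Widening `half_width_cm` removes more overlaps but also shrinks the part of the genome that the 2D_path can contribute, which is the trade-off examined by comparing zone lengths.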

Results

Thresholds

The genome-wide thresholds derived for both the 2D_path and 1D_path were fairly consistent across scenarios. For the 2D_path, the averaged genome-wide thresholds were 5.28 and 8.22 (corresponding to nominal P-values of 2.2 × 10⁻⁶ and 1.9 × 10⁻⁶) for the overall and interaction tests, respectively. For the 1D_path, the averaged genome-wide thresholds were 4.28 and 5.08 (corresponding to nominal P-values of 3.2 × 10⁻⁴ and 5.1 × 10⁻⁴) for the overall and interaction tests, respectively. In both cases, the thresholds applied for the interaction tests were dramatically more stringent than the recommended nominal P-value of 0.01 (Sugiyama et al., 2001).

FPR and power profiles of mapping marginal-effect QTL

At least 1100 replicates were simulated for each non-epistatic scenario to provide clear profiles of FPR and power of detecting marginal-effect QTL used for mapping epistatic loci. On the basis of the 5% genome-wide threshold, the FPR, average power, accuracy as well as the number of marginal-effect QTL detected were calculated for each scenario as appropriate (Table 2). The results showed that given a QTL heritability of 3%, there was around 50% power to detect a marginal-effect QTL and about 30% probability to locate it in the right marker interval.

Table 2 False positive rate (FPR), power and accuracy in mapping marginal-effect QTL in the non-epistatic scenarios

FPR profile of mapping epistatic pairs in the non-epistatic scenarios

For each non-epistatic scenario, the FPR of epistatic pairs was measured for both the 2D_path and 1D_path using the 5% genome-wide thresholds for the overall and interaction tests (Figure 2). It was obvious that the FPR values from the 2D_path were quite consistent across scenarios. The 1D_path FPR, however, increased as the average number of marginal-effect QTL increased and departed from the target 5% dramatically when two or more QTL were detected (Table 2). Considering that one-dimensional genome scans were performed for each marginal-effect QTL independently, the 1D_path thresholds were adjusted in three steps: (1) the 5% genome-wide F ratio threshold was converted to a nominal P-value using appropriate degrees of freedom; (2) the P-value was divided by the number of QTL detected and (3) the corrected P-value was converted back to obtain an adjusted F ratio threshold. After applying the adjusted thresholds, the resulting 1D_path FPR was at the correct level (the 1D_path_corrected series in Figure 2). The adjustment procedure was applied from this point onwards to analyse all the simulation results from the 1D_path.
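The three-step adjustment (F ratio to nominal P-value, division by the number of QTL, back to an F ratio) can be sketched with the standard library alone. The sketch evaluates the F distribution through the regularized incomplete beta function and inverts it by bisection; where SciPy is available, `scipy.stats.f.sf` and `scipy.stats.f.isf` would serve the same purpose. The function names and the residual degrees of freedom in the example (531) are assumptions.

```python
# Sketch of the 1D_path threshold adjustment: F -> P -> P / n_qtl -> F.
from math import exp, lgamma, log

def _beta_cf(a, b, x, max_iter=500, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_sf(f_value, df1, df2):
    """Upper-tail P-value of an F ratio with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_value))

def f_isf(p_value, df1, df2):
    """F ratio whose upper-tail P-value equals p_value (bisection)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > p_value:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > p_value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def adjusted_threshold(f_threshold, n_qtl, df1, df2):
    """Adjust a genome-wide F threshold for n_qtl independent 1D scans."""
    p_nominal = f_sf(f_threshold, df1, df2)   # step 1: F ratio -> P-value
    p_corrected = p_nominal / n_qtl           # step 2: divide by number of QTL
    return f_isf(p_corrected, df1, df2)       # step 3: P-value -> F ratio

# Example: overall test of the 1D_path (6 df) with five detected QTL.
example = adjusted_threshold(4.28, n_qtl=5, df1=6, df2=531)
```

With a single detected QTL the threshold is unchanged, and it becomes more stringent as the number of detected QTL grows, which is the behaviour the correction relies on.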

Figure 2

The FPR profiles of mapping epistatic pairs generated from different search paths in different non-epistatic genetic backgrounds and their controls. Along the x axis are six non-epistatic scenarios (from the left) simulating a genetic background with neither polygenic nor marginal-effect QTL effects, and then polygenic effects plus either zero, one, two, five or eight marginal-effect QTL. Within each non-epistatic scenario, any epistatic QTL pair detected from either the 2D_path (a full two-dimensional genome scan irrespective of pre-identified QTL, using 5% genome-wide thresholds) or the 1D_path (one-dimensional genome scans for each pre-identified QTL, using 5% genome-wide thresholds) was counted as a false positive. The 1D_path_corrected FPR was calculated by testing each false positive pair from the 1D_path against stringent thresholds corrected by the number of pre-identified QTL. The Final FPR was calculated by combining the FPR results from the 2D_path and the 1D_path_corrected after skipping any overlaps.

Analysing overlaps between the 1D_path and 2D_path results using different QTL zone lengths (area around a marginal-effect QTL that was not tested for epistasis using the 2D_path) showed that increased FPR was a concern in scenarios in which multiple marginal-effect QTL were detected. When skipping QTL zones with a length of 10 cM, the 2D_path FPR fell from 1.82 to 1.36% in the scenario with five simulated QTL (5QTL_noEpi) and from 3.62 to 2.99% in the scenario with eight simulated QTL (8QTL_noEpi). Increasing the zone length to 20 cM or above made only a limited difference in the scenario with eight simulated QTL (further reducing the 2D_path FPR to 2.53%) while causing a considerable reduction of the 2D_path search space.

The final FPR values for each non-epistatic scenario (the Final series in Figure 2) were calculated by combining the new 1D_path FPR results from using the corrected thresholds with the new 2D_path FPR by skipping QTL zones with a length of 10 cM. These FPR values were below or close to a target level of 5% across scenarios. Thus, this way of integrating results from the 1D_path and 2D_path was applied hereafter to calculate final results.

FPR and power profiles of mapping epistatic pairs in the core scenarios

The FPR and power profiles of mapping epistatic pairs are shown in Table 3 for the 20 core simulation scenarios (Table 1). For each scenario, the FPR and power were calculated for the 1D_path using the corrected 5% genome-wide thresholds (‘1D_path’ columns, Table 3) and the 2D_path using the 5% genome-wide thresholds (‘2D_path’ columns). In addition, the 2D_path results after skipping QTL zones with a length of 10 cM and the final integrated results were calculated for each scenario (‘2D_path no_QTL’ and ‘Final’ columns, respectively).

Table 3 False positive rate (FPR) and power profiles of mapping an epistatic pair in the core scenarios

The final FPR values varied around the target 5% level across scenarios (Table 3). Skipping QTL zones from the two-dimensional search space made only a small reduction in FPR values, possibly because the 2D_path FPR values were already consistently low across scenarios. Most of the 1D_path FPR values were below 4%, showing that the threshold correction worked well in general. There were, however, a few occasions in which the 1D_path FPR values were slightly >5% (thus the final FPR values were slightly above 7% in scenarios with inhibitory or dominant epistasis), given that only 550 replicates were used per scenario.

The power profiles were more complicated (Table 3). Within a model of epistasis, the final power of detecting epistasis increased as the hepi2 increased. At the same level of hepi2, the complementary (both loci were marginal-effect QTL) scenario had the highest power value; followed by the inhibitory, dominant and recessive (one locus was marginal-effect QTL) scenarios; the duplicate (one locus was marginal-effect QTL at a chance of 1 out of 16) and interaction-only (neither locus with main effects) scenarios had the lowest power. However, that order was almost reversed if we based the ranking on the whole-pair heritability (that is, hpair2): the interaction-only and duplicate scenarios had higher power than the remaining scenarios (Figure 3). For example, at 10% hpair2, the interaction-only scenario had about 80% power, whereas the complementary scenario had a power <10%.

Figure 3

The power profile of detecting epistatic pairs with different forms of epistasis in the context of the whole-pair heritability (the percentage of phenotypic variance explained by the whole epistatic pair of QTL). Each marked point corresponds to a level of the epistatic heritability of either 2.5, 5 or 7.5% (the interaction-only form has two extra levels of 10 and 12.5%).

In scenarios with complementary, dominant, recessive or inhibitory epistasis, the 1D_path found nearly every epistatic pair that could be detected and the 2D_path found essentially the same ones, but fewer of them (Table 3). In scenarios with duplicate epistasis, however, both the 1D_path and 2D_path found around 50% of epistatic pairs uniquely (that is, there was much less overlap in the 2D_path): nearly half of those found by the 2D_path were not found by the 1D_path and a third of those found by the 1D_path were not found by the 2D_path. In scenarios with interaction-only epistasis, the 2D_path contributed solely to the final power because the 1D_path had no or rather limited power.

The accuracy in mapping epistasis improved as the hepi2 increased (Table 4). The exact (%) results were in line with the final power results across scenarios, that is, the chance of locating both loci in the simulated marker intervals increased as the power increased. The average precision (average distance between the mapped and simulated positions) of each locus of an epistatic pair detected from either the 2D_path or 1D_path was always below 5 cM when hepi2 was 5.0% or 7.5%. Furthermore, the majority of the QTL were mapped within 10 cM of the simulated location, implying that they were mapped to either the simulated or the adjacent marker interval. Almost all remaining QTL were mapped within 20 cM of the simulated position. Such results suggested that the FPR and power profiles would remain unchanged even if the definition of a true positive epistatic pair had included a restriction that the two loci each mapped to the correct chromosome and within 20 cM of the simulated position.

Table 4 Accuracy of mapping an epistatic pair in the core scenarios

FPR and power profiles in the validation scenarios

Analyses of the 12 validation scenarios showed that the 2D_path and 1D_path behaved almost identically to the core scenarios under the simplified condition. The final FPR and power results from combining the 1D_path and 2D_path for each validation scenario are displayed in Figure 4. The FPR values of each validation scenario were similar to those in Table 3 and controlled around the 5% target level. The power profiles for the linked-loci and the two-pair conditions were very similar to each other and to that for the core scenarios with 5% hepi2. The power of each linked-loci scenario was nearly identical to that of the corresponding core scenario, indicating that the algorithm was robust in handling linked-loci situations. A small power reduction was observed in some two-pair scenarios in contrast to the linked-loci scenarios, possibly because of the threshold correction: the number of marginal-effect QTL detected in a two-pair scenario almost doubled and consequently the penalty almost doubled as well.

Figure 4

The power and FPR profiles of mapping epistatic pairs in the validation scenarios. Along the x axis from the left are scenarios with 5% complementary, dominant, recessive, inhibitory, duplicate and interaction-only epistasis. Linked loci: two QTL on chromosome 2 that do not interact with each other, one of which interacts with a locus on chromosome 3; two pairs: two epistatic pairs simulated with the same form of epistasis and 5% epistatic heritability each, for which FPR was calculated per scenario, but power was calculated per pair per scenario and displayed as two-pair 1 and 2.

Discussion

In this large-scale simulation study, we examined different aspects of the issue of false positive detection of epistatic loci. We showed performance differences between the 2D_path and 1D_path in mapping different forms and levels of epistasis and produced clear profiles of FPR and power that were validated in more complicated simulation scenarios. In addition, we found that both the 2D_path and 1D_path had strengths and weaknesses in detecting certain forms of epistasis. The strength in one happened to be the weakness of the other. The two paths can be effectively combined by skipping any QTL zones with a length of 10 cM from the 2D_path search space.

It was shown throughout the study that a good control of FPR was achieved by applying the pre-derived genome-wide thresholds to the nested overall and interaction tests based on the modified nested test framework. In a parallel simulation study using the same simulation scenarios and scales, we evaluated an alternative approach that considered only the interaction test in both searching and testing for epistasis, but found it hard to control FPR in general (results can be made available on request). The modification of the nested test framework seemed to be very important in two respects. First, it increased the power of detecting epistasis associated with pre-identified QTL by allowing the one-dimensional scans for epistatic pairs as implemented in the 1D_path. Second, it improved the overall search speed because one-dimensional scans were faster than a two-dimensional scan, and the two-dimensional search space was reduced by skipping the QTL zones. The framework can be applied to other population structures to help control false positive signals of epistasis. Nevertheless, the theoretical foundation of the framework (for example, the expected FPR level in the 1D_path or 2D_path) requires further work.

The trade-off between FPR and power has always been challenging in multiple testing situations (Rice et al., 2008). Considering the difficulty and expense in identifying the functional consequence of a statistical epistasis (Phillips, 2008), a low FPR level is important to encourage biologists' engagement in functional epistasis studies. Using the 5% genome-wide thresholds (with correction for the 1D_path), we controlled the FPR close to the 5% conventional level. Using less stringent thresholds, for example, a 5% genome-wide threshold for the overall tests, but a 10% genome-wide threshold for the interaction tests, resulted in a slight increase of power, but considerably higher FPR (results not shown) and thus is not recommended.

The F2 population size of 540 is common in normal linkage studies, but could be small for epistasis analysis as suggested by Carlborg et al. (2006). Our results (Figure 3) showed that for a population of 540 F2, the power to detect an epistatic pair explaining 10% phenotypic variance was about 80% if the pair had an interaction-only form (hepi2=10%), 40% if it had a duplicate form (5.0%<hepi2<7.5%), 20% if it had a dominant form (2.5%<hepi2<5.0%) and <10% if it had either recessive or inhibitory or complementary form (hepi2<2.5%). These results suggested that it was relatively easy to find strong epistasis (for example, hepi2>7.5%), but difficult to detect weak ones given the population size. Our observations are roughly in line with earlier studies: four interaction-only pairs detected from an F2 mouse population (size is 510, 166 females) explained about 36% of the total variation in litter size (that is hepi2=9% on average) (Peripato et al., 2004); most epistatic pairs detected for obesity-related traits using 513 F2 mice had a hepi2>2% despite the relatively relaxed thresholds that were used (Stylianou et al., 2006). Increasing the population size is a good option to increase power (Carlborg et al., 2003; Ma et al., 2009). For example, according to our additional simulation results (Supplementary Figure S1), when increasing the population size from 540 to 840, scenarios with 2.5% hepi2 at least doubled the power, and scenarios with 5.0% hepi2 at the population size of 840 had power comparable to those with a hepi2 of 7.5% at the size of 540.

Even at the population size of 540, the accuracy of mapping epistatic pairs, without any backward or forward tuning in place yet, was generally good across all epistasis forms in the context of an F2 population and increased as epistasis got stronger. Those accuracy results suggested that the simple regression approach and the non-orthogonal epistasis partition might be sufficient to correctly map epistatic pairs. Considering that the regression models are easy to extend and fast to compute, the modelling approach we adopted here remains competitive in mapping epistasis.

The whole simulation study cost >10 CPU years, but was computed in a couple of weeks using distributed computing technologies, and so is high throughput in itself. On the basis of the work in this study, the task of high-throughput epistasis analyses becomes achievable at least in populations derived from structured crosses. Skipping the QTL zones in the two-dimensional scan would effectively reduce the search space of the scan and, hence, the overall computing time. By combining the 2D_path and 1D_path, the algorithm is balanced between search efficiency and speed and is thus suitable to be used as the search engine for high-throughput epistasis analyses. Considering that the genome-wide threshold values were consistent across scenarios at a given genome size, it is possible to derive those thresholds in advance for different genome and population sizes and use them directly to save permutation time in multiple epistasis analyses (Broman et al., 2003).

In summary, using the modified nested test framework to perform nested tests for epistasis in the search process controlled by the combined search algorithm (Figure 1) allowed effective mapping of different forms of epistasis while keeping FPR under control. Once integrated with distributed computing resources, the new distributed application can support high-throughput epistasis analysis.