Gene–gene interaction detection with deep learning

The extent to which genetic interactions affect observed phenotypes is generally unknown, because current interaction detection approaches only consider simple interactions between the top SNPs of genes. We introduce an open-source framework that increases the power of interaction detection by considering all SNPs within a selected set of genes and complex interactions between them, beyond the multiplicative relationships considered so far. In brief, the relation between SNPs and a phenotype is captured by a neural network, and the interactions are quantified by Shapley scores between hidden nodes, which are gene representations that optimally combine information from the corresponding SNPs. Additionally, we design a permutation procedure tailored for neural networks to assess the significance of interactions; it outperformed existing alternatives on simulated datasets with complex interactions, and in a cholesterol study on the UK Biobank it detected nine interactions that replicated in an independent FINRISK dataset.


Supplementary Note 1 Correlation between genes
Shapley values, including Shapley interaction scores, perform poorly on correlated features. Fortunately, the gene representations learned from different SNP sets are not correlated, because the genes are usually far from each other. Supplementary Figure 1 shows the correlation between gene representations (left) and the correlation between the 4,322 selected SNPs (right) in the UK Biobank experiments. We observe that most genes are not correlated, with an average absolute correlation of 0.007. Moreover, we did not observe high correlations between the selected SNPs either (average absolute correlation 0.017), because of the LD pruning procedure.
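For reference, a minimal sketch (not the authors' code; the variable names gene_reprs and snps are illustrative) of how such average absolute correlations could be computed from the learned gene representations and the LD-pruned genotype matrix:

```python
import numpy as np

def mean_abs_offdiag_corr(X):
    """Average absolute pairwise Pearson correlation between the columns of X,
    excluding the diagonal self-correlations."""
    corr = np.corrcoef(X, rowvar=False)               # columns = features (genes or SNPs)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.abs(off_diag).mean()

# gene_reprs: (n_samples, n_genes) learned gene representations (hypothetical array)
# snps:       (n_samples, n_snps) LD-pruned genotype dosages (hypothetical array)
# mean_abs_offdiag_corr(gene_reprs)   # reported as ~0.007 in the UK Biobank experiments
# mean_abs_offdiag_corr(snps)         # reported as ~0.017
```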

Supplementary Note 2 Additional simulation studies with complex interactions
Supplementary Note 2.1 Comparing power and specificity at different p-value thresholds
In Supplementary Figure 2, we compare the power and specificity of different methods on simulated datasets. We regard interactions as positive if their nominal permutation p-value is less than 0.02 or 0.1, respectively. We notice that increasing the p-value threshold from 0.02 to 0.1 increases the power of each method while decreasing the corresponding specificity. For NNs, Perm T and Perm R have very low specificity but a power of 1 in most settings, as they severely underestimate the interaction scores under the null hypothesis and consider all pairwise interactions positive. We observe that although NNs have a specificity similar to or slightly lower than the other methods, their power to detect complex interactions is much higher.
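A small sketch (function and variable names are assumptions, not the authors' implementation) of how power and specificity can be computed at a given p-value threshold:

```python
import numpy as np

def power_and_specificity(p_values, is_true_interaction, threshold):
    """Power: fraction of truly interacting pairs called positive (p < threshold).
    Specificity: fraction of non-interacting pairs called negative (p >= threshold)."""
    p = np.asarray(p_values)
    truth = np.asarray(is_true_interaction, dtype=bool)
    called = p < threshold
    power = called[truth].mean()
    specificity = (~called)[~truth].mean()
    return power, specificity

# for thr in (0.02, 0.1):
#     print(thr, power_and_specificity(perm_pvals, truth_mask, thr))
```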

Supplementary Note 2.2 Simulation studies with only parts of genes interacting
Here we extend the experiments in Section 2.2.1 by constructing simulation datasets where only a small set of genes have interaction effects but all genes have main effects. We consider the following two simulators, in which only 2 and 5 genes out of 10, respectively, have interaction effects. With gene scores $g_i = \sum_j \alpha_{ij} x_{ij}$, $\forall i \in \{1, \ldots, 10\}$, the first simulator retains only the maximum interaction between $g_7$ and $g_9$:
\[ y = w_{79} \max\{g_7, g_9\} + \sum_{k=1}^{10} w_k g_k + \epsilon, \]
and the second keeps one interaction of each type, so that 5 genes interact:
\[ y = w_{79} \max\{g_7, g_9\} + w_{48} (g_4 - g_8)^2 + w_{8,10}\, g_8 g_{10} + \sum_{k=1}^{10} w_k g_k + \epsilon. \]

Supplementary Figure 3: Comparison of gene–gene interaction detection methods in terms of AP and AUROC on datasets with complex interactions. We simulate data using S/N = 0.1, data size = 80k, prop causal SNPs = 50%, M/I = 1.0, and three different numbers of interacting genes: 2, 5, and 10 (the simulator with 10 interacting genes is shown in the main text). The NN approaches (black bars) are consistently better than existing approaches.

From Supplementary Figure 3, we observe that NNs can detect interactions even better than existing methods when only a small number of genes are interacting in the simulator.
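A minimal sketch (not the authors' exact code; array layout, weight names, and noise level are assumptions) of how the two simulators above could be implemented:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_complex(X_genes, alpha, w_main, w_int, n_interacting=5, noise_sd=1.0):
    """Sketch of the simulators above: each gene score g_i is a weighted sum of
    its SNPs; all 10 genes have main effects, but only 2 (or 5) genes interact."""
    # g_i = sum_j alpha_ij * x_ij for each of the 10 genes
    g = np.stack([X_genes[i] @ alpha[i] for i in range(10)], axis=1)
    y = g @ w_main                                            # main effects of all 10 genes
    y = y + w_int["w79"] * np.maximum(g[:, 6], g[:, 8])       # max interaction between g7 and g9
    if n_interacting == 5:
        y = y + w_int["w48"] * (g[:, 3] - g[:, 7]) ** 2       # quadratic interaction (g4, g8)
        y = y + w_int["w810"] * g[:, 7] * g[:, 9]             # multiplicative interaction (g8, g10)
    return y + rng.normal(scale=noise_sd, size=g.shape[0])    # additive noise epsilon
```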

Supplementary Note 2.3 Simulation studies with more genes
Here we scale up the experiments in Section 2.2.1 by considering 50 and 100 genes instead of only 10. We consider the following simulator, with A = 5 and A = 10. In this simulator, all genes have interaction effects, and the number of interactions is scaled up according to the number of genes.
From Supplementary Figure 4, we observe that as the number of genes in the simulator increases, the performance of all methods drops. This is because the total signal-to-noise ratio is fixed, so the effect of each interaction is scaled down when more genes and interactions are included in the simulator. NNs still detect interactions better than existing methods when the number of genes is high.
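A short worked calculation, under the simplifying (assumed) condition that the interaction terms are independent with equal variance and equal weights, of why each interaction's effect shrinks as the simulator grows:

```latex
\operatorname{Var}\!\Big(\sum_{a=1}^{A} w\, f_a\Big)
  = A\, w^2 \operatorname{Var}(f_a) = c
  \;\;\Longrightarrow\;\;
  w = \sqrt{\tfrac{c}{A\,\operatorname{Var}(f_a)}} \;\propto\; \tfrac{1}{\sqrt{A}} .
```

Under these assumptions, increasing the number of interactions A while keeping the total interaction variance c fixed reduces each interaction's variance contribution to c/A, which makes individual interactions harder to detect.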

Supplementary Note 2.6 Ablation studies
Going from the classical top-SNP regression to the proposed deep learning approach, there are two additional components: 1. $h_{g_i}(X_i; w_{g_i})$ (Equation 6a of the main text), which learns the representation of gene $i$ from all of its SNPs $X_i$ instead of using the single most correlated SNP as in top-SNP regression; 2. $h_p(g_1, \ldots, g_M; w_p)$ (Equation 6b of the main text), which can model interactions with any functional form instead of only multiplicative interactions as in top-SNP regression. In this section, we test the effect of each of these two components with the following two models: 1. NN SNP, which represents each gene with its top SNP but models interactions with $h_p(\cdot)$; it therefore removes component 1 but keeps component 2 of the proposed deep learning approach. 2. LR gene, which uses $h_{g_i}(X_i; w_{g_i})$ to learn the representation of each gene from the corresponding SNPs but applies linear regression with multiplicative interactions between gene representations to detect interactions; it therefore removes component 2 but keeps component 1 of the proposed deep learning approach. We simulate the data using Equation 1 in the main text (with complex interactions), with S/N = 0.1, data size = 80k, M/I = 1.0, and two different proportions of causal SNPs, 0.5 and 0.1.
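For concreteness, here is a minimal PyTorch sketch (layer sizes, class and variable names are illustrative, not the authors' implementation) of the full model with both components:

```python
import torch
import torch.nn as nn

class GeneInteractionNet(nn.Module):
    """Sketch of the two components discussed above: per-gene encoders h_gi map each
    gene's SNPs to a scalar representation g_i, and a phenotype network h_p models
    arbitrary (non-multiplicative) interactions between the gene representations."""

    def __init__(self, snps_per_gene, hidden=32):
        super().__init__()
        # Component 1: one small encoder h_gi per gene (all SNPs of gene i -> scalar g_i).
        self.gene_encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(n_snps, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for n_snps in snps_per_gene
        ])
        # Component 2: h_p, a nonlinear network on the gene representations.
        n_genes = len(snps_per_gene)
        self.phenotype_net = nn.Sequential(
            nn.Linear(n_genes, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, snp_blocks):
        # snp_blocks: list of tensors, one (batch, n_snps_i) block per gene
        g = torch.cat([enc(x) for enc, x in zip(self.gene_encoders, snp_blocks)], dim=1)
        return self.phenotype_net(g).squeeze(-1)
```

In this sketch, NN SNP would feed only the top SNP of each gene directly into phenotype_net (dropping gene_encoders), while LR gene would keep gene_encoders but replace phenotype_net with a linear model that includes pairwise products of the gene representations.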

We first observe that NN SNP, which removes the gene representation learning part but keeps the complex interaction modeling part, does not work well in general, similar to the existing XGBoost SNP approach. We hypothesize that complex interaction models, such as NNs and XGBoost, are not good at modeling binomially distributed SNPs with tiny variances. However, the complex interaction model works properly on gene representations (e.g., in the proposed NN), which are relatively continuous.
We then observe that LR gene, which removes the complex interaction modeling part but keeps the gene representation learning part, outperforms LR SNP, especially when more SNPs are causal in the simulator (e.g., when the proportion of causal SNPs equals 0.5). Moreover, modeling complex interactions on gene representations (going from LR gene to NN) consistently improves interaction detection in both settings, because multiplication is insufficient to model the complex interactions in the simulator.
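To see concretely why purely multiplicative terms cannot represent the simulated interactions, the following identities (an illustration, not taken from the paper) are useful:

```latex
\max\{g_7, g_9\} = \tfrac{1}{2}(g_7 + g_9) + \tfrac{1}{2}\lvert g_7 - g_9\rvert ,
\qquad
(g_4 - g_8)^2 = g_4^2 - 2\,g_4 g_8 + g_8^2 .
```

The absolute-value component of the max and the squared terms in the quadratic interaction lie outside the span of main effects and pairwise products $g_i g_j$ (with $i \neq j$), so a linear regression with only multiplicative interaction terms cannot fit them exactly.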

Supplementary Note 3.1 Simulation studies with only parts of genes interacting
Similarly to the experiments on simulation data with complex interactions, here we extend the experiments on data with simple interactions (Section 2.2.2) by considering settings where only 2 and 5 genes out of 10 have simple interaction effects: in the first simulator only the multiplicative interaction between the SNPs $x_{7,t_7}$ and $x_{9,t_9}$ is retained, and in the second we keep 3 interactions among 5 interacting genes.
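A heavily hedged sketch of the first of these simulators (the exact equations follow Section 2.2.2 of the main text; the main-effect structure, names, and noise level below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_simple_2genes(X_genes, top_snp, beta_main, w79, noise_sd=1.0):
    """Hedged sketch: additive main effects for all genes, plus a single
    multiplicative interaction between the top SNPs of genes 7 and 9."""
    y = sum(X_genes[i] @ beta_main[i] for i in range(len(X_genes)))   # main effects
    x7 = X_genes[6][:, top_snp[6]]   # x_{7,t7}: top SNP of gene 7
    x9 = X_genes[8][:, top_snp[8]]   # x_{9,t9}: top SNP of gene 9
    return y + w79 * x7 * x9 + rng.normal(scale=noise_sd, size=len(y))
```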

Supplementary Note 3.2 Simulation studies with more genes
Here we scale up the experiments in Section 2.2.2 by considering 50 and 100 genes instead of only 10. We consider the following simulator, with A = 5 and A = 10. In this simulator, all genes have interaction effects, and the number of interactions is scaled up according to the number of genes. As in Supplementary Figure 4, we observe in Supplementary Figure 18 that as the number of genes in the simulator increases, the performance of all methods drops, because the total signal-to-noise ratio is fixed and the effect of each interaction decreases when more genes and interactions are included in the simulator. LR with top SNPs is better than the other methods, especially when the number of genes is large and the interactions between genes are simple.

Supplementary Note 4 Additional simulation studies without interactions

Supplementary Note 4.1 Calibration plot of existing permutation methods
We show the calibration plot of the existing permutation approaches (Perm T and Perm R) in Supplementary Figure 19. We observe that neither permutation approach generates a calibrated null distribution.

We notice that although using neural networks improves the variance explained, the increases from LR gene to NN are generally small, whereas the increases from top-SNP LR to NN are large. This indicates that the benefits of NNs come mainly from using all available SNPs of each gene to learn the gene representation.

In Supplementary Figure 24, we train a bivariate LR with a multiplicative interaction term for each interaction pair, where each gene is represented by its top SNP. We collect the negative log p-value (y axis) of the multiplicative interaction term of the linear regression and the corresponding gene–gene interaction score (x axis) from NNs for each interaction pair. We apply two FDR (false discovery rate) thresholds: 0.1 and 0.3. We observe that NNs find very different interactions from the top-SNP approach, and that at a given threshold the number of findings from NNs is smaller than from the top-SNP approach. This indicates that most interactions in real-world applications (e.g., the UK Biobank dataset) can be approximated well by multiplications between top SNPs, in which case the top-SNP approach is the method of choice. Therefore, the NN approach should not be used as a substitute for the default top-SNP approach, but rather as a complementary tool to find interactions that would otherwise be missed.

Supplementary Figure 24: Comparison of the findings of the top-SNP approach and the NN approach at the same FDR thresholds for each phenotype. Each red dot represents one interaction pair; its x and y coordinates are the interaction score from NNs and the negative log p-value from top-SNP regression, respectively. NNs find very different interactions from the default top-SNP regression approach, and the number of findings is smaller than for the top-SNP approach at the same FDR threshold.
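A hedged sketch (using statsmodels; the function name, variable names, and candidate-pair loop are illustrative) of the bivariate top-SNP regression with a multiplicative interaction term and Benjamini–Hochberg FDR thresholding described above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

def top_snp_interaction_pvalue(x_i, x_j, y):
    """Fit y ~ 1 + x_i + x_j + x_i*x_j on the two top SNPs and return the
    p-value of the multiplicative interaction term (the last coefficient)."""
    X = sm.add_constant(np.column_stack([x_i, x_j, x_i * x_j]))
    fit = sm.OLS(y, X).fit()
    return fit.pvalues[-1]

# pvals = [top_snp_interaction_pvalue(x_i, x_j, y) for (x_i, x_j) in candidate_pairs]
# significant, *_ = multipletests(pvals, alpha=0.1, method="fdr_bh")  # FDR threshold 0.1
```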