Introduction

Survivors of pediatric acute lymphoblastic leukemia (ALL) are at risk for long-term deficits in bone mineral density (BMD) due to childhood cancer treatment exposures, including cranial radiation, antimetabolites (e.g., methotrexate), and glucocorticoids [1,2,3]. Cranial radiation diminishes BMD through injury to the hypothalamic–pituitary axis, affecting sex and growth hormone secretions that play an important role in bone metabolism [2]. Methotrexate and glucocorticoids decrease BMD by influencing factors that control osteoblast and osteoclast cell activity [2]. Despite common past treatment exposures, pediatric ALL survivors exhibit substantial variation in BMD later in life. An unexplored explanation for some of this uncharacterized variation in BMD is epistasis, where the effect of a locus on a trait is conditional on genotypes observed at other loci.

While studies have investigated pairs of SNPs in select candidate genes with BMD [4, 5], higher order epistasis involving three or more SNPs is also likely to play a vital role in the genetic architecture of BMD. BMD reflects the cumulative effects of interacting genetic and environmental factors on peak bone mass and bone remodeling [6]. Signaling pathways requiring both spatiotemporal cues and epigenetic modifications of genetic loci guide the differentiation of bone cells from cells of mesenchymal and hematopoietic origin [7]. In a recent genome-wide scan of SNP pair interactions, over half of gene expressions in peripheral blood significantly associated with SNP pairs were influenced by networks involving three SNPs or more [8].

To our knowledge, no studies have explored higher order epistasis and BMD. In general, searches for epistasis are challenged in identifying true interactions between SNPs on a genome-wide scale, largely due to insufficient statistical power. Novel strategies have been applied to increase power and identify reliable interactions. One strategy is to restrict the search for epistasis to SNPs that are likely to contribute to biological interactions, reducing the number of tested interactions [9, 10]. Another strategy is to search for interactions with large effects on phenotypes [8]. Lastly, some epistatic interactions failing to meet conservative genome-wide significance thresholds have been shown to be reliable signals through replication [11].

In this study, we combined all of these strategies to identify higher order epistatic interactions that explain some of the variability of treatment effects on BMD among adult survivors of childhood ALL exposed to BMD-diminishing treatments. We leveraged knowledge that SNPs in interacting enhancer and promoter regions modulate gene expression and thus affect phenotypes [12, 13]. We applied chromatin state annotations [14] to restrict the search for epistasis to SNPs mapped to putative enhancer or promoter regions. To detect interactions between regulatory regions carrying SNPs associated with BMD (hereafter referred to as “SNP interactions”) as potential modifiers of treatment effects, a novel, non-exhaustive statistical algorithm was implemented. Our specific focus was to identify regulatory 3-way SNP interactions associated with BMD in ALL survivors. An independent cohort of cancer survivors was used to replicate candidate regulatory SNP interaction signals as modifiers of treatment effects on BMD. Supplemental bioinformatics analyses were conducted to characterize replicated SNP interactions.

Subjects and methods

Study cohorts

Individuals included in this analysis are participants in the St. Jude Lifetime Cohort Study (SJLIFE) [15]. Eligible survivors were divided into two cohorts: a discovery cohort of 856 adult survivors of pediatric ALL and a replication cohort consisting of 1428 adult survivors of any non-ALL pediatric cancer (a second cohort of ALL survivors with comparable BMD measurements and genotype data was unavailable). BMD was ascertained using quantitative computed tomography from the mid-bodies of the first and second lumbar vertebrae. A BMD Z-score was computed for each survivor by taking the difference between the average of their two vertebral BMD measurements and the age-matched and sex-matched mean of a reference population, divided by the standard deviation in the reference population. Cumulative doses of cranial radiation (none, >0 to <2400, ≥2400 cGy), methotrexate (<5100, ≥5100 to <20000, ≥20000 mg/m²), and glucocorticoid (<2000, ≥2000 to <11000, ≥11000 mg/m²) treatment exposures were considered as risk factors for BMD deficiency among ALL survivors [1,2,3]. We built a multiple linear regression model for BMD Z-scores including sex, categorical treatment exposures, and genetic ancestry estimated using STRUCTURE software [16] (to control for population stratification in our multi-ethnic cohorts) for adjustment in subsequent genetic association analyses. Additional study cohort details are provided in Supplementary Methods.
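The Z-score definition above can be written out as a short calculation. The following is a minimal sketch, not the study's code: the function name and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def bmd_z_score(l1_bmd, l2_bmd, ref_mean, ref_sd):
    """Z-score of the averaged L1/L2 BMD against an age- and sex-matched reference."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    # Average the two vertebral measurements, then standardize
    return (mean([l1_bmd, l2_bmd]) - ref_mean) / ref_sd

# Hypothetical measurements and reference values (arbitrary units)
z = bmd_z_score(l1_bmd=150.0, l2_bmd=160.0, ref_mean=175.0, ref_sd=25.0)
print(round(z, 2))  # -0.8
```

A negative value indicates BMD below the reference mean for the survivor's age and sex.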

Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA) was used to genotype DNA samples in SJLIFE. Genotyping quality control thresholds for exclusion from the analysis in the discovery cohort were as follows: <95% per-sample call rate across markers, <95% SNP call rate across samples, MAF <1%, and Hardy–Weinberg equilibrium P < 1 × 10⁻⁶. Imputation of missing measured genotypes was completed using BEAGLE [17]. An allelic R² imputation quality score cutoff of >0.5 was applied. For replication, a per-sample call rate of >90% was used to avoid sample exclusion due to the smaller set of SNPs selected for follow-up. Otherwise, genotyping quality control thresholds were identical for discovery and replication cohorts. Imputation was not employed for replication to limit data uncertainty associated with imputation. Per-SNP missingness rates were comparable between cohorts (Supplementary Table 1). Genetic data are available in the European Genome-phenome Archive under study accession number EGAS00001002645 (https://www.ebi.ac.uk/ega/studies/EGAS00001002645).

Statistical methodology: discovery analysis

ChromHMM chromatin state annotations [14] were used to map SNPs to putative enhancer or promoter regions. We retained SNPs mapped to “strong enhancer” or “active promoter” elements in any of nine ChromHMM-annotated human cell types, since it was unknown which types would be most relevant. This restriction limited the search for SNP interactions to 75523 SNPs. Each of these SNPs was dichotomized to create two binary variables, or indicators for carrying at least one non-reference allele or homozygous non-reference alleles. Any binary-encoded SNP variable with frequency <5% was removed to limit evaluations of sparse 3-SNP interactions. A total of 115800 binary SNP variables were retained genome-wide.
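The binary encoding and frequency filter described above can be sketched as follows. This is an illustrative sketch under stated assumptions: genotypes are coded as counts of non-reference alleles (0, 1, 2) with no missing values, "frequency" is taken to mean the proportion of participants for whom the indicator equals 1, and the function names are hypothetical.

```python
def encode_snp(alt_counts):
    """Return (dominant, recessive) 0/1 indicators from non-reference allele counts."""
    dominant = [1 if g >= 1 else 0 for g in alt_counts]   # at least one non-reference allele
    recessive = [1 if g == 2 else 0 for g in alt_counts]  # homozygous non-reference
    return dominant, recessive

def passes_frequency_filter(indicator, min_freq=0.05):
    """Keep a binary variable only if its frequency is at least min_freq (assumed definition)."""
    return sum(indicator) / len(indicator) >= min_freq

# Hypothetical SNP: 90 reference homozygotes, 8 heterozygotes, 2 non-reference homozygotes
genotypes = [0] * 90 + [1] * 8 + [2] * 2
dominant, recessive = encode_snp(genotypes)
print(passes_frequency_filter(dominant))   # True  (frequency 0.10)
print(passes_frequency_filter(recessive))  # False (frequency 0.02)
```

In this example one SNP contributes a single retained binary variable, which is why the number of retained variables (115800) is smaller than twice the number of SNPs.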

Despite this SNP restriction, exhaustively testing enhancer/promoter 3-SNP interactions would entail >260 trillion tests. To decrease the number of effective tests without compromising the search quality, we developed a non-exhaustive, sequential conditioning algorithm based on logic regression [18] to conduct an effective search of the 3-SNP interaction search space. Briefly, logic regression is an adaptive regression methodology that combines generalized linear models with a stochastic search algorithm to identify best-fitting models that include interaction variables (“logic trees”) comprised of binary predictors. Logic regression selects best-fitting models by comparing decrements in model scores. The search for 3-SNP interactions was also restricted by chromosome, reducing the computational burden to a manageable level.

Our algorithm identified 3-SNP interactions sequentially via logic regression for each chromosome, using forward addition to form a linear predictor that included ten 3-SNP interaction trees per chromosome. Models for each chromosome took the following form:

$$E[Y] = \mu + \delta S + \sum_{j=1}^{3} \alpha_j A_j + \sum_{k=1}^{6} \gamma_k T_k + \sum_{p=1}^{m} \beta_p L_p, \tag{1}$$

where \(Y\) is BMD Z-score, \(S\) is sex, \(A_1\)–\(A_3\) are the three STRUCTURE genetic ancestry covariates, \(T_1\)–\(T_6\) are indicator variables for the three categorical treatment variables (i.e., three 3-level variables), and \(L_p\) are the 3-SNP interaction trees (m = 1, 2, …, 10 trees). By identifying 3-SNP interaction trees conditioned on previously identified trees, the algorithm guides the stochastic search in different directions, yielding 3-SNP interactions that are unlikely to be correlated.
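To make the role of a logic tree in Equation 1 concrete, the sketch below evaluates one 3-SNP tree as a single 0/1 predictor and plugs it into the linear predictor. It is a hedged illustration only: the tree is assumed to be a pure conjunction (logic regression can also use OR and NOT), all coefficients and covariate values are hypothetical, and no model fitting or stochastic search is performed.

```python
def tree_value(a, b, c):
    """Evaluate the Boolean tree 'a AND (b AND c)' on three 0/1 SNP indicators."""
    return int(bool(a) and (bool(b) and bool(c)))

def expected_bmd_z(mu, delta, sex, alpha, ancestry, gamma, treatment, beta, trees):
    """Linear predictor of Equation 1 for one survivor."""
    if len(alpha) != 3 or len(ancestry) != 3:
        raise ValueError("expected three ancestry terms")
    if len(gamma) != 6 or len(treatment) != 6:
        raise ValueError("expected six treatment indicators")
    if len(beta) != len(trees):
        raise ValueError("one coefficient per tree is required")
    return (mu + delta * sex
            + sum(a * x for a, x in zip(alpha, ancestry))
            + sum(g * t for g, t in zip(gamma, treatment))
            + sum(b * l for b, l in zip(beta, trees)))

# Hypothetical coefficients and covariates, for illustration only
shared = dict(mu=-0.5, delta=0.2, sex=1,
              alpha=[0.1, 0.0, -0.1], ancestry=[0.8, 0.1, 0.1],
              gamma=[-0.3, -0.6, -0.1, -0.2, -0.1, -0.4],
              treatment=[0, 1, 1, 0, 0, 0],
              beta=[1.0])
with_tree = expected_bmd_z(trees=[tree_value(1, 1, 1)], **shared)
without_tree = expected_bmd_z(trees=[tree_value(1, 0, 1)], **shared)
print(round(with_tree - without_tree, 6))  # 1.0, the tree coefficient
```

Because each tree collapses three SNP indicators into one binary variable, it enters the model with a single coefficient regardless of how many SNPs it combines.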

We applied a permutation-based approach to identify candidate 3-SNP interactions for replication follow-up. For each of the algorithm-identified 3-SNP interaction trees, 1000 permutations of BMD Z-scores were used to compute the corresponding empirically derived median for the absolute value of the t-statistic and its median absolute deviation (MAD, a robust measure of variability). Permutations of BMD Z-score values were conditioned on 50 quantiles of the fitted BMD Z-score from the clinical baseline model to approximately preserve relationships between adjustment covariates and BMD Z-score. The search algorithm was applied to these conditionally permuted BMD Z-scores in exactly the same manner as in the unpermuted case. To select candidate 3-SNP interactions for replication follow-up, we compared the observed t-statistic of a given tree with the corresponding empirically derived median, similar to the Significance Analysis of Microarrays method [19]. Our evaluation statistic is \(\frac{t_{\rm obs} - t_{\rm med}}{{\rm MAD}_{t_{\rm med}}}\), where \(t_{\rm obs}\) is the absolute value of the t-statistic for the pth tree given (p−1) observed trees, and \(t_{\rm med}\) and \({\rm MAD}_{t_{\rm med}}\) are the median and MAD, respectively, of the corresponding 1000 conditioned t-statistics from 1000 permutations. If the evaluation statistic was >2 (i.e., \(t_{\rm obs}\) was >2 median absolute deviations above its corresponding permutation-derived median), the 3-SNP interaction tree was selected as a candidate interaction for replication, as the tree explains a larger proportion of BMD Z-score variation than expected by chance alone.
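The two ingredients of this selection step, quantile-conditioned permutation and the MAD-standardized evaluation statistic, can be sketched as below. This is a simplified illustration, not the authors' implementation: the MAD is assumed to be the raw (unscaled) median absolute deviation, the quantile bins are formed by rank, the function names are hypothetical, and the logic regression search that would be rerun on each permuted outcome is omitted.

```python
import random
from statistics import median

def mad(values):
    """Raw median absolute deviation (no consistency scaling; an assumption)."""
    m = median(values)
    return median([abs(v - m) for v in values])

def evaluation_statistic(t_obs, permuted_t):
    """(|t_obs| - median of permuted |t|) / MAD of permuted |t|."""
    abs_perm = [abs(t) for t in permuted_t]
    return (abs(t_obs) - median(abs_perm)) / mad(abs_perm)

def permute_within_quantiles(y, fitted, n_bins=50, rng=random):
    """Shuffle y only among subjects in the same quantile bin of the fitted values."""
    order = sorted(range(len(y)), key=lambda i: fitted[i])
    permuted = list(y)
    n = len(y)
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        shuffled = [y[i] for i in idx]
        rng.shuffle(shuffled)
        for i, value in zip(idx, shuffled):
            permuted[i] = value
    return permuted

# Hypothetical permutation t-statistics and observed value
perm_t = [1.0, 1.5, 2.0, 2.5, 3.0]
stat = evaluation_statistic(3.5, perm_t)
print(stat, stat > 2)  # 3.0 True
```

Shuffling within bins of the fitted baseline values keeps each permuted outcome close to what the clinical covariates predict, so the permutation null reflects only the genetic signal.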

After selecting candidate regulatory 3-SNP interaction trees (“original” trees), a “neighborhood” analysis was conducted to identify “proximal” 3-SNP interaction trees with stronger associations with BMD Z-score than original trees. The reasons for conducting this analysis were twofold: (1) our non-exhaustive logic regression-based algorithm may have missed proximal SNP interactions with stronger associations with BMD; and (2) these strongly associated neighborhood SNP interactions may include SNPs that “tag” additional regulatory regions relevant for BMD. Neighborhood trees were constructed with binary-encoded SNP variables (same filtering criteria as the discovery analysis) from SNPs ±100 kb of SNPs in the original tree, with the same Boolean logic structure as the corresponding original tree. Neighborhood trees selected for follow-up in the replication cohort explained larger proportions of BMD Z-score variation than their corresponding original trees.
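The construction of neighborhood trees can be illustrated with a small enumeration. The sketch assumes that each of the three positions in the original tree is substituted independently by any retained binary SNP variable within ±100 kb of the original SNP; the variable names and genomic positions are hypothetical.

```python
from itertools import product

WINDOW_BP = 100_000  # +/-100 kb, as stated in the text

def neighbors(variables, chrom, position, window=WINDOW_BP):
    """Names of binary SNP variables on `chrom` within `window` bp of `position`."""
    return [v["name"] for v in variables
            if v["chrom"] == chrom and abs(v["pos"] - position) <= window]

def neighborhood_trees(variables, original_tree):
    """All slot-by-slot substitutions for the three SNPs of an original tree."""
    slots = [neighbors(variables, s["chrom"], s["pos"]) for s in original_tree]
    # Keep the Boolean structure fixed; only the variable in each slot changes
    return [combo for combo in product(*slots) if len(set(combo)) == 3]

# Hypothetical binary SNP variables, for illustration only
variables = [
    {"name": "snpA_dom", "chrom": "12", "pos": 1_000_000},
    {"name": "snpB_dom", "chrom": "12", "pos": 1_050_000},
    {"name": "snpC_rec", "chrom": "12", "pos": 5_000_000},
    {"name": "snpD_dom", "chrom": "12", "pos": 9_000_000},
    {"name": "snpE_dom", "chrom": "12", "pos": 9_100_000},
    {"name": "snpF_dom", "chrom": "12", "pos": 9_100_001},
]
original_tree = [variables[0], variables[2], variables[3]]
candidates = neighborhood_trees(variables, original_tree)
print(len(candidates))  # 4 (includes the original tree itself)
```

Each candidate would then be scored against BMD Z-score, and only those explaining more variation than the original tree would be carried forward.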

Statistical methodology: replication analysis

Since every participant in the discovery cohort received substantial cumulative doses of at least one of the three treatments known to affect BMD, we expected that interaction signals observed in the discovery cohort were potential modifiers of treatment effects on BMD. We therefore defined evidence of replication as significant modification of treatment effects by 3-SNP trees in the replication cohort. We assessed modification of treatment effects using two different approaches: (1) 3-SNP tree interactions with each of the three treatments, and (2) 3-SNP tree main effects among those exposed to each of the three treatments. If the 3-SNP tree had a significant interaction (P < 0.05) with at least one of the three treatments or a significant main effect (P < 0.05) among those exposed to one of the treatments, we deemed the interaction to be replicated. We further required treatment modification effects in the replication cohort to have the same direction and similar magnitude as the discovery cohort.
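The replication definition above amounts to a simple decision rule, sketched here. The function and argument names are hypothetical, the P-values are invented for illustration, and the "same direction and similar magnitude" requirement is passed in as a flag because the text does not quantify it.

```python
def is_replicated(interaction_p, exposed_main_effect_p, consistent_effect, alpha=0.05):
    """Apply the replication rule to one 3-SNP tree.

    interaction_p: P-values for tree-by-treatment interaction, keyed by treatment.
    exposed_main_effect_p: P-values for the tree main effect among the exposed.
    consistent_effect: True if direction and magnitude match the discovery cohort.
    """
    significant = (any(p < alpha for p in interaction_p.values())
                   or any(p < alpha for p in exposed_main_effect_p.values()))
    return significant and consistent_effect

# Hypothetical P-values for one candidate tree
interaction_p = {"CRT": 0.20, "methotrexate": 0.03, "glucocorticoids": 0.60}
main_p = {"CRT": 0.40, "methotrexate": 0.08, "glucocorticoids": 0.75}
print(is_replicated(interaction_p, main_p, consistent_effect=True))   # True
print(is_replicated(interaction_p, main_p, consistent_effect=False))  # False
```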

Additional explanation of the statistical methodology is given in Supplementary Methods.

Comparison of the proposed method to a benchmark 2-SNP interaction analysis method

We conducted an exhaustive, within-chromosome 2-way SNP interaction analysis among enhancer/promoter SNPs with the linear regression-based epistasis module in PLINK v1.90, a benchmark methodology for epistasis analysis [20]. We also performed a simulation study under three sample size scenarios (N = 1000, 1500, and 2000) to compare the performance, measured by power and positive predictive value (PPV), of our proposed method and the benchmark method’s detection of component SNP pairs for replicated 3-SNP interactions. Details for both analyses are provided in Supplementary Methods.

Biological characterization of replicated interactions

We evaluated whether there was an excess of significant gene expressions (expression quantitative trait loci or eQTLs) for SNPs in replicated 3-SNP interactions in bone-related cells/tissues using cis-eQTLs achieving study-wide significance from the Genotype-Tissue Expression (GTEx) Project [21] and GHS-Express monocyte transcriptome [22] databases. Using the BMD biology literature, we defined 16 cell or tissue groups to be related to bone out of 45 available cell/tissue groups. Counts of significant eQTLs in bone-related cells/tissues for SNPs of interest were compared to all other SNPs genome-wide with at least one significant eQTL in these databases (~2.6 million SNPs with ~26.4 million eQTLs) using a 2-sided Fisher’s exact test.

To investigate the cell- and tissue-specificity of enhancer and promoter states for SNPs contributing to replicated interactions, we conducted enrichment analyses using the 15-state chromatin state annotation data for 127 consolidated human cell types from the Roadmap Epigenomics Mapping Consortium (REMC) [23]. For each cell type, we compared the set of SNPs in replicated interactions with the set of non-overlapping SNPs originally mapped to enhancers/promoters. Frequencies of overlap between SNPs in each set and REMC enhancer or promoter regions were counted in each cell type. Strength of evidence for enrichments was evaluated using a 2-sided Fisher’s exact test.
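Both enrichment analyses rest on a 2-sided Fisher's exact test applied to a 2×2 table (SNP set by overlap status). The following pure-Python sketch shows how such a P-value is obtained from the hypergeometric distribution. The counts are hypothetical, the function name is illustrative, and a real analysis on genome-scale counts would more likely rely on an established statistics library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows = SNP set of interest vs. background,
# columns = overlapping vs. not overlapping an annotated regulatory state
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035
```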

Assays based on chromosome conformation capture (3C) enable study of physical interactions between chromatin regions [12, 13]. We evaluated the likelihood of physical interaction between SNP regions participating in replicated 3-SNP interactions using a publicly available Hi-C data library generated in lymphoblastoid cells [24], visualized with the WashU EpiGenome Browser resource [25].

Details for bioinformatics analyses are available in Supplementary Methods.

Results

The discovery cohort included 856 adult survivors of pediatric ALL. Cohort clinical characteristics are provided in Table 1. Every ALL survivor was exposed to cranial radiation therapy (CRT), methotrexate, and/or glucocorticoids during childhood. Our linear regression model with sex, ancestry, and treatment covariates demonstrated that decreases in adjusted mean BMD Z-scores were significantly associated with increasing cumulative dosages for each of these treatments (Supplementary Table 2).

Table 1 Participant characteristics

Using the proposed logic regression-based algorithm, we identified 220 3-SNP interactions (10 interactions per chromosome) associated with BMD Z-score. Consistent with previous observations of regulatory complexes involving enhancer–promoter, enhancer–enhancer, or promoter–promoter interactions [26], no restrictions were made on the composition of 3-SNP interactions. Six distinct (uncorrelated) 3-SNP interactions were selected as candidate interactions for replication follow-up using our permutation-based evaluation statistic threshold (values >2). We considered each of these six distinct 3-SNP interactions separately as genomic “interaction neighborhoods” associated with BMD and looked for other 3-SNP interactions in these “neighborhoods” that were more strongly associated with BMD than the original 3-SNP interactions in the discovery cohort. All 3-way SNP interactions using any SNP located within 100-kb of regulatory loci contributing to the originally selected 3-SNP interactions were assessed. We identified ten additional “neighborhood” 3-SNP interactions that explained larger proportions of BMD Z-score variation than their corresponding original interactions for four of the six selected 3-SNP interactions: this yielded a total of 16 candidate 3-SNP interactions for replication follow-up.

The replication cohort of SJLIFE participants (N = 1428) with a range of non-ALL pediatric cancer diagnoses (Supplementary Table 3) was comparable to the discovery cohort with respect to age, sex, and ancestry distributions (Table 1). Participants in the replication cohort exposed to either CRT or methotrexate received, on average, higher cumulative doses of these treatments compared to the discovery cohort (Supplementary Table 4). Applying our replication definition, 12 of the 16 3-SNP interactions were replicated as modifiers of treatment effects (Supplementary Tables 5–7). Considering the six originally selected 3-SNP interactions, each reflecting a distinct interaction neighborhood, at least one original or neighborhood 3-SNP interaction candidate was replicated for five of the six selected 3-SNP interaction neighborhoods.

Table 2 shows the best replicated original or neighborhood 3-SNP interaction (defined by replication P-value) detected among the five genomic neighborhoods with replicated interactions. Adjusted changes in mean BMD Z-scores for these five best replicated 3-SNP interactions in the discovery cohort ranged from −1.30 to +1.77 SD, with regression coefficient t test-based (naive) P-values ranging from 2.9 × 10⁻¹³ to 3.5 × 10⁻¹¹. Four of these 3-SNP interactions included at least one SNP that was not nominally significant. No component SNP pair fully recovered the entire magnitude of association of its respective 3-SNP interaction. In the discovery cohort, the proportions of variance in BMD Z-score explained by the non-genetic covariates (14.5%) and by the five best replicated 3-SNP interactions (14.1%) were comparable (Supplementary Table 8).

Table 2 Replicated 3-SNP interactions associated with BMD Z-score identified by the novel logic regression-based algorithm

To compare the performance of our proposed algorithm to a benchmark SNP interaction association testing method, we conducted an exhaustive, within-chromosome pairwise SNP interaction analysis using the 75523 SNPs mapped to putative regulatory regions. Of the nearly 158 million SNP pair combinations considered, seven pairs achieved genome-wide significance (Bonferroni-adjusted P < 3.2 × 10⁻¹⁰). None were contributing pairs to any of the 220 3-SNP interactions detected with our search algorithm. Considering all SNP pair results with P < 1.0 × 10⁻⁹ and the SNP pairs formed by their LD proxy SNPs, none of the 967 original or LD proxy SNP pairs were contributing pairs for any of the 220 3-SNP interactions (Supplementary Table 9). To further distinguish differences in performance between our novel method and the benchmark SNP pair testing method, we conducted a simulation study. Assuming effect sizes observed in our discovery analysis (Table 2), our proposed method has 18–60% power and 17–49% PPV to detect “true” (replicated) 3-SNP interactions in smaller samples (N = 1000), with marked improvements in both statistics with modest increments in sample size (Supplementary Table 10). In comparison, the benchmark SNP pair method is appreciably less powerful and has low PPV for detecting component 2-SNP interactions in underlying true 3-SNP interactions, even with larger sample sizes and under a liberal P-value threshold (P < 1 × 10⁻⁵) to select top SNP pairs (Supplementary Table 11).
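The genome-wide significance threshold quoted for the pairwise analysis is consistent with a standard Bonferroni correction, as the short check below shows. It assumes a family-wise error rate of 0.05 (not stated explicitly in the text) and uses the rounded count of "nearly 158 million" tests, since the exact number is not given.

```python
alpha = 0.05             # assumed family-wise error rate
n_tests = 158_000_000    # rounded count of SNP pairs from the text
threshold = alpha / n_tests
print(f"{threshold:.1e}")  # 3.2e-10
```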

The overall biological plausibility of association with BMD was assessed for the set of 22 unique SNPs contributing to the 12 replicated original and neighborhood interactions. First, we examined gene expression data, specifically eQTL associations achieving study-wide significance in GTEx Project [21] and GHS-Express [22] databases. Our 22-SNP set had a total of 51 significant eQTLs in 17 cells/tissues, of which 40 were observed among 16 cell/tissue types related to bone (enrichment P = 3.6 × 10⁻⁴, relative to the set of non-overlapping SNPs genome-wide with at least one significant eQTL in any of the 45 queried cell/tissue types) (Fig. 1a; Supplementary Tables 12, 13). Second, we used REMC chromatin state annotation data [23] to examine whether our 22-SNP set was enriched in enhancer or promoter states in each of 127 consolidated cell/tissue groups. We observed suggestive enrichment in overlap between SNPs in our 22-SNP set and putative enhancer states in four cell types relevant to bone biology (P < 0.05, no Bonferroni adjustment), relative to a background set of 75508 non-overlapping enhancer/promoter SNPs in our original SNP restriction set (Fig. 1b). Consideration of weakly significant enhancer and promoter enrichment analysis results (P < 0.10, no Bonferroni adjustment; Supplementary Tables 14, 15) suggests the 22-SNP set is relatively enriched for both regulatory states in monocytes and hematopoietic stem cells, which are related to bone metabolism [7]. For each of the distinct replicated 3-SNP interactions, chromatin contacts between putative regulatory regions containing the three SNPs of interest appeared supported: at least two chromatin contacts connecting the three target loci were observed, each with proximity scores ≥2 (Supplementary Table 16; Supplementary Figures 17–20).

Fig. 1 Biological plausibility of association between identified SNPs in replicated SNP interactions with BMD. a Counts of significant gene expressions (eQTLs) for the 22 unique SNPs in replicated 3-SNP interactions, grouped by the 16 cell or tissue types related to bone (above); the corresponding enrichment analysis result using ~2.6 million non-overlapping genome-wide SNPs with ~26.4 million eQTLs for comparison (below). b Plot of Fisher’s exact test P-values (log10(P)) from enhancer (left) and promoter (right) enrichment state analyses for the 22 unique SNPs in replicated 3-SNP interactions, using Roadmap Epigenomics Mapping Consortium chromatin state annotations for two BMD-related human cell categories (musculoskeletal (MS), blood) and two comparison categories (gastrointestinal (GI), brain). Dashed lines correspond to P < 0.05

The 3-SNP interaction with the strongest evidence of association with BMD was observed between rs1020745 (hg19 chr12:g.53692955G>A; PFDN5 intronic and C12orf10 promoter region), rs2110167 (hg19 chr12:g.5734319A>G; ANO2, intronic region), and rs10444471 (hg19 chr12:g.4677211G>T; DYRK4 synonymous coding variant) with an adjusted mean increase in BMD Z-score of 1.72 SD (95% CI: 1.27, 2.17). Both rs10444471 and rs2110167 were more frequently observed in enhancer states in bone-related cell types, whereas rs1020745 overlapped both enhancer and promoter states with relatively high frequencies (Table 3). Hi-C chromatin interaction maps in lymphoblastoid cells connecting the 3-SNP regions showed contact selectivity for the rs1020745 locus, with proximity scores indicating nearly 13-fold interaction enrichment with the rs10444471 locus, and over sixfold interaction enrichment with the rs2110167 locus. Enhancer regions including rs10444471 and rs2110167 may interact distally with a promoter or enhancer region bearing rs1020745, in cell types known to play a role in osteoblast or osteoclast differentiation (Fig. 2). Notably, the rs1020745 locus is known to reside in a region of high linkage disequilibrium [27], implicating several potential gene targets including SP7.
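Because the Hi-C proximity scores are reported as log2 ratios of observed to expected contact frequency (see the Fig. 2 legend), they convert to fold enrichment by exponentiation. The helper below is a minimal illustration with hypothetical function names; the score computed for 13-fold enrichment is derived from the conversion, not taken from the study's tables.

```python
from math import log2

def fold_enrichment(score):
    """Convert a log2(observed/expected) proximity score to fold enrichment."""
    return 2.0 ** score

def proximity_score(fold):
    """Inverse conversion: fold enrichment to log2 score."""
    return log2(fold)

print(fold_enrichment(2))             # 4.0
print(round(proximity_score(13), 2))  # 3.7
```

A score of 2 therefore corresponds to a contact frequency four times that expected.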

Table 3 Annotations of replicated regulatory 3-SNP interaction trees
Fig. 2 Chromatin interactions for the chromosome 12 SNP interaction: (rs1020745 = {AG,GG} and (rs2110167 = {GA,AA} and rs10444471 = {GG})). The WashU EpiGenome Browser was used to visualize long-range chromatin interactions within and across three 500-kb windows centered at implicated SNPs. SNP locations are contextualized using ideograms at the top of regional windows and highlighted with vertical lines in the center of each window. Histone modification (H3K4me3, H3K4me1, H3K27ac), RNA-seq, and DNase I hypersensitivity heatmap data tracks were reviewed. Four data tracks per assay for each of four cell/tissue samples are shown: lymphoblastoid cells (LCLs), peripheral blood mononuclear cells or monocytes, mobilized CD34 cells, and osteoblasts or an osteoblastic precursor proxy (H1 mesenchymal cells). Hi-C data generated with GM06990 LCLs were used to assess evidence for long-range chromatin interactions between SNPs in 3-SNP interaction trees (100-kb bin resolution, log2[observed contact/expected contact] scores). Minimum Hi-C interaction scores were set such that interaction arcs represent chromatin interactions with at least 4-fold observed contact frequency over expected (scores ≥2)

Discussion

Previous studies of epistasis have successfully used exhaustive testing methods to assess SNP pair interactions. To detect 3-SNP interactions associated with a complex trait on a genome-wide scale, we implemented a novel, non-exhaustive logic regression-based algorithm among SNPs mapped to regulatory genomic regions. Specifically, our algorithm: (a) focuses on 3-way interactions that plausibly reflect gene regulation events using SNPs mapped to enhancers or promoters; and (b) considers many epistatic candidates, but only allocates 1 degree-of-freedom for a 3-SNP interaction. The strength of our method is that we use logic regression combined with a conditioning strategy to encourage a multi-directional, stochastic search, bypassing an exhaustive search for 3-way interactions that may miss a true interaction due to lack of statistical power.

Despite known limitations of logic regression (e.g., non-exhaustive searches may miss the “best” interaction solution), we propose our method as a complementary approach to existing exhaustive 2-SNP search methods to detect higher order epistasis. We observed no overlap between top 2-way regulatory SNP interactions identified using a benchmark exhaustive testing method and 3-way regulatory SNP interactions detected with our proposed method. Furthermore, our simulation results revealed that SNP pair searches are ineffective for detecting 3-SNP interaction patterns associated with variations in BMD, unless component 2-SNP interactions have strong associations with phenotype without the inclusion of an additional SNP. These results suggest exhaustive searches for 2-SNP interactions are not universally effective for detecting higher order epistasis, and novel methods to conduct deliberate searches for higher order epistasis are needed.

To safeguard against the reporting of false positive results, we used a permutation-based evaluation statistic to identify candidate 3-SNP interactions, performed a replication analysis, and conducted additional bioinformatics analyses. We identified six regulatory 3-SNP interactions that potentially modify treatment effects on BMD among adult survivors of pediatric ALL. Five of these 3-SNP interactions were replicated as treatment modification effects in an independent sample. Our bioinformatics analyses indicated that SNPs contributing to replicated interactions had both an excess of gene expressions and an enrichment of enhancer states in cell and tissue types important for bone biology. The plausibility of interactions between regulatory regions bearing target SNP variants was supported by observations of chromatin contacts that occurred in greater frequencies than expected between regions that overlapped SNPs of interest in lymphoblastoid cells. Although these 3-SNP interactions were not functionally validated, our findings represent viable leads in identifying epistatic interactions with cancer treatment-related effects on BMD.

There are multiple ways to interpret these 3-SNP interactions. Given the long-range chromatin interaction data, it is plausible that three SNPs embedded in physically interacting regulatory regions form an epistatic network that jointly affects the expression of genes that modify BMD in pediatric cancer survivors exposed to specific cytotoxic treatments. For example, among those exposed to methotrexate, the genomic regulatory region bearing rs1020745 could act as a “hub” for the 3-way chromosome 12 genetic interaction, with rs2110167 and rs10444471 acting as supportive regulatory elements to influence the SP7 locus (rs1020745). SP7 has previously been reported as a candidate gene affecting bone biology in both adult and pediatric populations [27, 28], and is known to encode an osteogenic transcription factor, Osterix (Osx) [29].

Although a second independent cohort of ALL survivors would be desirable for replication analyses, the availability of a replication cohort of non-ALL survivors, which consisted predominantly of survivors of solid tumors or lymphoma, provided the opportunity to assess whether genetic interactions associated with BMD Z-score in the discovery cohort plausibly modified cancer treatment effects on BMD. Our replication results support the discovery findings and underscore the relative importance of treatment exposures, as these epistatic interactions do not appear to be pathological artifacts specific to ALL. To contextualize these treatment effect modifications, consider the chromosome 12 interaction. This putative epistatic interaction may modulate the effects of SP7 and as a consequence, Osx expression levels. Exposure to methotrexate has been linked to decreased Osx expression and significant reductions in osteocyte precursor cells and metaphyseal trabecular bone volume in rats [30]. As such, this interaction may counter BMD loss in cancer survivors exposed to methotrexate.

In conclusion, our results demonstrate the feasibility of detecting and replicating higher order interactions between SNPs within putative regulatory regions associated with a complex quantitative trait, using a hypothesis-driven approach. Similar searches can be implemented in other contexts, using known biological interaction mechanisms. Although power to assess larger nth-order interactions decreases as the number of participatory SNPs increases, biologically motivated searches for SNP interaction networks involving more than three SNPs at a time are warranted.