It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
- How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78 (2007).
- Inferring cellular networks—a review. BMC Bioinformatics 8, S5 (2007).
- Gene regulatory network inference: data integration in dynamic models—a review. Biosystems 96, 86–103 (2009).
- Advantages and limitations of current network inference methods. Nat. Rev. Microbiol. 8, 717–729 (2010).
- Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl. Acad. Sci. USA 107, 6286–6291 (2010).
- Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief. Bioinform. 15, 195–211 (2014).
- Differential network biology. Mol. Syst. Biol. 8, 565 (2012).
- From 'differential expression' to 'differential networking'—identification of dysfunctional regulatory networks in diseases. Trends Genet. 26, 326–333 (2010).
- Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28, 2804–2810 (2012).
- Comparing signaling networks between normal and transformed hepatocytes using discrete logical models. Cancer Res. 71, 5400–5411 (2011).
- Perturbation biology: inferring signaling networks in cellular systems. PLoS Comput. Biol. 9, e1003290 (2013).
- Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol. Syst. Biol. 5, 239 (2009).
- A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).
- Integrating literature-constrained and data-driven inference of signalling networks. Bioinformatics 28, 2311–2317 (2012).
- (Cambridge Univ. Press, 2009).
- Are there algorithms that discover causal structure? Synthese 121, 29–54 (1999).
- Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 19, 2271–2282 (2003).
- Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620 (2000).
- Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529 (2005).
- Causation, Prediction, and Search 2nd edn. (MIT Press, 2000).
- A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137, 172–181 (2009).
- Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
- Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Ann. NY Acad. Sci. 1115, 1–22 (2007).
- Lessons from the DREAM2 challenges. Ann. NY Acad. Sci. 1158, 159–195 (2009).
- Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS ONE 5, e9202 (2010).
- Crowdsourcing network inference: the DREAM predictive signaling network challenge. Sci. Signal. 4, mr7 (2011).
- Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst. Biol. 8, 13 (2014).
- Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol. Cancer Ther. 5, 2512–2521 (2006).
- Ischemia in tumors induces early and sustained phosphorylation changes in stress kinase pathways but does not affect global protein levels. Mol. Cell. Proteomics 13, 1690–1704 (2014).
- Developing predictive molecular maps of human disease through community-based modeling. Nat. Genet. 44, 127–130 (2012).
- Context-specificity in causal signaling networks revealed by phosphoprotein profiling. bioRxiv doi:10.1101/039636 (2016).
- The relationship between Precision-Recall and ROC curves. in Proc. 23rd International Conference on Machine Learning 233–240 (ACM, 2006).
- A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014).
- Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1 (2013).
- Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).
- Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. R J. 3, 29–33 (2011).
- AZD8055 is a potent, selective, and orally bioavailable ATP-competitive mammalian target of rapamycin kinase inhibitor with in vitro and in vivo antitumor activity. Cancer Res. 70, 288–298 (2010).
- Predicting causal effects in large-scale systems from observational data. Nat. Methods 7, 247–248 (2010).
- Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 103, 329–336 (2014).
- Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
- A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006).
- Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
- The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
- A technical assessment of the utility of reverse phase protein arrays for the study of the functional proteome in non-microdissected human breast cancers. Clin. Proteomics 6, 129–151 (2010).
- Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).
- The Cyni framework for network inference in Cytoscape. Bioinformatics 31, 1499–1501 (2015).
- Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93, 491–507 (2006).
- DREAMTools: a Python package for scoring collaborative challenges. F1000Research 4, 1030 (2015).
- Supplementary Figure 1: The HPN-DREAM network inference challenge: overview of in silico data tasks. (212 KB)
Data were generated from a nonlinear dynamical model of the ErbB signaling pathway (Chen et al., 2009). Training data consisted of time-courses for 20 network nodes under three inhibitors targeting specific nodes, or no inhibitor, and under two ligand stimuli, applied individually and in combination at two concentrations. In total, there were 20 different (inhibitor, stimulus) conditions, as shown (top right). Each time-course comprised 11 time points, and three technical replicates were provided. Node names were anonymized to prevent use of biological prior information. The sub-challenge 1 in silico data task (SC1B) asked participants to infer a single directed, weighted network using the training data. The aim of the sub-challenge 2 in silico data task (SC2B) was to predict stimulus-specific time-courses under unseen interventions. For SC1B, submissions were assessed against a gold-standard network extracted from the data-generating model, with agreement quantified using the AUROC score. For SC2B, predicted time-courses were assessed using held-out test data obtained under in silico inhibition of each network node in turn, with prediction accuracy quantified using root mean square error (RMSE). See Online Methods for further details of the in silico data tasks.
Chen, W.W. et al. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol. Syst. Biol. 5, 239 (2009).
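The two scores named in this caption can be illustrated with a minimal Python sketch. It is not the challenge's scoring code: the function names and the toy inputs are assumptions made for illustration, and AUROC is computed here through the standard pairwise (Mann-Whitney) identity rather than by tracing the ROC curve.

```python
import math


def auroc(scores, labels):
    """AUROC via the pairwise identity: the fraction of (positive, negative)
    pairs in which the positive receives the higher score; ties count 0.5.

    scores: predicted edge weights (hypothetical input format).
    labels: 1 if the edge is in the gold-standard network, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data, not taken from the challenge.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print(rmse([0.0, 0.0], [3.0, 3.0]))               # 3.0
```

The pairwise form is quadratic in the number of edges, which is acceptable for a 20-node network but would be replaced by a rank-based computation for larger ones.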
- Supplementary Figure 2: Context-specific ‘gold-standard’ causal descendant sets for the network inference sub-challenge experimental data task (SC1A). (89 KB)
Context-specific networks submitted to SC1A were assessed using held-out test data, obtained under inhibition of mTOR. Each column in the heatmap indicates, for a given (cell line, stimulus) context c, the phosphoproteins that showed salient changes under mTOR inhibition relative to DMSO control (black cells) and those that did not (white cells). Such changes were determined from the test data using a procedure centered around a paired t-test. Phosphoproteins that show salient changes can be regarded as descendants of mTOR in the underlying causal signaling network. Columns therefore represent context-specific, experimentally determined sets of causal descendants of mTOR, denoted D_c^GS, and were used as a ‘gold-standard’ to assess inferred context-specific networks. Further details regarding the determination of the gold-standard descendant sets and the scoring procedure can be found in Online Methods. Missing data are indicated by gray cells (some phosphoprotein antibodies were present in the training and test data for only a subset of cell lines). Based on a figure in Hill, Nesser et al. (2016).
Hill, S.M., Nesser, N.K. et al. Context-specificity in causal signaling networks revealed by phosphoprotein profiling. bioRxiv doi:10.1101/039636 (2016).
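To compare an inferred network with a gold-standard descendant set, the network's own descendants of mTOR must first be computed. The sketch below shows one plausible way to do this, assuming edges are kept when their weight meets a threshold and descendants are found by breadth-first search. The edge-dictionary format, the threshold rule and the toy node names are illustrative assumptions; the actual procedure is described in Online Methods.

```python
from collections import deque


def descendants(edge_weights, source, threshold):
    """Nodes reachable from `source` along edges with weight >= threshold.

    edge_weights: {(parent, child): weight}, a hypothetical representation
    of a submitted directed, weighted network.
    """
    children = {}
    for (parent, child), weight in edge_weights.items():
        if weight >= threshold:
            children.setdefault(parent, []).append(child)
    found = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child != source and child not in found:
                found.add(child)
                queue.append(child)
    return found


# Toy network and toy gold standard, for illustration only.
toy_network = {
    ("mTOR", "p70S6K"): 0.9,
    ("p70S6K", "S6"): 0.7,
    ("AKT", "mTOR"): 0.8,
    ("mTOR", "ERK"): 0.2,
}
toy_gold = {"p70S6K", "S6"}

predicted = descendants(toy_network, "mTOR", threshold=0.5)
true_positives = predicted & toy_gold
false_positives = predicted - toy_gold
print(sorted(predicted), len(true_positives), len(false_positives))
```

Lowering the threshold admits weaker edges and can only enlarge the predicted descendant set, so sweeping it traces out the true-positive and false-positive rates needed for an ROC curve.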
- Supplementary Figure 3: Network inference sub-challenge (SC1) final team scores and rankings. (96 KB)
(a) Mean rank scores for the 74 teams that participated in the experimental data task (SC1A). Mean rank scores were used to obtain final team rankings. For the 40 teams that provided information regarding their approach, bar color indicates method type (see also Fig. 3e, Supplementary Table 2 and Supplementary Note 5). Stars above bars indicate teams with statistically significant AUROC scores (FDR < 0.05) in at least 50% of (cell line, stimulus) contexts (2 stars) or at least 25% of contexts (1 star) (multiple testing correction performed within each context with respect to number of teams). (b) AUROC scores for the 65 teams that participated in the in silico data task (SC1B). AUROC scores were used to obtain final team rankings. As in a, color indicates method type (see also Fig. 3f, Supplementary Table 2 and Supplementary Note 5). Stars above bars indicate statistically significant AUROC scores (FDR < 0.05). (c) Comparison of mean rank and mean AUROC scores for SC1A. (d) Final ranks for SC1A (dashed blue line) and SC1B (dotted green line) were averaged to obtain a combined score (solid red line) for the 59 teams that participated in both tasks. Teams ordered by combined score (see “SC1A/B combined final rank” column in Supplementary Table 2). See Online Methods for full details of scoring for SC1.
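The mean rank score in panel a can be sketched as follows: teams are ranked by AUROC within each context, and each team's ranks are then averaged across contexts. The data layout and the tie rule (tied teams share the average rank) are assumptions for illustration; see Online Methods for the scoring actually used.

```python
def ranks_descending(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mean_rank_scores(auroc_by_context):
    """auroc_by_context: {context: {team: AUROC}} -> {team: mean rank}."""
    collected = {}
    for scores in auroc_by_context.values():
        teams = list(scores)
        context_ranks = ranks_descending([scores[t] for t in teams])
        for team, rank in zip(teams, context_ranks):
            collected.setdefault(team, []).append(rank)
    return {team: sum(r) / len(r) for team, r in collected.items()}


# Toy AUROC values for three hypothetical teams in two contexts.
toy = {
    "context_1": {"A": 0.8, "B": 0.6, "C": 0.7},
    "context_2": {"A": 0.7, "B": 0.9, "C": 0.5},
}
print(mean_rank_scores(toy))  # {'A': 1.5, 'B': 2.0, 'C': 2.5}
```

Averaging ranks rather than raw AUROC values keeps any single context with unusually high or low scores from dominating the final ordering.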
- Supplementary Figure 4: Gold-standard causal network for the network inference sub-challenge in silico data task (SC1B). (139 KB)
The gold-standard network, used to assess networks submitted to SC1B, was obtained from a data-generating dynamical model of the ErbB signaling pathway. Derivation of the network was non-trivial because variables appear in complexes within the model; full details can be found in Supplementary Note 8. Three unconnected dummy nodes were incorporated in the model and node names were anonymized in the training data.
- Supplementary Figure 5: Balance of positives and negatives in the gold-standards for the network inference sub-challenge (SC1). (51 KB)
The gold standard for the experimental data task (SC1A) comprised sets of descendants of mTOR for each (cell line, stimulus) context, experimentally-determined using the held-out test data. Shown (left) are the number of positives and negatives for each context; that is, the number of phosphoproteins that are descendants of mTOR according to the test data (positives) and the number that are non-descendants of mTOR (negatives). For the in silico data task (SC1B), the gold-standard consisted of the data-generating network. Shown (right) are the number of edges in this network (positives) and the number of non-edges (negatives).
- Supplementary Figure 6: Comparison of AUROC with an alternative scoring metric, AUPR, for the network inference sub-challenge (SC1). (76 KB)
(a) Alternative team rankings were calculated by replacing AUROC with AUPR (area under the precision-recall curve) in the scoring procedure. The alternative rankings were compared with the original AUROC-based rankings for both the experimental data task (SC1A; left) and in silico data task (SC1B; right). (b) A further alternative ranking, combining both AUROC and AUPR, was obtained by ranking teams based on an average of final rank under AUROC and final rank under AUPR, and was compared with the original AUROC-based rankings.
- Supplementary Figure 7: Statistical significance of AUROC scores for the network inference sub-challenge experimental data task (SC1A). (65 KB)
For each (cell line, stimulus) context, a null distribution over AUROC was generated and used to calculate an FDR-adjusted P value for each team (Online Methods). (a) The number of significant (FDR < 0.05) AUROC scores obtained by each team across the 32 contexts (multiple testing correction performed within each context with respect to number of teams). Teams are ordered according to their final ranking in SC1A (based on mean rank score). (b) For each context, the number of teams (out of a total of 74) that obtained significant AUROC scores. For two contexts, (BT549, NRG1) and (BT20, Insulin), no team obtained a significant AUROC score; these two contexts were disregarded in the scoring process.
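The significance calculation can be sketched in two steps: an empirical P value from the null distribution, then a multiple-testing adjustment across teams within one context. The sketch uses the plain Benjamini-Hochberg step-up adjustment as a stand-in; the exact null model and FDR procedure used in the challenge are given in Online Methods and may differ. Function names and toy numbers are assumptions.

```python
def empirical_p_value(observed, null_scores):
    """One-sided empirical P value with the usual +1 correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy P values for four hypothetical teams in a single context.
team_p = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(team_p)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

In this toy case only the first team remains significant at FDR < 0.05 after adjustment, although three of the four raw P values fall below 0.05.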
- Supplementary Figure 8: Crowdsourced analysis for the network inference sub-challenge in silico data task (SC1B). (35 KB)
(a) Aggregate submission networks were formed by integrating predicted networks across the top N teams (as given by final team rankings), with N varied between 1 (top performer only) and all teams (after removal of correlated submissions; Supplementary Note 10). Integration was done by averaging predicted edge weights (Online Methods). The blue line shows performance (AUROC) of the aggregate submission networks. Individual team scores are also depicted (red circles). (b) Predicted networks were integrated for subsets of N teams, selected at random. The blue line shows mean performance of the aggregate submission networks, calculated over 100 random subsets of teams (error bars indicate s.d.). Crowdsourced analysis for the experimental data network inference task is shown in Figure 3c,d.
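Integration by averaging predicted edge weights can be sketched as below. The dictionary representation and the choice to treat an edge missing from a submission as weight 0 are assumptions made for this illustration, not details confirmed by the caption.

```python
def aggregate_networks(networks):
    """Average edge weights across networks.

    networks: list of {(parent, child): weight}. An edge absent from a
    network is treated as weight 0.0 (an assumption of this sketch).
    """
    if not networks:
        return {}
    edges = set().union(*networks)
    return {
        edge: sum(net.get(edge, 0.0) for net in networks) / len(networks)
        for edge in edges
    }


def aggregate_top_n(submissions, ranking, n):
    """Aggregate the networks of the n best-ranked teams.

    submissions: {team: network}; ranking: team names, best first.
    """
    return aggregate_networks([submissions[team] for team in ranking[:n]])


# Toy submissions from two hypothetical teams.
submissions = {
    "team_1": {("A", "B"): 1.0, ("B", "C"): 0.5},
    "team_2": {("A", "B"): 0.5},
}
print(aggregate_top_n(submissions, ["team_1", "team_2"], n=2))
```

Scoring `aggregate_top_n` for N from 1 to the number of teams would give a curve of the kind shown in panel a, while drawing random subsets of size N corresponds to panel b.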
- Supplementary Figure 9: Weighted combinations of two top performing approaches and aggregate prior network for the network inference sub-challenge experimental data task (SC1A). (62 KB)
An extension of Figure 4b to show three-way combinations of (i) PropheticGranger – top performer for the experimental data task when combined with a prior network (here, the method is used without the prior network); (ii) FunChisq – top performer for the in silico data task and most consistent performer across both data types; and (iii) an aggregate prior network formed by integrating prior networks used by participants (Online Methods). The three approaches were combined by taking weighted averages of predicted edge scores for each (cell line, stimulus) context and performance assessed using mean AUROC. For example, the best performance (mean AUROC = 0.82) was achieved by combining 20% PropheticGranger, 50% FunChisq and 30% aggregate prior network, and is highlighted with an “X”. See Supplementary Note 1 for full details of the PropheticGranger and FunChisq approaches.
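A three-way weighted combination of this kind can be sketched as a weighted average of edge scores, with the weights drawn from a grid of non-negative values that sum to 1. The 10% grid step, the function names and the toy scores are assumptions for illustration; scoring each combination by mean AUROC is left out of the sketch.

```python
def weight_grid(steps=10):
    """All (w1, w2, w3) that are multiples of 1/steps and sum to 1."""
    grid = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            grid.append((i / steps, j / steps, k / steps))
    return grid


def combine(edge_scores, weights):
    """Weighted average of edge scores from several approaches.

    edge_scores: list of {(parent, child): score}, one per approach;
    an edge missing from an approach contributes 0.0 (an assumption).
    """
    edges = set().union(*edge_scores)
    return {
        edge: sum(w * s.get(edge, 0.0) for w, s in zip(weights, edge_scores))
        for edge in edges
    }


# Toy edge scores standing in for the two methods and the prior network.
method_a = {("X", "Y"): 1.0}
method_b = {("X", "Y"): 0.0, ("Y", "Z"): 1.0}
prior = {("Y", "Z"): 1.0}

combined = combine([method_a, method_b, prior], (0.2, 0.5, 0.3))
print(len(weight_grid()), combined)
```

Evaluating every point returned by `weight_grid` and keeping the best-scoring one mirrors the search over the triangle of weightings that the caption describes.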
- Supplementary Figure 10: Time-course prediction sub-challenge experimental data task (SC2A): phosphoproteins showing the largest changes under mTOR inhibition are predicted with least accuracy. (74 KB)
SC2A tasked participants with predicting phosphoprotein time-courses for each (cell line, stimulus) context under an unseen intervention (mTOR inhibition, mTORi). Submitted predictions were assessed against held-out test data obtained under mTORi. For each team, root mean square error (RMSE) scores were calculated for each (cell line, phosphoprotein) pair (see Supplementary Note 6). (a) Left: for each (cell line, phosphoprotein) pair, normalized RMSE¹ for the top-ranked team (Team44) vs. absolute effect size. The effect size for a given (cell line, phosphoprotein) pair is a measure of the magnitude of abundance change under mTORi relative to DMSO control². Note that this measure is based on the mTORi test data and is independent of team predictions. The strong positive correlation indicates that phosphoproteins showing little or no change under mTORi were predicted relatively well, whereas phosphoproteins that showed large changes under mTORi were predicted poorly. Right: examples of time-courses underlying the scatter plot (left). Shown are abundances of three phosphoproteins for cell line UACC812 under DMSO control and under mTORi, as predicted by Team44, together with the test data values. Note that normalized RMSE and effect size values are calculated across all stimuli, but only serum stimulus time-courses are shown here. (b) Scatter plots as in a for teams ranked 2 to 5 in SC2A. These results highlight the challenging nature of predicting protein abundance under unseen interventions but also point to a shortcoming of the RMSE score used here, namely that it does not sufficiently emphasize the ability to predict proteins that change under intervention. For a future challenge, a modified metric that focuses on those proteins might therefore be useful.
¹ To ensure comparability across cell lines and phosphoproteins, each RMSE score was normalized by the standard deviation of the test data used in the RMSE calculation.
² Effect size is defined as the mean difference in phosphoprotein abundance between DMSO control and mTORi, normalized by the standard deviation of the differences. Means and standard deviations are calculated across all time points and stimuli for the given cell line.
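The two footnoted quantities can be written out as a short sketch. It assumes that predictions, test data and DMSO control values are supplied as equal-length, matched sequences (pooled over time points and stimuli for one cell line and phosphoprotein), and it uses the sample standard deviation; both are assumptions, since the footnotes do not specify them.

```python
import math
import statistics


def normalized_rmse(predicted, observed):
    """RMSE divided by the standard deviation of the test data."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return rmse / statistics.stdev(observed)


def effect_size(dmso, mtori):
    """Mean DMSO-minus-mTORi difference over the s.d. of the differences."""
    differences = [d - m for d, m in zip(dmso, mtori)]
    return statistics.mean(differences) / statistics.stdev(differences)


# Toy abundances for one hypothetical (cell line, phosphoprotein) pair.
dmso_control = [3.0, 5.0, 7.0]
mtori_test = [1.0, 2.0, 3.0]
team_prediction = [2.0, 3.0, 4.0]

print(normalized_rmse(team_prediction, mtori_test))  # 1.0
print(abs(effect_size(dmso_control, mtori_test)))    # 3.0
```

Plotting the first quantity against the absolute value of the second for every pair gives a scatter plot of the kind shown in panel a.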
- Supplementary Figure 11: Visualization sub-challenge (SC3) voting results and rankings. (43 KB)
Fourteen teams made submissions to the visualization sub-challenge. HPN-DREAM challenge participants were asked to select and rank (from 1 to 3) their three favorite submissions. The remaining unranked submissions were then assigned a rank of 4. Thirty-six participants took part in the voting process, and the number of votes of each rank type is shown (bar plot, left axis). Final team ranks were based on the mean rank across the 36 votes (green line, right axis).
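The voting rule can be sketched directly: each vote lists three submissions in order of preference, every submission not listed receives rank 4, and ranks are averaged over votes. The ballot format and team labels below are hypothetical.

```python
def mean_vote_ranks(teams, votes, unranked_rank=4):
    """Mean rank per team across votes.

    teams: all submitting teams.
    votes: list of ballots, each an ordered list of the voter's three
           favorite teams (first entry = rank 1).
    """
    totals = {team: 0 for team in teams}
    for vote in votes:
        for team in teams:
            if team in vote:
                totals[team] += vote.index(team) + 1
            else:
                totals[team] += unranked_rank
    return {team: totals[team] / len(votes) for team in teams}


# Toy ballots from two hypothetical voters over four submissions.
teams = ["T1", "T2", "T3", "T4"]
votes = [["T1", "T2", "T3"], ["T2", "T1", "T4"]]
print(mean_vote_ranks(teams, votes))
# {'T1': 1.5, 'T2': 1.5, 'T3': 3.5, 'T4': 3.5}
```

Final team ranks then follow from sorting teams by this mean, lowest first.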
- Supplementary Figure 12: Robustness of rankings for the network inference sub-challenge (SC1). (113 KB)
The test data were subsampled to assess robustness of rankings (Online Methods). Box plots show team ranks over 100 subsampling iterations, with 50% of the test data left out at each iteration. (a) Experimental data task: subsampling performed by removing 50% of phosphoproteins when assessing descendant sets for each (cell line, stimulus) context. (b) Experimental data task: subsampling performed by removing 50% of contexts from the scoring process. (c) In silico data task: subsampling performed by considering only 50% of edges/non-edges in the gold-standard network. For all box plots, the central line indicates the median, and the box edges denote the 25th and 75th percentiles. Whiskers extend to 1.5 times the interquartile range from the box hinge. Data points beyond the whiskers are regarded as outliers and are plotted individually.
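The subsampling scheme can be sketched generically: at each iteration half of the test items are kept, every team is rescored on that subset, and the resulting ranks are recorded. In the sketch the test items are gold-standard indices and the score is simple accuracy, used as a stand-in for AUROC; the function names, the fixed seed and the toy predictions are all assumptions.

```python
import random


def subsampled_ranks(items, score_fns, iterations=100, fraction=0.5, seed=0):
    """Team ranks over repeated random subsamples of the test items.

    items: list of test items (e.g. phosphoproteins, contexts or edges).
    score_fns: {team: function(subset) -> score}, higher is better.
    Returns {team: [rank at each iteration]}, where rank 1 is best.
    """
    rng = random.Random(seed)
    keep = max(1, int(len(items) * fraction))
    history = {team: [] for team in score_fns}
    for _ in range(iterations):
        subset = rng.sample(items, keep)
        scores = {team: fn(subset) for team, fn in score_fns.items()}
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, team in enumerate(ordered, start=1):
            history[team].append(position)
    return history


# Toy gold standard and predictions for two hypothetical teams.
gold = [1, 0, 1, 0, 1, 0, 1, 0]
predictions = {"A": list(gold), "B": [1 - g for g in gold]}


def make_scorer(prediction):
    # Accuracy on the subsample; a stand-in for the AUROC-based score.
    return lambda subset: sum(prediction[i] == gold[i] for i in subset) / len(subset)


score_fns = {team: make_scorer(pred) for team, pred in predictions.items()}
history = subsampled_ranks(list(range(len(gold))), score_fns)
print(min(history["A"]), max(history["A"]))
```

The per-team lists in `history` are what a box plot would summarize: a team whose ranks vary little across iterations has a ranking that is robust to the choice of test data.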
- Supplementary Text and Figures (9,453 KB)
Supplementary Figures 1–12 and Supplementary Notes 1–11
- Supplementary Table 1 (19 KB)
List of antibodies.
- Supplementary Table 2 (41 KB)
Network inference sub-challenge (SC1) results, metadata, and inclusion of teams in Consortium and post-challenge analyses.
- Supplementary Table 3 (141 KB)
Comparison of aggregate submission networks with the aggregate prior network to identify novel, context-specific edges.
- Supplementary Table 4 (19 KB)
Time-course prediction sub-challenge (SC2) results, metadata, and inclusion of teams in Consortium.
- Supplementary Table 5 (11 KB)
(Cell line, phosphoprotein) pairs disregarded in the scoring procedure for the time-course prediction sub-challenge experimental data task (SC2A).
- Supplementary Table 6 (12 KB)
(Test inhibitor, predicted node) pairs disregarded in the scoring procedure for the time-course prediction sub-challenge in silico data task (SC2B).