Inferring causal molecular networks: empirical assessment through a community-based effort

Journal name: Nature Methods
Volume: 13
Pages: 310–318
Year published: 2016
DOI: 10.1038/nmeth.3773

Abstract

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

Figures

  1. Figure 1: Causal networks.

    (a) A directed edge denotes that inhibition of the parent node A can change the abundance of the child node B. (b) Causal edges, as used here, may represent direct effects or indirect effects that occur via unmeasured intermediate nodes. If node A causally influences node B via measured node C, the causal network should contain edges from A to C and from C to B, but not from A to B (top). However, if node C is not measured (and is not part of the network), the causal network should contain an edge from A to B (bottom). Note that in both cases inhibition of node A will lead to a change in node B. (c) Causal edges may depend on biological context; for example, a causal edge from A to B appears in context 1, but not in context 2 (lines in graphs are as defined in a). (d) Correlation and causation. Nodes A and B are correlated owing to regulation by the same node (C), but in this example no sequence of mechanistic events links A to B, and thus inhibition of A does not change the abundance of B (lines in bottom right graph are as defined in a). Therefore, despite the correlation, there is no causal edge from A to B.
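
    The distinction in d can be made concrete with a short simulation. The sketch below is a minimal illustration assuming linear Gaussian relationships (not the challenge's data-generating model): A and B correlate under observation, but intervening on A leaves B unchanged.

    ```python
    # Minimal simulation of the scenario in d, assuming linear Gaussian
    # relationships (illustrative only; not the challenge's data-generating model).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Observational regime: C drives both A and B, so A and B correlate.
    C = rng.normal(size=n)
    A = 2.0 * C + rng.normal(size=n)
    B = -1.5 * C + rng.normal(size=n)
    print("observational corr(A, B):", np.corrcoef(A, B)[0, 1])  # strongly nonzero

    # Interventional regime: A is set exogenously (inhibition breaks C -> A).
    # B is unchanged because no mechanistic path links A to B.
    A_int = rng.normal(size=n)
    B_int = -1.5 * C + rng.normal(size=n)
    print("interventional corr(A, B):", np.corrcoef(A_int, B_int)[0, 1])  # ~0
    ```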

  2. Figure 2: The HPN-DREAM network inference challenge: overview of experimental data tasks and causal assessment strategy.

    (a) Protein data were obtained from four cancer cell lines under eight stimuli (described in ref. 31). For each of the 32 resulting contexts, participants were provided with training data comprising time courses for ~45 phosphoproteins under three different kinase inhibitors and a control (DMSO). For the sub-challenge 1 experimental data task (SC1A), participants were asked to infer causal signaling networks specific to each context. In SC2A, the aim was to predict context-specific phosphoprotein time courses. In both cases, submissions were assessed using held-out, context-specific test data that were obtained under an unseen intervention (inhibition of the kinase mTOR). Each sub-challenge also included a companion in silico data task (SC1B and SC2B, respectively; described in the text, Online Methods and Supplementary Fig. 1). Abund., abundance; TP, true positives; FP, false positives. (b) Networks submitted for SC1A were assessed causally in terms of agreement with the interventional test data. For each context, the set of nodes that changed under mTOR inhibition was identified (gold-standard causal descendants of mTOR; described in the text and Online Methods). In the example shown, node X is a descendant of mTOR, whereas node Y is not. (c) Predicted descendants of mTOR from submitted context-specific networks were compared with their experimentally determined gold-standard counterparts. This gave true and false positive counts and a (context-specific) AUROC. (d) In each context, teams were ranked by AUROC score, and mean rank across contexts gave the final rankings.
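
    The scoring logic in b–d can be sketched as follows. This is an illustrative reading that assumes a node's descendant score is the largest edge-weight threshold at which it remains reachable from mTOR; the official scoring pipeline is described in Online Methods and implemented in DREAMTools (ref. 48).

    ```python
    # Sketch of the causal scoring in panels b-d. Assumption (one reading of
    # Online Methods): a node's "descendant score" is the largest edge-weight
    # threshold at which it is still reachable from mTOR.
    import networkx as nx
    from sklearn.metrics import roc_auc_score

    def descendant_scores(weighted_edges, root="mTOR"):
        """weighted_edges: dict mapping (parent, child) to a weight in [0, 1]."""
        scores = {}
        for t in sorted(set(weighted_edges.values()), reverse=True):
            g = nx.DiGraph()
            g.add_edges_from(e for e, w in weighted_edges.items() if w >= t)
            if root in g:
                for node in nx.descendants(g, root):
                    scores.setdefault(node, t)  # keep highest qualifying threshold
        return scores

    def context_auroc(weighted_edges, gold_descendants, all_nodes, root="mTOR"):
        s = descendant_scores(weighted_edges, root)
        nodes = [n for n in all_nodes if n != root]
        y_true = [int(n in gold_descendants) for n in nodes]  # gold standard (panel b)
        y_score = [s.get(n, 0.0) for n in nodes]              # predictions (panel c)
        return roc_auc_score(y_true, y_score)
    ```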

  3. Figure 3: Network inference sub-challenge (SC1) results.

    (a) AUROC scores in each of the 32 (cell line, stimulus) contexts for the 74 teams that submitted networks for the experimental data task. (b) Scores in experimental and in silico data tasks. Each square represents a team. Red borders around squares indicate that a different method was used in each task. Numbers adjacent to squares indicate ranks for the top ten teams under a combined score (three teams ranked third). (c,d) Results of crowdsourcing for the experimental data task. Aggregate networks were formed by combining, for each context, networks from top scoring (c) or randomly selected (d) teams (Online Methods). Dashed lines indicate aggregations of all submissions. Results in d are mean values over 100 iterations of random selection (error bars indicate ±s.d.). (e,f) Performance by method type for the experimental (e) and in silico (f) data tasks. The final rank is shown above each bar, and the gray lines indicate the mean performance of random predictions. ODE, ordinary differential equation.
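
    A minimal sketch of the aggregation used in c,d, assuming each submission for a context is an edge-weight matrix on a common node set with weights normalized to [0, 1] (the exact normalization is described in Online Methods):

    ```python
    # Sketch of crowdsourced aggregation (panels c,d), under the assumption that
    # each team's submission is a normalized edge-weight matrix on shared nodes.
    import numpy as np

    def aggregate_networks(weight_matrices):
        """Average edge weights across teams to form an aggregate network."""
        return np.mean(np.stack(weight_matrices, axis=0), axis=0)

    # For example, aggregating the top N teams for one context:
    # agg = aggregate_networks([team_weights[t] for t in ranking[:N]])
    ```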

  4. Figure 4: Role of pre-existing biological knowledge in the experimental data network inference task (SC1A).

    (a) Box plots showing mean AUROC scores for teams that either did or did not use a prior network. P value calculated via Wilcoxon rank-sum test (n = 18). (b) Performance of aggregate prior network when combined with networks inferred by PropheticGranger (top performer in SC1A when combined with a network prior) or FunChisq (top performer in SC1B). The blue line indicates aggregate prior combined with randomly generated networks (mean of 30 random networks; shading indicates ±s.d.). The dashed line shows the mean AUROC score achieved by the top-performing team in SC1A. Error bars denote ±s.e.m. (c) Performance of aggregate submission network and aggregate prior network in each context. Top, performance by context. Box plots over AUROC scores for the top 25 performers for each context, shown for comparison. Bottom, receiver operating characteristic curves for two contexts that showed performance differences between aggregate submission and prior. For all box plots, line within the box indicates the median, and the box edges denote the 25th and 75th percentiles. Whiskers extend to 1.5 times the interquartile range from the box hinge. Individual data points are also shown.
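
    The panel a comparison corresponds to a two-sample Wilcoxon rank-sum test on mean AUROC scores; a minimal sketch with hypothetical placeholder values:

    ```python
    # Two-sample Wilcoxon rank-sum test as in panel a; the score lists below
    # are hypothetical placeholders, not the challenge results.
    from scipy.stats import ranksums

    mean_auroc_prior = [0.61, 0.58, 0.66, 0.59]      # teams that used a prior network
    mean_auroc_no_prior = [0.52, 0.55, 0.50, 0.48]   # teams that did not
    stat, p = ranksums(mean_auroc_prior, mean_auroc_no_prior)
    print(f"rank-sum statistic = {stat:.2f}, P = {p:.3g}")
    ```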

  5. Figure 5: Aggregate submission networks for the experimental data network inference task (SC1A).

    (a) The aggregate submission network for cell line MCF7 under HGF stimulation. Line thickness corresponds to edge weight (number of edges shown set to equal number of nodes). To determine which edges were and were not present in the aggregate prior network, we applied a threshold of 0.1 to edge weights. Green and blue nodes represent descendants of mTOR in the network shown (Fig. 2b,c and Supplementary Fig. 2). The network was generated using Cytoscape (ref. 40). (b) Principal component analysis applied to edge scores for the 32 context-specific aggregate submission networks (Online Methods).
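
    A minimal sketch of the analysis in b, assuming each context-specific aggregate network is flattened into a vector of edge scores, one row per context:

    ```python
    # PCA over context-specific aggregate networks (panel b), assuming each
    # network is flattened to a vector of edge scores; data here are placeholders.
    import numpy as np
    from sklearn.decomposition import PCA

    edge_scores = np.random.rand(32, 45 * 44)    # placeholder: 32 contexts x edges
    pcs = PCA(n_components=2).fit_transform(edge_scores)
    # pcs[:, 0], pcs[:, 1] give each context's coordinates in the PCA plot
    ```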


References

  1. Bansal, M., Belcastro, V., Ambesi-Impiombato, A. & di Bernardo, D. How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78 (2007).
  2. Markowetz, F. & Spang, R. Inferring cellular networks—a review. BMC Bioinformatics 8, S5 (2007).
  3. Hecker, M., Lambeck, S., Toepfer, S., van Someren, E. & Guthke, R. Gene regulatory network inference: data integration in dynamic models—a review. Biosystems 96, 86–103 (2009).
  4. De Smet, R. & Marchal, K. Advantages and limitations of current network inference methods. Nat. Rev. Microbiol. 8, 717–729 (2010).
  5. Marbach, D. et al. Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl. Acad. Sci. USA 107, 6286–6291 (2010).
  6. Maetschke, S.R., Madhamshettiwar, P.B., Davis, M.J. & Ragan, M.A. Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief. Bioinform. 15, 195–211 (2014).
  7. Ideker, T. & Krogan, N.J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).
  8. de la Fuente, A. From 'differential expression' to 'differential networking'—identification of dysfunctional regulatory networks in diseases. Trends Genet. 26, 326–333 (2010).
  9. Hill, S.M. et al. Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28, 2804–2810 (2012).
  10. Saez-Rodriguez, J. et al. Comparing signaling networks between normal and transformed hepatocytes using discrete logical models. Cancer Res. 71, 5400–5411 (2011).
  11. Molinelli, E.J. et al. Perturbation biology: inferring signaling networks in cellular systems. PLoS Comput. Biol. 9, e1003290 (2013).
  12. Chen, W.W. et al. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol. Syst. Biol. 5, 239 (2009).
  13. Akbani, R. et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).
  14. Eduati, F., De Las Rivas, J., Di Camillo, B., Toffolo, G. & Saez-Rodriguez, J. Integrating literature-constrained and data-driven inference of signalling networks. Bioinformatics 28, 2311–2317 (2012).
  15. Pearl, J. Causality: Models, Reasoning, and Inference 2nd edn. (Cambridge Univ. Press, 2009).
  16. Freedman, D. & Humphreys, P. Are there algorithms that discover causal structure? Synthese 121, 29–54 (1999).
  17. Husmeier, D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 19, 2271–2282 (2003).
  18. Friedman, N., Linial, M., Nachman, I. & Pe'er, D. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620 (2000).
  19. Sachs, K., Perez, O. & Pe'er, D. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529 (2005).
  20. Spirtes, P., Glymour, C.N. & Scheines, R. Causation, Prediction, and Search 2nd edn. (MIT Press, 2000).
  21. Cantone, I. et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137, 172–181 (2009).
  22. Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
  23. Stolovitzky, G., Monroe, D. & Califano, A. Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Ann. NY Acad. Sci. 1115, 1–22 (2007).
  24. Stolovitzky, G., Prill, R.J. & Califano, A. Lessons from the DREAM2 challenges. Ann. NY Acad. Sci. 1158, 159–195 (2009).
  25. Prill, R.J. et al. Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS ONE 5, e9202 (2010).
  26. Prill, R.J., Saez-Rodriguez, J., Alexopoulos, L.G., Sorger, P.K. & Stolovitzky, G. Crowdsourcing network inference: the DREAM predictive signaling network challenge. Sci. Signal. 4, mr7 (2011).
  27. Meyer, P. et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst. Biol. 8, 13 (2014).
  28. Tibes, R. et al. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol. Cancer Ther. 5, 2512–2521 (2006).
  29. Mertins, P. et al. Ischemia in tumors induces early and sustained phosphorylation changes in stress kinase pathways but does not affect global protein levels. Mol. Cell. Proteomics 13, 1690–1704 (2014).
  30. Derry, J.M.J. et al. Developing predictive molecular maps of human disease through community-based modeling. Nat. Genet. 44, 127–130 (2012).
  31. Hill, S.M. et al. Context-specificity in causal signaling networks revealed by phosphoprotein profiling. bioRxiv doi:10.1101/039636 (2016).
  32. Davis, J. & Goadrich, M. The relationship between Precision-Recall and ROC curves. in Proc. 23rd International Conference on Machine Learning 233–240 (ACM, 2006).
  33. Costello, J.C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014).
  34. Margolin, A.A. et al. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1 (2013).
  35. Cerami, E.G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).
  36. Wang, H. & Song, M. Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. R J. 3, 29–33 (2011).
  37. Chresta, C.M. et al. AZD8055 is a potent, selective, and orally bioavailable ATP-competitive mammalian target of rapamycin kinase inhibitor with in vitro and in vivo antitumor activity. Cancer Res. 70, 288–298 (2010).
  38. Maathuis, M.H., Colombo, D., Kalisch, M. & Bühlmann, P. Predicting causal effects in large-scale systems from observational data. Nat. Methods 7, 247–248 (2010).
  39. Olsen, C. et al. Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 103, 329–336 (2014).
  40. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
  41. Neve, R.M. et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006).
  42. Garnett, M.J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
  43. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
  44. Hennessy, B.T. et al. A technical assessment of the utility of reverse phase protein arrays for the study of the functional proteome in non-microdissected human breast cancers. Clin. Proteomics 6, 129–151 (2010).
  45. Eduati, F. et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).
  46. Guitart-Pla, O., Kustagi, M., Rügheimer, F., Califano, A. & Schwikowski, B. The Cyni framework for network inference in Cytoscape. Bioinformatics 31, 1499–1501 (2015).
  47. Benjamini, Y., Krieger, A.M. & Yekutieli, D. Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93, 491–507 (2006).
  48. Cokelaer, T. et al. DREAMTools: a Python package for scoring collaborative challenges. F1000Research 4, 1030 (2015).


Author information

  1. A full list of authors and affiliations appears at the end of the paper.

    • The HPN-DREAM Consortium
  2. Present addresses: Bioinformatics and Biostatistics Hub, C3BI, Institut Pasteur, Paris, France (T.C.); Amyris Inc., Emeryville, California, USA (Y.Z.); SimQuest Inc., Boston, Massachusetts, USA (H.W.); Department of Electrical Engineering and Information Technology, Technische Universitaet Darmstadt, Darmstadt, Germany (H.K.); and German Centre for Neurodegenerative Diseases, Bonn, Germany (S.M.).

    • Thomas Cokelaer,
    • Yang Zhang,
    • Haizhou Wang,
    • Heinz Koeppl &
    • Sach Mukherjee
  3. These authors contributed equally to this work.

    • Steven M Hill &
    • Laura M Heiser

Affiliations

  1. MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK.

    • Steven M Hill &
    • Sach Mukherjee
  2. Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA.

    • Laura M Heiser &
    • Joe W Gray
  3. Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, Oregon, USA.

    • Laura M Heiser &
    • Joe W Gray
  4. Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA.

    • Laura M Heiser &
    • Joe W Gray
  5. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK.

    • Thomas Cokelaer &
    • Julio Saez-Rodriguez
  6. Automatic Control Laboratory, ETH Zurich, Zurich, Switzerland.

    • Michael Unger &
    • Heinz Koeppl
  7. Institute of Biochemistry, ETH Zurich, Zurich, Switzerland.

    • Michael Unger &
    • Heinz Koeppl
  8. Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, Oregon, USA.

    • Nicole K Nesser &
    • Paul T Spellman
  9. Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA.

    • Daniel E Carlin,
    • Artem Sokolov,
    • Evan O Paull,
    • Chris K Wong,
    • Kiley Graim,
    • Adrian Bivol &
    • Joshua M Stuart
  10. Department of Computer Science, New Mexico State University, Las Cruces, New Mexico, USA.

    • Yang Zhang,
    • Haizhou Wang &
    • Mingzhou Song
  11. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.

    • Fan Zhu &
    • Yuanfang Guan
  12. Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, Maryland, USA.

    • Bahman Afsari,
    • Ludmila V Danilova,
    • Alexander V Favorov,
    • Wai Shing Lee &
    • Elana J Fertig
  13. Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

    • Ludmila V Danilova &
    • Alexander V Favorov
  14. Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow, Russia.

    • Alexander V Favorov
  15. Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, North Carolina, USA.

    • Dane Taylor
  16. Department of Mathematics, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Dane Taylor
  17. Department of Bioengineering, Rice University, Houston, Texas, USA.

    • Chenyue W Hu,
    • Byron L Long,
    • David P Noren,
    • Alexander J Bisberg &
    • Amina A Qutub
  18. Department of Systems Biology, MD Anderson Cancer Center, Houston, Texas, USA.

    • Gordon B Mills
  19. Sage Bionetworks, Seattle, Washington, USA.

    • Stephen Friend,
    • Justin Guinney,
    • Jay Hodgson,
    • Bruce Hoff,
    • Michael Kellen &
    • Thea Norman
  20. Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA.

    • Yuanfang Guan
  21. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA.

    • Yuanfang Guan
  22. IBM Translational Systems Biology and Nanobiotechnology, Yorktown Heights, New York, USA.

    • Gustavo Stolovitzky
  23. RWTH–Aachen University Hospital, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Aachen, Germany.

    • Julio Saez-Rodriguez
  24. School of Clinical Medicine, University of Cambridge, Cambridge, UK.

    • Sach Mukherjee
  25. German Centre for Neurodegenerative Diseases (DZNE), Bonn, Germany.

    • Sach Mukherjee
  26. School of Electrical Engineering and Computer Science, Russ College of Engineering and Technology, Ohio University, Athens, Ohio, USA.

    • Rami Al-Ouran,
    • Razvan Bunescu,
    • Yichao Li,
    • Xiaoyu Liang &
    • Lonnie Welch
  27. Structural Bioinformatics Group (GRIB/IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.

    • Bernat Anton,
    • Jaume Bonet,
    • Javier Garcia-Garcia,
    • Baldo Oliva,
    • Joan Planas-Iglesias &
    • Daniel Poglayen
  28. Department of Computer Science, School of Engineering, Virginia Commonwealth University, Richmond, Virginia, USA.

    • Tomasz Arodz,
    • Xi Gao &
    • Janusz Slawek
  29. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.

    • Omid Askari Sichani,
    • Seyed-Mohammad-Hadi Daneshmand &
    • Mahdi Jalili
  30. Chemical & Biological Engineering, Northwestern University, Evanston, Illinois, USA.

    • Neda Bagheri,
    • Mark F Ciaccio &
    • Albert Y Xue
  31. Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, Texas, USA.

    • Noah Berlow,
    • Saad Haider,
    • Ranadip Pal &
    • Qian Wan
  32. Department of Bioinformatics—BiGCaT, Maastricht University, Maastricht, the Netherlands.

    • Anwesha Bohler,
    • Gungor Budak,
    • Chris Evelo &
    • Martina Kutmon
  33. Department of Biology, Center for Genomics & Systems Biology, New York University, New York, New York, USA.

    • Richard Bonneau,
    • Christoph Hafemeister &
    • Christian Lorenz Müller
  34. Courant Institute of Mathematical Sciences, New York University, New York, New York, USA.

    • Richard Bonneau &
    • Christian Lorenz Müller
  35. Simons Center for Data Analysis, Simons Foundation, New York, New York, USA.

    • Richard Bonneau
  36. Department of Physics, Texas Tech University, Lubbock, Texas, USA.

    • Mehmet Caglar
  37. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

    • Binghuang Cai,
    • Chunhui Cai,
    • Lujia Chen,
    • Gregory Cooper,
    • Joyeeta Dutta-Moscato,
    • Xia Jiang,
    • Songjian Lu &
    • Xinghua Lu
  38. Department of Information Engineering, University of Padova, Padova, Italy.

    • Azzurra Carlon,
    • Barbara Di Camillo,
    • Francesca Finotello,
    • Alberto Giaretta,
    • Marco Manfrini,
    • Francesco Sambo,
    • Tiziana Sanavia,
    • Gianna Maria Toffolo &
    • Emanuele Trifoglio
  39. Department of Medicine, Baylor College of Medicine, Houston, Texas, USA.

    • Chad J Creighton
  40. Division of Biostatistics, Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas, USA.

    • Chad J Creighton
  41. Leibniz Institute for Farm Animal Biology, Institute of Genetics and Biometry, Dummerstorf, Germany.

    • Alberto de la Fuente &
    • Sonja Strunz
  42. Department of Physics, Columbia University, New York, New York, USA.

    • Kevin Emmett
  43. Biomedical Engineering, Northwestern University, Evanston, Illinois, USA.

    • Mohammad-Kasim H Fassia
  44. Interdepartmental Biological Sciences, Northwestern University, Evanston, Illinois, USA.

    • Justin D Finkle &
    • Jia J Wu
  45. Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.

    • Jean Gao &
    • Mingon Kang
  46. Research Department, The Systems Biology Institute, Tokyo, Japan.

    • Samik Ghosh,
    • Takeshi Hase,
    • Kaito Kikuchi,
    • Hiroaki Kitano &
    • Ryota Yamanaka
  47. Department of Modeling Biological Processes, Center for Organismal Studies Heidelberg, BioQuant (BQ0018), University of Heidelberg, Heidelberg, Germany.

    • Ruth Großeholz,
    • Oliver Hahn &
    • Michael Zengerling
  48. Center for Biomedical Informatics & Information Technology, National Cancer Institute, Bethesda, Maryland, USA.

    • Chih Hao Hsu,
    • Ying Hu,
    • George Komatsoulis,
    • Daoud Meerzaman &
    • Chunhua Yan
  49. BIOSS Centre for Biological Signalling Studies, University of Freiburg, Freiburg, Germany.

    • Xun Huang &
    • Zhike Zi
  50. Department of Functional Genomics, Interfaculty Institute for Genetics and Functional Genomics, Ernst-Moritz-Arndt University Greifswald, Greifswald, Germany.

    • Tim Kacprowski
  51. Medical Faculty Carl Gustav Carus, Institute for Medical Informatics and Biometry, Technische Universität Dresden, Dresden, Germany.

    • Lars Kaderali,
    • Bettina Knapp &
    • Marta R A Matos
  52. Institute for Bioinformatics, University Medicine Greifswald, Greifswald, Germany.

    • Lars Kaderali
  53. Department of Medicine, Solna, Unit of Computational Medicine, Science for Life Laboratory (SciLifeLab), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden.

    • Venkateshan Kannan,
    • Jesper Tegnér &
    • Hector Zenil
  54. Department of Computer Science, University of Texas–Pan American, Edinburg, Texas, USA.

    • Dong-Chul Kim
  55. Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany.

    • Bettina Knapp
  56. QIAGEN, Redwood City, California, USA.

    • Andreas Krämer
  57. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland.

    • Miron Bartosz Kursa
  58. National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

    • Zhaoqi Liu,
    • Wenwen Min &
    • Shihua Zhang
  59. Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA.

    • Yu Liu
  60. Division of Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.

    • Richard E Neapolitan
  61. Molecular and Cellular Imaging Center–Columbus, Ohio State University, Columbus, Ohio, USA.

    • Stephen Obol Opiyo
  62. Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany.

    • Aljoscha Palinkas,
    • Adam Streck &
    • Kirste Thobe
  63. Department of Stem Cells and Developmental Biology, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran.

    • Ali Sharifi-Zarchi
  64. Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA.

    • Sakellarios Zairis

Consortia

  1. The HPN-DREAM Consortium

    • Bahman Afsari,
    • Rami Al-Ouran,
    • Bernat Anton,
    • Tomasz Arodz,
    • Omid Askari Sichani,
    • Neda Bagheri,
    • Noah Berlow,
    • Alexander J Bisberg,
    • Adrian Bivol,
    • Anwesha Bohler,
    • Jaume Bonet,
    • Richard Bonneau,
    • Gungor Budak,
    • Razvan Bunescu,
    • Mehmet Caglar,
    • Binghuang Cai,
    • Chunhui Cai,
    • Daniel E Carlin,
    • Azzurra Carlon,
    • Lujia Chen,
    • Mark F Ciaccio,
    • Thomas Cokelaer,
    • Gregory Cooper,
    • Chad J Creighton,
    • Seyed-Mohammad-Hadi Daneshmand,
    • Alberto de la Fuente,
    • Barbara Di Camillo,
    • Ludmila V Danilova,
    • Joyeeta Dutta-Moscato,
    • Kevin Emmett,
    • Chris Evelo,
    • Mohammad-Kasim H Fassia,
    • Alexander V Favorov,
    • Elana J Fertig,
    • Justin D Finkle,
    • Francesca Finotello,
    • Stephen Friend,
    • Xi Gao,
    • Jean Gao,
    • Javier Garcia-Garcia,
    • Samik Ghosh,
    • Alberto Giaretta,
    • Kiley Graim,
    • Joe W Gray,
    • Ruth Großeholz,
    • Yuanfang Guan,
    • Justin Guinney,
    • Christoph Hafemeister,
    • Oliver Hahn,
    • Saad Haider,
    • Takeshi Hase,
    • Laura M Heiser,
    • Steven M Hill,
    • Jay Hodgson,
    • Bruce Hoff,
    • Chih Hao Hsu,
    • Chenyue W Hu,
    • Ying Hu,
    • Xun Huang,
    • Mahdi Jalili,
    • Xia Jiang,
    • Tim Kacprowski,
    • Lars Kaderali,
    • Mingon Kang,
    • Venkateshan Kannan,
    • Michael Kellen,
    • Kaito Kikuchi,
    • Dong-Chul Kim,
    • Hiroaki Kitano,
    • Bettina Knapp,
    • George Komatsoulis,
    • Heinz Koeppl,
    • Andreas Krämer,
    • Miron Bartosz Kursa,
    • Martina Kutmon,
    • Wai Shing Lee,
    • Yichao Li,
    • Xiaoyu Liang,
    • Zhaoqi Liu,
    • Yu Liu,
    • Byron L Long,
    • Songjian Lu,
    • Xinghua Lu,
    • Marco Manfrini,
    • Marta R A Matos,
    • Daoud Meerzaman,
    • Gordon B Mills,
    • Wenwen Min,
    • Sach Mukherjee,
    • Christian Lorenz Müller,
    • Richard E Neapolitan,
    • Nicole K Nesser,
    • David P Noren,
    • Thea Norman,
    • Baldo Oliva,
    • Stephen Obol Opiyo,
    • Ranadip Pal,
    • Aljoscha Palinkas,
    • Evan O Paull,
    • Joan Planas-Iglesias,
    • Daniel Poglayen,
    • Amina A Qutub,
    • Julio Saez-Rodriguez,
    • Francesco Sambo,
    • Tiziana Sanavia,
    • Ali Sharifi-Zarchi,
    • Janusz Slawek,
    • Artem Sokolov,
    • Mingzhou Song,
    • Paul T Spellman,
    • Adam Streck,
    • Gustavo Stolovitzky,
    • Sonja Strunz,
    • Joshua M Stuart,
    • Dane Taylor,
    • Jesper Tegnér,
    • Kirste Thobe,
    • Gianna Maria Toffolo,
    • Emanuele Trifoglio,
    • Michael Unger,
    • Qian Wan,
    • Haizhou Wang,
    • Lonnie Welch,
    • Chris K Wong,
    • Jia J Wu,
    • Albert Y Xue,
    • Ryota Yamanaka,
    • Chunhua Yan,
    • Sakellarios Zairis,
    • Michael Zengerling,
    • Hector Zenil,
    • Shihua Zhang,
    • Yang Zhang,
    • Fan Zhu &
    • Zhike Zi

Contributions

S.M.H., L.M.H., T.C., M.U., J.W.G., P.T.S., H.K., G.S., J.S.-R. and S.M. designed the challenge. J.W.G., P.T.S., N.K.N., G.B.M. and S.M. provided experimental data for use in the challenge. M.U. and H.K. provided data for the in silico data task. M.K., T.N. and S.F. developed and implemented the Synapse platform used to facilitate the challenge. S.M.H., L.M.H. and T.C. performed analyses of challenge data. S.M.H., L.M.H., T.C., M.U., H.K., G.S., J.S.-R. and S.M. interpreted the results of the challenge. D.E.C. and Y.Z. performed analyses to compare top-performing approaches submitted for the network inference sub-challenge. D.E.C., A.S., E.O.P., C.K.W., K.G., A.B. and J.M.S. designed the top-performing approach in the experimental data network inference task. Y.Z., H.W. and M.S. designed the approach that performed best in the in silico data network inference task and was the highest ranked across both experimental and in silico data network inference tasks. F.Z. and Y.G. developed an algorithm that was a top performer in the experimental data time-course prediction task and was also the highest ranked across both experimental and in silico data time-course prediction tasks. B.A., L.V.D., A.V.F., W.S.L., D.T. and E.J.F. were members of one of the top-performing teams in the experimental data time-course prediction task. C.W.H., B.L.L., D.P.N., A.J.B. and A.A.Q. designed the Biowheel visualization tool. The HPN-DREAM Consortium provided predictions and descriptions of the algorithms. S.M.H., L.M.H., T.C., M.U., D.E.C., Y.Z., M.S., J.M.S., H.K., G.S., J.S.-R. and S.M. wrote the paper.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: The HPN-DREAM network inference challenge: overview of in silico data tasks. (212 KB)

    Data were generated from a nonlinear dynamical model of the ErbB signaling pathway (Chen et al., 2009). Training data consisted of time-courses for 20 network nodes under three inhibitors targeting specific nodes, or no inhibitor, and under two ligand stimuli, applied individually and in combination at two concentrations. In total there were 20 different (inhibitor, stimulus) conditions as shown (top right). Time-courses comprised 11 time points, and three technical replicates were provided. Node names were anonymized to prevent use of biological prior information. The sub-challenge 1 in silico data task (SC1B) asked participants to infer a single directed, weighted network using the training data. The aim of the sub-challenge 2 in silico data task (SC2B) was to predict stimulus-specific time-courses under unseen interventions. For SC1B, submissions were assessed against a gold-standard network extracted from the data-generating model, with agreement quantified using AUROC score. For SC2B, predicted time-courses were assessed using held-out test data obtained under in silico inhibition of each network node in turn, with prediction accuracy quantified using root mean square error (RMSE). See Online Methods for further details of the in silico data tasks.

    Chen, W.W. et al. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol. Syst. Biol. 5, 239 (2009).

  2. Supplementary Figure 2: Context-specific ‘gold-standard’ causal descendant sets for the network inference sub-challenge experimental data task (SC1A). (89 KB)

    Context-specific networks submitted to SC1A were assessed using held-out test data, obtained under inhibition of mTOR. Each column in the heatmap indicates, for a given (cell line, stimulus) context c, the phosphoproteins that showed salient changes under mTOR inhibition relative to DMSO control (black cells) and those that did not (white cells). Such changes were determined from the test data using a procedure centered on a paired t-test. Phosphoproteins that show salient changes can be regarded as descendants of mTOR in the underlying causal signaling network. Columns therefore represent context-specific, experimentally determined sets of causal descendants of mTOR, D_c^GS, and were used as a ‘gold-standard’ to assess inferred context-specific networks. Further details regarding the determination of the gold-standard descendant sets and the scoring procedure can be found in Online Methods. Missing data are indicated by gray cells (some phosphoprotein antibodies were only present in the (training and test) data for a subset of cell lines). Based on a figure in Hill, Nesser et al. (2016).

    Hill, S.M., Nesser, N.K. et al. Context-specificity in causal signaling networks revealed by phosphoprotein profiling. bioRxiv doi:10.1101/039636 (2016).
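
    A minimal sketch of the descendant-calling idea for one context, assuming abundances are paired across time points between mTOR inhibition and DMSO control; the actual procedure has additional steps (Online Methods):

    ```python
    # Paired t-test per phosphoprotein with BH-FDR correction, as a sketch of the
    # gold-standard descendant calls; the real procedure has extra steps.
    import numpy as np
    from scipy.stats import ttest_rel
    from statsmodels.stats.multitest import multipletests

    def call_descendants(mtor_i, dmso, proteins, alpha=0.05):
        """mtor_i, dmso: (n_proteins, n_timepoints) arrays for one context."""
        pvals = np.array([ttest_rel(mtor_i[i], dmso[i]).pvalue
                          for i in range(len(proteins))])
        reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
        return {p for p, r in zip(proteins, reject) if r}  # gold-standard set D_c^GS
    ```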

  3. Supplementary Figure 3: Network inference sub-challenge (SC1) final team scores and rankings. (96 KB)

    (a) Mean rank scores for the 74 teams that participated in the experimental data task (SC1A). Mean rank scores were used to obtain final team rankings. For the 40 teams that provided information regarding their approach, bar color indicates method type (see also Fig. 3e, Supplementary Table 2 and Supplementary Note 5). Stars above bars indicate teams with statistically significant AUROC scores (FDR < 0.05) in at least 50% of (cell line, stimulus) contexts (2 stars) or at least 25% of contexts (1 star) (multiple testing correction performed within each context with respect to number of teams). (b) AUROC scores for the 65 teams that participated in the in silico data task (SC1B). AUROC scores were used to obtain final team rankings. As in a, color indicates method type (see also Fig. 3f, Supplementary Table 2 and Supplementary Note 5). Stars above bars indicate statistically significant AUROC scores (FDR < 0.05). (c) Comparison of mean rank and mean AUROC scores for SC1A. (d) Final ranks for SC1A (dashed blue line) and SC1B (dotted green line) were averaged to obtain a combined score (solid red line) for the 59 teams that participated in both tasks. Teams ordered by combined score (see “SC1A/B combined final rank” column in Supplementary Table 2). See Online Methods for full details of scoring for SC1.
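
    The mean-rank scheme in a can be sketched directly; auroc below is an assumed (n_teams, n_contexts) matrix of context-specific AUROC scores:

    ```python
    # Mean-rank scoring for SC1A: rank teams within each context by AUROC
    # (rank 1 = best; ties averaged here, which may differ from the official
    # tie handling), then average ranks across contexts.
    import numpy as np
    from scipy.stats import rankdata

    def mean_rank_scores(auroc):
        """auroc: (n_teams, n_contexts) matrix of context-specific AUROC scores."""
        ranks = np.apply_along_axis(lambda col: rankdata(-col), 0, auroc)
        return ranks.mean(axis=1)  # lower mean rank = better final ranking
    ```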

  4. Supplementary Figure 4: Gold-standard causal network for the network inference sub-challenge in silico data task (SC1B). (139 KB)

    The gold-standard network, used to assess networks submitted to SC1B, was obtained from a data-generating dynamical model of the ErbB signaling pathway. Derivation of the network was non-trivial because variables appear in complexes within the model; full details can be found in Supplementary Note 8. Three unconnected dummy nodes were incorporated in the model, and node names were anonymized in the training data.

  5. Supplementary Figure 5: Balance of positives and negatives in the gold-standards for the network inference sub-challenge (SC1). (51 KB)

    The gold standard for the experimental data task (SC1A) comprised sets of descendants of mTOR for each (cell line, stimulus) context, experimentally-determined using the held-out test data. Shown (left) are the number of positives and negatives for each context; that is, the number of phosphoproteins that are descendants of mTOR according to the test data (positives) and the number that are non-descendants of mTOR (negatives). For the in silico data task (SC1B), the gold-standard consisted of the data-generating network. Shown (right) are the number of edges in this network (positives) and the number of non-edges (negatives).

  6. Supplementary Figure 6: Comparison of AUROC with an alternative scoring metric, AUPR, for the network inference sub-challenge (SC1). (76 KB)

    (a) Alternative team rankings were calculated by replacing AUROC with AUPR (area under the precision-recall curve) in the scoring procedure. The alternative rankings were compared with the original AUROC-based rankings for both the experimental data task (SC1A; left) and in silico data task (SC1B; right). (b) A further alternative ranking, combining both AUROC and AUPR, was obtained by ranking teams based on an average of final rank under AUROC and final rank under AUPR, and was compared with the original AUROC-based rankings.
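
    Both metrics can be computed from the same predictions; the sketch below uses scikit-learn, with average precision as the AUPR estimate, on illustrative labels and scores:

    ```python
    # AUROC vs AUPR on the same predictions; data below are illustrative only.
    from sklearn.metrics import roc_auc_score, average_precision_score

    y_true = [1, 0, 1, 1, 0, 0, 0, 1]                    # gold-standard labels
    y_score = [0.9, 0.4, 0.7, 0.6, 0.3, 0.5, 0.2, 0.8]   # predicted scores
    print("AUROC:", roc_auc_score(y_true, y_score))
    print("AUPR :", average_precision_score(y_true, y_score))
    ```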

  7. Supplementary Figure 7: Statistical significance of AUROC scores for the network inference sub-challenge experimental data task (SC1A). (65 KB)

    For each (cell line, stimulus) context, a null distribution over AUROC was generated and used to calculate an FDR-adjusted P value for each team (Online Methods). (a) The number of significant (FDR < 0.05) AUROC scores obtained by each team across the 32 contexts (multiple testing correction performed within each context with respect to number of teams). Teams are ordered according to their final ranking in SC1A (based on mean rank score). (b) For each context, the number of teams (out of a total of 74) that obtained significant AUROC scores. For two regimes (BT549, NRG1) and (BT20, Insulin), no teams obtained a significant AUROC score. These two regimes were disregarded in the scoring process.
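
    A minimal sketch of one way to build such a null, assuming it is generated by permuting gold-standard labels (Online Methods gives the exact scheme used):

    ```python
    # Permutation null for AUROC with empirical p-value; label permutation is an
    # assumed null-generation scheme, not necessarily the official one.
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from statsmodels.stats.multitest import multipletests

    def auroc_pvalue(y_true, y_score, n_perm=10_000, seed=0):
        rng = np.random.default_rng(seed)
        obs = roc_auc_score(y_true, y_score)
        y = np.asarray(y_true)
        null = np.array([roc_auc_score(rng.permutation(y), y_score)
                         for _ in range(n_perm)])
        return (1 + np.sum(null >= obs)) / (1 + n_perm)

    # FDR adjustment across teams within one context:
    # reject, p_adj, _, _ = multipletests(team_pvalues, alpha=0.05, method="fdr_bh")
    ```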

  8. Supplementary Figure 8: Crowdsourced analysis for the network inference sub-challenge in silico data task (SC1B). (35 KB)

    (a) Aggregate submission networks were formed by integrating predicted networks across the top N teams (as given by final team rankings), with N varied between 1 (top performer only) and all teams (after removal of correlated submissions; Supplementary Note 10). Integration was done by averaging predicted edge weights (Online Methods). The blue line shows performance (AUROC) of the aggregate submission networks. Individual team scores are also depicted (red circles). (b) Predicted networks were integrated for subsets of N teams, selected at random. The blue line shows mean performance of the aggregate submission networks, calculated over 100 random subsets of teams (error bars indicate s.d.). Crowdsourced analysis for the experimental data network inference task is shown in Figure 3c,d.

  9. Supplementary Figure 9: Weighted combinations of two top performing approaches and aggregate prior network for the network inference sub-challenge experimental data task (SC1A). (62 KB)

    An extension of Figure 4b to show three-way combinations of (i) PropheticGranger – top performer for the experimental data task when combined with a prior network (here, the method is used without the prior network); (ii) FunChisq – top performer for the in silico data task and most consistent performer across both data types; and (iii) an aggregate prior network formed by integrating prior networks used by participants (Online Methods). The three approaches were combined by taking weighted averages of predicted edge scores for each (cell line, stimulus) context and performance assessed using mean AUROC. For example, the best performance (mean AUROC = 0.82) was achieved by combining 20% PropheticGranger, 50% FunChisq and 30% aggregate prior network, and is highlighted with an “X”. See Supplementary Note 1 for full details of the PropheticGranger and FunChisq approaches.
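
    A sketch of the three-way combination, assuming edge scores from the three approaches are normalized to a common scale before weighting; mean_auroc is a hypothetical function scoring a combined network across all 32 contexts:

    ```python
    # Weighted three-way combination over a grid on the weight simplex;
    # mean_auroc and the inputs P, F, R (edge-score arrays) are assumed.
    import numpy as np

    def combine(prophetic, funchisq, prior, w):
        """w: non-negative weights (w1, w2, w3) summing to 1."""
        return w[0] * prophetic + w[1] * funchisq + w[2] * prior

    # Grid over the weight simplex in steps of 0.1:
    # grid = [(w1, w2, round(1 - w1 - w2, 1))
    #         for w1 in np.arange(0, 1.01, 0.1)
    #         for w2 in np.arange(0, 1.01 - w1, 0.1)]
    # best = max(grid, key=lambda w: mean_auroc(combine(P, F, R, w)))
    ```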

  10. Supplementary Figure 10: Time-course prediction sub-challenge experimental data task (SC2A): phosphoproteins showing the largest changes under mTOR inhibition are predicted with least accuracy. (74 KB)

    SC2A tasked participants with predicting phosphoprotein time-courses for each (cell line, stimulus) context under an unseen intervention (mTOR inhibition, mTORi). Submitted predictions were assessed against held-out test data obtained under mTORi. For each team, root mean square error (RMSE) scores were calculated for each (cell line, phosphoprotein) pair (see Supplementary Note 6). (a) Left: for each (cell line, phosphoprotein) pair, normalized RMSE¹ for the top-ranked team (Team44) vs. absolute effect size. The effect size for a given (cell line, phosphoprotein) pair is a measure of the magnitude of abundance change under mTORi relative to DMSO control². Note that this measure is based on the mTORi test data and is independent of team predictions. The strong positive correlation indicates that phosphoproteins showing little or no change under mTORi were predicted relatively well, but phosphoproteins that showed large changes under mTORi were predicted poorly. Right: examples of time-courses underlying the scatter plot (left). Shown are abundances of three phosphoproteins for cell line UACC812 under DMSO control and under mTORi, as predicted by Team44, together with test data values. Note that normalized RMSE and effect size values are calculated across all stimuli, but only serum stimulus time-courses are shown here. (b) Scatter plots as in a for teams ranked 2 to 5 in SC2A. These results highlight the challenging nature of predicting protein abundance under unseen interventions, but they also point to a shortcoming of the RMSE score used here: it does not sufficiently emphasize the ability to predict proteins that change under intervention. For a future challenge, a modified metric that focuses on those proteins might therefore be useful.

    ¹To ensure comparability across cell lines and phosphoproteins, each RMSE score was normalized by the standard deviation of the test data used in the RMSE calculation.
    ²Effect size is defined as the mean difference in phosphoprotein abundance between DMSO control and mTORi, normalized by the standard deviation of the differences. Means and standard deviations are calculated across all time points and stimuli for the given cell line.
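
    The two footnote definitions translate directly into code; pred, test, dmso and mtor_i below are assumed arrays over all time points and stimuli for one (cell line, phosphoprotein) pair:

    ```python
    # Normalized RMSE (footnote 1) and effect size (footnote 2); array inputs
    # are assumed to span all time points and stimuli for one pair.
    import numpy as np

    def normalized_rmse(pred, test):
        rmse = np.sqrt(np.mean((pred - test) ** 2))
        return rmse / np.std(test)            # footnote 1: normalize by test s.d.

    def effect_size(dmso, mtor_i):
        diff = dmso - mtor_i                  # footnote 2: paired differences
        return np.mean(diff) / np.std(diff)   # absolute value is plotted in panel a
    ```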

  11. Supplementary Figure 11: Visualization sub-challenge (SC3) voting results and rankings. (43 KB)

    Fourteen teams made submissions to the visualization sub-challenge. HPN-DREAM challenge participants were asked to select and rank (from 1 to 3) their three favorite submissions; the remaining unranked submissions were then assigned a rank of 4. Thirty-six participants took part in the voting, and the number of votes of each rank type is shown (bar plot, left axis). Final team ranks were based on the mean rank across the 36 votes (green line, right axis).

  12. Supplementary Figure 12: Robustness of rankings for the network inference sub-challenge (SC1). (113 KB)

    The test data were subsampled to assess robustness of rankings (Online Methods). Box plots show team ranks over 100 subsampling iterations, with 50% of the test data left out at each iteration. (a) Experimental data task: subsampling performed by removing 50% of phosphoproteins when assessing descendant sets for each (cell line, stimulus) context. (b) Experimental data task: subsampling performed by removing 50% of contexts from the scoring process. (c) In silico data task: subsampling performed by considering only 50% of edges/non-edges in the gold-standard network. For all box plots, the central line indicates the median, and the box edges denote the 25th and 75th percentiles. Whiskers extend to 1.5 times the interquartile range from the box hinge. Data points beyond the whiskers are regarded as outliers and are plotted individually.
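
    A minimal sketch of the subsampling procedure for the in silico task (c), where score_team is an assumed function that recomputes a team's AUROC on a subsampled gold standard:

    ```python
    # Rank robustness via repeated 50% subsampling of the gold standard;
    # score_team(team_index, kept_indices) is an assumed scoring function.
    import numpy as np

    def subsampled_ranks(score_team, n_gold_items, n_teams, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        ranks = np.empty((n_iter, n_teams))
        for i in range(n_iter):
            keep = rng.choice(n_gold_items, size=n_gold_items // 2, replace=False)
            scores = np.array([score_team(t, keep) for t in range(n_teams)])
            ranks[i] = scores.argsort()[::-1].argsort() + 1   # rank 1 = best
        return ranks  # per-team rank distribution across iterations
    ```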

PDF files

  1. Supplementary Text and Figures (9,453 KB)

    Supplementary Figures 1–12 and Supplementary Notes 1–11

Excel files

  1. Supplementary Table 1 (19 KB)

    List of antibodies.

  2. Supplementary Table 2 (41 KB)

    Network inference sub-challenge (SC1) results, metadata, and inclusion of teams in Consortium and post-challenge analyses.

  3. Supplementary Table 3 (141 KB)

    Comparison of aggregate submission networks with the aggregate prior network to identify novel, context-specific edges.

  4. Supplementary Table 4 (19 KB)

    Time-course prediction sub-challenge (SC2) results, metadata, and inclusion of teams in Consortium.

  5. Supplementary Table 5 (11 KB)

    (Cell line, phosphoprotein) pairs disregarded in the scoring procedure for the time-course prediction sub-challenge experimental data task (SC2A).

  6. Supplementary Table 6 (12 KB)

    (Test inhibitor, predicted node) pairs disregarded in the scoring procedure for the time-course prediction sub-challenge in silico data task (SC2B).
