Article series: Study designs

Crowdsourcing biomedical research: leveraging communities as innovation engines

Journal name: Nature Reviews Genetics
Volume: 17
Pages: 470–486
Year published: 2016
DOI: 10.1038/nrg.2016.69

Abstract

The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform the initial analyses, but the most informative methods may well reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework for finding robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.

Figures

  1. Figure 1: Challenge platforms and organizations.

    The most popular researcher-driven Challenge initiatives in the life sciences (left) and the most popular commercial Challenge platforms (right) are shown. Initiatives such as DREAM (Dialogue for Reverse Engineering Assessment and Methods), FlowCAP (Flow Cytometry Critical Assessment of Population Identification Methods), CAGI (Critical Assessment of Genome Interpretation) and sbv-IMPROVER (Systems Biology Verification combined with Industrial Methodology for Process Verification in Research) organize several Challenges per year; only the generic project and not the specific Challenges are shown. Among the most popular and successful commercial Challenge platforms are: InnoCentive, which crowdsources Challenges in science and technology (social sciences, physics, biology and chemistry); Topcoder, which serves the software developer community; and Kaggle, which administers Challenges to machine-learning and computer experts, addressing predictive analytics problems in a wide range of disciplines. The figure is not comprehensive, but highlights the most consistent and well-established Challenge initiatives. CAFA, Critical Assessment of Functional Annotation; CACAO, Community Assessment of Community Annotation with Ontologies; CAMDA, Critical Assessment of Massive Data Analysis; CAPRI, Critical Assessment of PRediction of Interaction; CASP, Critical Assessment of protein Structure Prediction; CLARITY, Children's Leadership Award for the Reliable Interpretation and appropriate Transmission of Your genomic information; RGASP, RNA-seq Genome Annotation Assessment Project; TREC Crowd, Text REtrieval Conference Crowdsourcing Track.

  2. Figure 2: The steps and tasks in the organization of a Challenge.

    The main scientific steps of developing a Challenge are: the determination of the scientific question, the pre-processing and curation of the data, the dry run, the scoring and judging, the post-Challenge analysis, and the Challenge reporting and paper writing. Technical considerations include the development and maintenance of the IT infrastructure, which covers registration, creation of computing accounts, security for cloud-based data hosting, and submission queues, leaderboards and discussion forums (a toy sketch of this submission-and-scoring loop is given after the figure legends). The legal considerations include agreements with the data providers regarding restrictions on data use and the agreement that participants will abide by the Challenge rules. The social dimension includes the creation of an organizing team to plan, run and analyse the Challenge, as well as to determine and put in place incentives for participation, to advertise the Challenge, to moderate the discussion forum and to lead the post-Challenge activities, such as paper writing and conferences. Comms, communications; IRB, Institutional Review Board.

  3. Figure 3: The wisdom of crowds in theory and in practice.

    Two case studies are shown, in the context of a hypothetical Challenge (Ref. 43) and of the NIEHS–NCATS–UNC DREAM Toxicogenetics Challenge (a collaboration between the US National Institute of Environmental Health Sciences (NIEHS), the US National Center for Advancing Translational Sciences (NCATS) and the University of North Carolina (UNC)) (Ref. 60). a–d | The hypothetical example shows three of the predictions that will be integrated into an aggregate ranked list. Two sufficient conditions for the integration to outperform individual inference methods are: first, each of the inference methods must have better-than-random predictive power (that is, on average, items in the positive set are assigned better (lower) ranks than items in the negative set); and second, the predictions of different inference methods must be statistically independent. Part b shows the probability that a given method places a positive or negative item at a given rank. Positive items are assigned lower ranks on average, yet there is still a considerable probability of assigning a low rank to a negative item. The area under the precision–recall curve (AUPR) of this method is only 0.41; for a random prediction with these parameters, we would expect an AUPR of 0.3. Suppose now that the integrated solution is computed for each item as the average of the ranks assigned to that item by the different methods. If, for the sake of simplicity, we assume that all methods have the same rank probabilities and that the ranks are assigned independently for the positive and negative sets, then the central limit theorem establishes that the average-rank probability will approach a Gaussian distribution, with its variance shrinking as more methods are integrated. In this way, the probability that positive items receive lower ranks than negative items increases (parts c and d), resulting in an AUPR that tends to 1 (perfect prediction) as the number of integrated inference methods increases (a minimal numerical simulation of this argument is given after the figure legends). e | An equivalent trend is seen in the Toxicogenetics Challenge using a different metric (Pearson correlation). The Pearson correlation is shown for all 24 submitted methods, together with box plots for n randomly chosen predictions out of the 24. The median correlation of the aggregates increases with the number of aggregated methods. Parts a–d are adapted from Ref. 43, Nature Publishing Group. Part e is adapted from Ref. 60, Nature Publishing Group.
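
A note on the technical core referenced in the Figure 2 legend: stripped of platform specifics, a Challenge's submission system is a loop that accepts predictions, scores them against a held-out gold standard that participants never see, and recomputes a leaderboard. The Python sketch below is a toy illustration of that loop only; the function names, the toy accuracy metric and the in-memory leaderboard are illustrative assumptions, not the API of Synapse or any other real platform.

    # Toy sketch of a Challenge scoring pipeline (illustrative only).
    # Submissions are scored against a gold standard hidden from
    # participants, and the leaderboard is recomputed on each submission.

    gold = {"item1": 1, "item2": 0, "item3": 1}   # held-out answers
    leaderboard = []                              # (score, team) pairs

    def score(prediction, gold):
        """Fraction of items predicted correctly; a stand-in for the
        Challenge's real metric (for example, AUPR or correlation)."""
        return sum(prediction.get(k) == v for k, v in gold.items()) / len(gold)

    def submit(team, prediction):
        leaderboard.append((score(prediction, gold), team))

    submit("team_a", {"item1": 1, "item2": 0, "item3": 1})   # all correct
    submit("team_b", {"item1": 1, "item2": 1, "item3": 1})   # one wrong

    for rank, (s, team) in enumerate(sorted(leaderboard, reverse=True), 1):
        print(f"{rank}. {team}: {s:.2f}")

In practice the scoring code runs server-side precisely so that the gold standard never leaves the organizers' infrastructure, which is what makes the validation in a Challenge trustworthy.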
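
The rank-aggregation argument in the Figure 3 legend can also be checked numerically. The Python sketch below is a minimal simulation under assumed parameters (30% positives and heavily overlapping Gaussian scores); it is not the analysis of Ref. 43 or Ref. 60, but it reproduces the qualitative behaviour: the AUPR of the average-rank aggregate climbs towards 1 as more independent, weakly predictive methods are combined.

    # Minimal wisdom-of-crowds simulation: average the ranks assigned by
    # k independent, weakly predictive methods and watch the AUPR grow.
    # All parameters (class sizes, score distributions) are assumptions
    # chosen to mimic the setting described in the Figure 3 legend.
    import numpy as np
    from sklearn.metrics import average_precision_score

    rng = np.random.default_rng(0)
    n_pos, n_neg = 30, 70          # 30% positives, as in the legend

    def one_method():
        """Ranks from one weakly predictive method: positive items tend
        to get lower (better) ranks, but the distributions overlap."""
        scores = np.concatenate([rng.normal(35, 25, n_pos),    # positives
                                 rng.normal(60, 25, n_neg)])   # negatives
        return scores.argsort().argsort() + 1  # scores -> ranks 1..100

    labels = np.array([1] * n_pos + [0] * n_neg)

    for k in (1, 2, 5, 10, 25):
        avg_rank = np.mean([one_method() for _ in range(k)], axis=0)
        aupr = average_precision_score(labels, -avg_rank)  # low rank = hit
        print(f"{k:2d} aggregated methods -> AUPR ~ {aupr:.2f}")

The exact AUPR values depend on the assumed score distributions, but the monotone improvement with the number of aggregated methods is exactly the trend of parts c–e.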

References

  1. Stephens, Z. D. et al. Big Data: astronomical or genomical? PLoS Biol. 13, e1002195 (2015).
  2. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  3. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  4. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
  5. Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
  6. Toga, A. W. et al. Big biomedical data as the key resource for discovery science. J. Am. Med. Inform. Assoc. 22, 1126–1131 (2015).
  7. Snijder, B., Kandasamy, R. K. & Superti-Furga, G. Toward effective sharing of high-dimensional immunology data. Nat. Biotechnol. 32, 755–759 (2014).
  8. Henneken, E. Unlocking and sharing data in astronomy. Bul. Am. Soc. Info. Sci. Tech. 41, 40–43 (2015).
  9. World Meteorological Organization. Climate data, management and exchange. WMO http://www.wmo.int/pages/themes/climate/climate_data_management_exchange.php (2009).
  10. Brabham, D. C. Crowdsourcing. (MIT Press, 2013).
  11. Nesta. A guide to historical Challenge prizes. Nesta http://www.nesta.org.uk/news/guide-historical-challenge-prizes (13 May 2014).
  12. Costello, J. C. & Stolovitzky, G. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clin. Pharmacol. Ther. 93, 396–398 (2013).
  13. Boudreau, K. J. & Lakhani, K. R. Using the crowd as an innovation partner. Harv. Bus. Rev. 91, 60–69 (2013).
  14. Howe, J. The rise of crowdsourcing. Wired Magazine 14, 14 (2006).
    This article coined the term crowdsourcing and highlighted its potential.
  15. Sobel, D. Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time (Bloomsbury Publishing, 2007).
  16. Heritage Provider Network Health Prize. Improve healthcare, win $3,000,000. WebCite http://www.webcitation.org/65IuEDAsc (4 May 2011).
  17. Wikipedia. List of crowdsourcing projects. Wikipedia https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects (updated 16 Jun 2016).
  18. Kryshtafovych, A. et al. Challenging the state of the art in protein structure prediction: highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10. Proteins 82, 26–42 (2014).
  19. Janin, J. et al. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52, 2–9 (2003).
  20. Arighi, C. N. et al. BioCreative-IV virtual issue. Database 2014, bau039 (2014).
  21. Aghaeepour, N. et al. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods 10, 228–238 (2013).
  22. Engström, P. G. et al. Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods 10, 1185–1191 (2013).
  23. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177–1184 (2013).
    References 22 and 23 describe RGASP as an early Benchmarking Challenge for RNA-seq data analysis.
  24. Stolovitzky, G., Monroe, D. & Califano, A. Dialogue on reverse-engineering assessment and methods. Ann. NY Acad. Sci. 1115, 1–22 (2007).
  25. Weirauch, M. T. et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31, 126–134 (2013).
  26. Küffner, R. et al. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat. Biotechnol. 33, 51–57 (2015).
    A Challenge with direct clinical implications.
  27. Bentzien, J., Muegge, I., Hamner, B. & Thompson, D. C. Crowd computing: using competitive dynamics to develop and refine highly predictive models. Drug Discov. Today 18, 472–478 (2013).
  28. Bansal, M. et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 32, 1213–1222 (2014).
  29. Costello, J. C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1208–1212 (2014).
    A Challenge to benchmark methods for precision medicine.
  30. Boutros, P. C. et al. Global optimization of somatic variant identification in cancer genomes with a global community challenge. Nat. Genet. 46, 318–319 (2014).
  31. Green, A. K. et al. The project data sphere initiative: accelerating cancer research by sharing data. Oncologist 20, 464–e20 (2015).
  32. Abdallah, K., Hugh-Jones, C., Norman, T., Friend, S. & Stolovitzky, G. The Prostate Cancer DREAM Challenge: a community-wide effort to use open clinical trial data for the quantitative prediction of outcomes in metastatic prostate cancer. Oncologist 20, 459–460 (2015).
  33. Atassi, N. et al. The PRO-ACT database: design, initial analyses, and predictive features. Neurology 83, 1719–1725 (2014).
  34. Omberg, L. et al. Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. Nat. Genet. 45, 1121–1126 (2013).
  35. Norel, R., Rice, J. J. & Stolovitzky, G. The self-assessment trap: can we all be better than average? Mol. Syst. Biol. 7, 537 (2011).
  36. Cokelaer, T. et al. DREAMTools: a Python package for scoring collaborative challenges [version 2; referees: 1 approved, 2 approved with reservations]. F1000Res. 4, 1030 (2015).
  37. Plenge, R. M. et al. Crowdsourcing genetic prediction of clinical utility in the Rheumatoid Arthritis Responder Challenge. Nat. Genet. 45, 468–469 (2013).
  38. Margolin, A. A. et al. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1 (2013).
  39. Hill, S. M. et al. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat. Methods 13, 310–318 (2016).
  40. Marbach, D., Schaffter, T., Mattiussi, C. & Floreano, D. Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J. Comput. Biol. 16, 229–239 (2009).
  41. Marbach, D. et al. Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl Acad. Sci. USA 107, 6286–6291 (2010).
  42. Prill, R. J. et al. Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS ONE 5, e9202 (2010).
  43. Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
    This paper introduces the wisdom-of-crowds concept in computational biology.
  44. Cantone, I. et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137, 172–181 (2009).
  45. Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 Challenges. Ann. NY Acad. Sci. 1158, 159–195 (2009).
  46. Mendes, P., Sha, W. & Ye, K. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics 19 (Suppl. 2), ii122–ii129 (2003).
  47. Schaffter, T., Marbach, D. & Floreano, D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 27, 2263–2270 (2011).
  48. Reich, M. et al. GenePattern 2.0. Nat. Genet. 38, 500–501 (2006).
  49. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat. Methods 13, 366–370 (2016).
  50. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
  51. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  52. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).
  53. Benos, P. V. Additivity in protein–DNA interactions: how good an approximation is it? Nucleic Acids Res. 30, 4442–4451 (2002).
  54. Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).
  55. Zhao, X., Huang, H. & Speed, T. P. Finding short DNA motifs using permuted Markov models. J. Comput. Biol. 12, 894–906 (2005).
  56. Sharon, E., Lubliner, S. & Segal, E. A feature-based approach to modeling protein–DNA interactions. PLoS Comput. Biol. 4, e1000154 (2008).
  57. He, X. et al. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS ONE 4, e8155 (2009).
  58. Berger, M. F. et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006).
  59. Annala, M., Laurila, K., Lähdesmäki, H. & Nykter, M. A linear model for transcription factor binding affinity prediction in protein binding microarrays. PLoS ONE 6, e20059 (2011).
  60. Eduati, F. et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).
  61. Allen, G. I. et al. Crowdsourced estimation of cognitive decline and resilience in Alzheimer's disease. Alzheimers Dement. 12, 645–653 (2016).
  62. Critical Assessment of Genome Interpretation. Cystathionine beta-Synthase (CBS) single amino acid mutations. CAGI http://cagi2010.org/content/CBS (updated 3 Nov 2010).
  63. Chen, Y.-C. et al. A probabilistic model to predict clinical phenotypic traits from genome sequencing. PLoS Comput. Biol. 10, e1003825 (2014).
  64. Longo, D. L. & Drazen, J. M. Data sharing. N. Engl. J. Med. 374, 276–277 (2016).
  65. Wilbanks, J. & Friend, S. H. First, design for data sharing. Nat. Biotechnol. 34, 377–379 (2016).
  66. Khare, R., Good, B. M., Leaman, R., Su, A. I. & Lu, Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief. Bioinform. 17, 23–32 (2015).
  67. Goodman, J. K., Cryder, C. E. & Cheema, A. Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. J. Behav. Decis. Mak. 26, 213–224 (2013).
  68. sbv-IMPROVER project team. On crowd-verification of biological networks. Bioinform. Biol. Insights 7, 307–325 (2013).
  69. Kutmon, M. et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 44, D488–D494 (2015).
  70. Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425 (2013).
  71. Vashisht, R. et al. Crowd sourcing a new paradigm for interactome driven drug target identification in Mycobacterium tuberculosis. PLoS ONE 7, e39808 (2012).
  72. Mortensen, J. M. et al. Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT. J. Am. Med. Inform. Assoc. 22, 640–648 (2015).
  73. Cooper, S. et al. Predicting protein structures with a multiplayer online game. Nature 466, 756–760 (2010).
  74. Larson, S. M., Snow, C. D., Shirts, M. & Pande, V. S. Folding@Home and Genome@Home: using distributed computing to tackle previously intractable problems in computational biology. arXiv https://arxiv.org/abs/0901.0866 (2009).
  75. Das, R. et al. Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 69 (Suppl. 8), 118–128 (2007).
  76. Good, B. M. & Su, A. I. Games with a scientific purpose. Genome Biol. 12, 135 (2011).
  77. Treuille, A. & Das, R. Scientific rigor through videogames. Trends Biochem. Sci. 39, 507–509 (2014).
  78. Lee, J. et al. RNA design rules from a massive open laboratory. Proc. Natl Acad. Sci. USA 111, 2122–2127 (2014).
  79. Sørensen, J. J. W. H. et al. Exploring the quantum speed limit with computer games. Nature 532, 210–213 (2016).
  80. Rees, M. A. Longitude Prize for the twenty-first century. Nature 509, 401 (2014).
  81. Chandler, D. L. A doctor in the palm of your hand: how the Qualcomm Tricorder X-Prize could help to revolutionize medical diagnosis. IEEE Pulse 5, 50–54 (2014).
  82. Meyer, P. et al. Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach. Genome Res. 23, 1928–1937 (2013).
  83. Dwork, C. et al. The reusable holdout: preserving validity in adaptive data analysis. Science 349, 636–638 (2015).
  84. Blum, A. & Hardt, M. The Ladder: a reliable leaderboard for machine learning competitions. arXiv https://arxiv.org/abs/1502.04585 (2015).
  85. Möller, S. et al. Community-driven development for computational biology at Sprints, Hackathons and Codefests. BMC Bioinformatics 15, S7 (2014).
  86. Dahlin, J. L., Inglese, J. & Walters, M. A. Mitigating risk in academic preclinical drug discovery. Nat. Rev. Drug Discov. 14, 279–294 (2015).
  87. Meyer, P. et al. Verification of systems biology research in the age of collaborative competition. Nat. Biotechnol. 29, 811–815 (2011).
  88. Cheng, W.-Y., Ou Yang, T.-H. & Anastassiou, D. Development of a prognostic model for breast cancer survival in an open challenge environment. Sci. Transl. Med. 5, 181ra50 (2013).
  89. Boutros, P. C., Margolin, A. A., Stuart, J. M., Califano, A. & Stolovitzky, G. Toward better benchmarking: challenge-based methods assessment in cancer genomics. Genome Biol. 15, 462 (2014).
  90. Meyer, P. et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst. Biol. 8, 13 (2014).
  91. Uehara, T. et al. The Japanese toxicogenomics project: application of toxicogenomics. Mol. Nutr. Food Res. 54, 218–227 (2010).
  92. Earl, D. et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res. 21, 2224–2241 (2011).
  93. Bradnam, K. R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
  94. Earl, D. et al. Alignathon: a competitive assessment of whole-genome alignment methods. Genome Res. 24, 2077–2089 (2014).
  95. Ewing, A. D. et al. Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat. Methods 12, 623–630 (2015).

Author information

Affiliations

  1. RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen D-52074, Germany.

    • Julio Saez-Rodriguez
  2. European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

    • Julio Saez-Rodriguez
  3. Department of Pharmacology, University of Colorado, Anschutz Medical Campus, Aurora, Colorado 80045, USA.

    • James C. Costello
  4. Sage Bionetworks, Seattle, Washington 98109, USA.

    • Stephen H. Friend,
    • Michael R. Kellen,
    • Lara Mangravite &
    • Thea Norman
  5. IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA.

    • Pablo Meyer &
    • Gustavo Stolovitzky
  6. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.

    • Gustavo Stolovitzky

Competing interests statement

The authors declare no competing interests.

Author details

  • Julio Saez-Rodriguez

    Julio Saez-Rodriguez is Professor of Computational Biomedicine at the Joint Research Centre for Computational Biomedicine at RWTH Aachen University, Germany. He is also a visiting group leader at the European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL–EBI), Hinxton, UK, and co-director of the DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. He has a Ph.D. in process engineering (from the University of Magdeburg, Germany) and was a postdoctoral fellow at Harvard Medical School, Boston, Massachusetts, USA, and Massachusetts Institute of Technology (MIT), Cambridge, USA. His research focuses on computational models to understand the deregulation of signalling networks in disease and to identify novel therapeutics.

  • James C. Costello

    James C. Costello is an assistant professor of pharmacology at the University of Colorado, Anschutz Medical Campus, Aurora, Colorado, USA, and co-director of the DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. He has a Ph.D. in informatics from Indiana University, Bloomington, USA, and was a Howard Hughes Medical Institute postdoctoral fellow at Boston University, Massachusetts, USA. His research focuses on systems-level approaches to understand cancer development, progression and therapeutic targets.

  • Stephen H. Friend

    Stephen H. Friend is Founder and President of Sage Bionetworks, Seattle, Washington, USA. He has dedicated his career to untangling the complex ways that our genes, our environments and our choices combine to form our health. He has pursued his passion from academic research at Harvard University and Massachusetts Institute of Technology (MIT), both in Cambridge, Massachusetts, USA, through entrepreneurial success (Rosetta Inpharmatics) and through being a senior vice president for oncology at Merck, Kenilworth, New Jersey, USA. At Sage Bionetworks, he has built an organization that connects a new way of doing scientific data analysis online to new methods for engaging citizens directly into previously closed research processes. Under his leadership, Sage Bionetworks develops technology platforms for data-intensive analysis, governance platforms for data sharing and reuse, and in collaboration with Gustavo Stolovitzky and the DREAM (Dialogue for Reverse Engineering Assessment and Methods) team helps run Challenges to solve complex biomedical problems.

  • Michael R. Kellen

    Michael R. Kellen is Director of Technology Platforms at Sage Bionetworks, Seattle, Washington, USA, and co-director of DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. He has a Ph.D. in bioengineering from the University of Washington, Seattle, USA. He leads the development of information technology platforms, including Synapse, that are focused on enabling large-scale data sharing among researchers in the life sciences and supporting DREAM Challenges.

  • Lara Mangravite

    Lara Mangravite is Director of Systems Biology at Sage Bionetworks, Seattle, Washington, USA, and co-director of DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. She has a Ph.D. in pharmaceutical chemistry from the University of California, San Francisco, USA, and was a postdoctoral fellow at the Children's Hospital Oakland Research Institute, California, USA. Her research focuses on the application of systems biology approaches to advance our understanding of disease biology and treatment outcomes with the overriding goal of improving clinical care. As part of these efforts, her team works to pilot new approaches to scientific processes that use open systems to enable community-based collaborative research efforts to solve complex problems.

  • Pablo Meyer

    Pablo Meyer is a team leader at IBM Research, Yorktown Heights, New York, USA, and co-director of the DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. He has a Ph.D. in biology from Rockefeller University, New York, USA, and was a Helen Hay Whitney postdoctoral fellow at Columbia University Medical School, New York, USA. His research focuses on single-cell systems biology of Bacillus subtilis and developing computational methods to understand metabolic and signalling network regulation.

  • Thea Norman

    Thea Norman is Director of Strategic Development at Sage Bionetworks, Seattle, Washington, USA, and a co-director of DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. She has a Ph.D. in chemistry from the University of California, Berkeley, USA, and was a postdoctoral fellow at the University of California Berkeley and the University of California San Francisco, USA. She works in close collaboration with Gustavo Stolovitzky to oversee the running of the Sage Bionetworks–DREAM Challenges. She also works with Sage Bionetworks' mobile health (mHealth) services team to develop and implement external collaborations.

  • Gustavo Stolovitzky

    Gustavo Stolovitzky is Distinguished Research Staff Member at IBM Research, Yorktown Heights, New York, USA, adjunct Professor at the Icahn School of Medicine at Mount Sinai, New York, USA, and Founder and Chair of DREAM (Dialogue for Reverse Engineering Assessment and Methods) Challenges. His research focuses on crowdsourcing in biomedical research, high-throughput biological data analysis, reverse engineering biological circuits, the mathematical modelling of biological processes and nanobiotechnology.

Supplementary information

PDF files

  1. Supplementary information S1 (box) (241 KB)

    Scoring Metrics

  2. Supplementary information S2 (table) (252 KB)

    Examples of collaborative competitions.
