The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Data producers typically perform the initial analyses, but the most informative methods are likely to reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework for finding robust methodologies. When crowdsourcing takes the form of collaborative scientific competitions, known as Challenges, validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.