Crowdsourcing biomedical research: leveraging communities as innovation engines

Saez-Rodriguez, Julio; Costello, James C.; Friend, Stephen H.; Kellen, Michael R.; Mangravite, Lara; Meyer, Pablo; Norman, Thea; Stolovitzky, Gustavo

doi:10.1038/nrg.2016.69

Review Article
Published: 15 July 2016

Julio Saez-Rodriguez^1,2, James C. Costello^3, Stephen H. Friend^4, Michael R. Kellen^4, Lara Mangravite^4, Pablo Meyer^5, Thea Norman^4 & Gustavo Stolovitzky^5,6

Nature Reviews Genetics volume 17, pages 470–486 (2016)

Key Points

Crowdsourcing is emerging as a novel framework to tackle scientific problems.
A variant of crowdsourcing, scientific competitions known as 'Challenges', enables a rigorous validation of methods, promotes reproducibility and fosters community building.
Challenges also accelerate scientific discovery by allowing large numbers of groups to work jointly on a problem.
Integrating predictions from different methods submitted by participants to solve a Challenge provides a robust solution that is often better than the best individual solution, a phenomenon known as the 'wisdom of crowds'.
The patterns of similar findings that emerge from several independent Challenges can provide useful insight into various key questions in genetics and genomics.
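
The 'wisdom of crowds' point above can be illustrated with a small simulation. This is a minimal sketch, not the aggregation scheme used in any particular Challenge: the teams, the noise level and the helper names (`simulate_submissions`, `aggregate`, `rmse`) are hypothetical, and each simulated team's error is assumed to be independent of the others. In real Challenges the errors of different methods are usually correlated, so the gain from aggregation tends to be smaller than in this idealized setting.

```python
import random
import statistics


def rmse(predictions, truth):
    """Root-mean-square error between a prediction vector and the gold standard."""
    return (sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)) ** 0.5


def simulate_submissions(truth, n_teams, noise_sd, rng):
    """Hypothetical teams: each predicts the truth plus its own independent noise."""
    return [[t + rng.gauss(0.0, noise_sd) for t in truth] for _ in range(n_teams)]


def aggregate(submissions):
    """Crowd prediction: the per-item mean across all submissions."""
    return [statistics.fmean(values) for values in zip(*submissions)]


rng = random.Random(0)  # fixed seed so the run is repeatable
truth = [rng.uniform(0.0, 10.0) for _ in range(200)]  # stand-in gold standard
submissions = simulate_submissions(truth, n_teams=20, noise_sd=1.0, rng=rng)

individual_errors = [rmse(s, truth) for s in submissions]
best_individual_error = min(individual_errors)
crowd_error = rmse(aggregate(submissions), truth)

print(f"best individual RMSE: {best_individual_error:.3f}")
print(f"crowd (mean) RMSE:    {crowd_error:.3f}")
```

Under the independence assumption, averaging 20 submissions shrinks the noise by roughly the square root of 20, which is why the aggregate outperforms even the best single team here.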

Abstract

The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform the initial analyses, but the most informative methods may well reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When crowdsourcing takes the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.

**Figure 1: Challenge platforms and organizations.**

**Figure 2: The steps and tasks in the organization of a Challenge.**

**Figure 3: The wisdom of crowds in theory and in practice.**

Acknowledgements

The authors thank N. Aghaeepour, M. Bansal, P. Bertone, E. Bilal, P. Boutros, S. E. Brenner, J. Dopazo, D. Earl, F. Eduati, L. Heiser, S. Hill, P.-R. Loh, D. Marbach, J. Moult, M. Peters, S. Sieberts, J. Stuart, M. Weirauch and N. Zach for information on the crowdsourcing efforts they organized. The authors also thank the DREAM Challenges community, who taught them everything about Challenges that they have tried to share in this Review.

Author information

Authors and Affiliations

RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, D-52074, Germany
Julio Saez-Rodriguez
European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
Julio Saez-Rodriguez
Department of Pharmacology, University of Colorado, Anschutz Medical Campus, Aurora, 80045, Colorado, USA
James C. Costello
Sage Bionetworks, Seattle, 98109, Washington, USA
Stephen H. Friend, Michael R. Kellen, Lara Mangravite & Thea Norman
IBM Thomas J. Watson Research Center, Yorktown Heights, 10598, New York, USA
Pablo Meyer & Gustavo Stolovitzky
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, New York, USA
Gustavo Stolovitzky

Corresponding authors

Correspondence to Julio Saez-Rodriguez or Gustavo Stolovitzky.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (box)

Scoring Metrics (PDF 241 kb)

Supplementary information S2 (table)

Examples of collaborative competitions. (PDF 252 kb)

Glossary

Cloud computing: An internet-based infrastructure to perform computational tasks remotely.
Crowdsourcing: A methodology that uses the voluntary help of large communities to solve problems posed by an organization.
Challenges (also known as collaborative competitions): Calls to a wide community to submit proposed solutions to a specific problem. These solutions are evaluated by a panel of experts using diverse criteria, and the best performer or winner is selected.
Gamification: The abstraction of a problem in such a way that working towards its solution feels like playing a computer game.
Benchmarking Challenge: A Challenge used to determine the relative performance of the methodologies used to solve a particular problem in which a known solution is available to the organizers but not the participants. The organizers compare the proposed solutions to the solution that is only available to them (that is, the gold standard). It is expected that the good solutions will generalize to instances of the problem for which the solution is unknown.
Gold standard: In allusion to the abandoned system of assigning the true value of a currency, the gold standard in a Challenge is the true solution to the posed problem in one particular instance of that problem.
Leaderboards: Tables that provide real-time feedback of performance and scores of the proposed solutions to a Challenge, allowing participants to monitor their ranking.
Training set: In general, this is the portion of the data used to train (fit) a computational model. In a Challenge, this is the data given to the participants to build their models. It normally encompasses most of the data.
Cross-validation set: A procedure whereby a participant uses subsets of the training data to adjust model parameters based on how well they predict this data set.
Test set: The subset of data that is kept separate from the training set and the cross-validation set (that is, the data that participants never have access to in any way). The test set is used for a final assessment of the predictive power of the models.
Wisdom of crowds: The collective wisdom that emerges when the solutions to a problem that are proposed by a large pool of people are aggregated. The aggregate solution is often better than the best individual solution.
Hackathons: Events in which specialists in a topic, normally related to computation, get together to work on a specific problem.
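
The glossary terms training set, cross-validation, test set and gold standard fit together as in the following sketch of a Benchmarking Challenge. It is an illustration under stated assumptions, not a description of how any specific Challenge is run: the data are synthetic, the model is a simple k-nearest-neighbour regressor chosen only for brevity, and all names (`split_train_test`, `cross_validate`, `final_ranking`, the `team_k…` labels) are hypothetical.

```python
import math
import random


def split_train_test(records, test_fraction, rng):
    """Organizer side: hold out a test set whose answers participants never see."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, held_out, k):
    """Mean squared error of a k-NN model fitted on `train`, scored on `held_out`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in held_out) / len(held_out)


def cross_validate(train, k, n_folds=5):
    """Participant side: average held-out error over folds of the training set."""
    folds = [train[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, fold in enumerate(folds):
        rest = [rec for j, other in enumerate(folds) if j != i for rec in other]
        errors.append(mse(rest, fold, k))
    return sum(errors) / n_folds


rng = random.Random(1)  # fixed seed so the run is repeatable
# Synthetic stand-in for Challenge data: y = sin(x) + noise.
records = []
for _ in range(300):
    x = rng.uniform(0.0, 6.0)
    records.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))

training, test = split_train_test(records, test_fraction=0.2, rng=rng)

# A participant tunes k using the training set only.
candidate_ks = [1, 5, 15, 50]
best_k = min(candidate_ks, key=lambda k: cross_validate(training, k))

# The organizers score every submission against the hidden gold standard.
final_ranking = sorted((mse(training, test, k), f"team_k{k}") for k in candidate_ks)

print("k chosen by cross-validation:", best_k)
for score, team in final_ranking:
    print(f"{team}: test MSE = {score:.3f}")
```

The design point is the separation of roles: `cross_validate` touches only the training set, whereas the test-set outcomes are used only in the organizers' final scoring, so the ranking reflects how well each model generalizes.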

About this article

Cite this article

Saez-Rodriguez, J., Costello, J., Friend, S. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat Rev Genet 17, 470–486 (2016). https://doi.org/10.1038/nrg.2016.69

Published: 15 July 2016
Issue Date: August 2016
DOI: https://doi.org/10.1038/nrg.2016.69
