Crowdsourcing biomedical research: leveraging communities as innovation engines

Key Points

  • Crowdsourcing is emerging as a novel framework to tackle scientific problems.

  • A variant of crowdsourcing, scientific competitions known as 'Challenges', enables a rigorous validation of methods, promotes reproducibility and fosters community building.

  • Challenges also accelerate scientific discovery by allowing large numbers of groups to work jointly on a problem.

  • Integrating predictions from different methods submitted by participants to solve a Challenge provides a robust solution that is often better than the best individual solution, a phenomenon known as the 'wisdom of crowds'.

  • The patterns of similar findings that emerge from several independent Challenges can provide useful insight into various key questions in genetics and genomics.

Abstract

The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but the most informative methods may well reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.

Figure 1: Challenge platforms and organizations.
Figure 2: The steps and tasks in the organization of a Challenge.
Figure 3: The wisdom of crowds in theory and in practice.

Acknowledgements

The authors thank N. Aghaeepour, M. Bansal, P. Bertone, E. Bilal, P. Boutros, S. E. Brenner, J. Dopazo, D. Earl, F. Eduati, L. Heiser, S. Hill, P.-R. Loh, D. Marbach, J. Moult, M. Peters, S. Sieberts, J. Stuart, M. Weirauch and N. Zach for information on the crowdsourcing efforts they organized. The authors also thank the DREAM Challenges community, who taught them everything about Challenges that they have tried to share in this Review.

Author information

Corresponding authors

Correspondence to Julio Saez-Rodriguez or Gustavo Stolovitzky.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (box)

Scoring Metrics (PDF 241 kb)

Supplementary information S2 (table)

Examples of collaborative competitions. (PDF 252 kb)

Glossary

Cloud computing

An internet-based infrastructure to perform computational tasks remotely.

Crowdsourcing

A methodology that uses the voluntary help of large communities to solve problems posed by an organization.

Challenges

(Also known as collaborative competitions). Calls to a wide community to submit proposed solutions to a specific problem. These solutions are evaluated by a panel of experts using diverse criteria, and the best performer or winner is selected.

Gamification

The abstraction of a problem in such a way that working towards its solution feels like playing a computer game.

Benchmarking Challenge

A Challenge used to determine the relative performance of the methodologies used to solve a particular problem in which a known solution is available to the organizers but not the participants. The organizers compare the proposed solutions to the solution that is only available to them (that is, the gold standard). It is expected that the good solutions will generalize to instances of the problem for which the solution is unknown.

Gold standard

In allusion to the abandoned system of assigning the true value of a currency, the gold standard in a Challenge is the true solution to the posed problem in one particular instance of that problem.

Leaderboards

Tables that provide real-time feedback of performance and scores of the proposed solutions to a Challenge, allowing participants to monitor their ranking.
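A leaderboard for a Benchmarking Challenge can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the team names, the binary labels and the use of plain accuracy as the scoring metric are all hypothetical and are not taken from any specific Challenge, where the metric is chosen to suit the problem.

```python
from typing import Dict, List, Tuple


def accuracy(prediction: List[int], gold: List[int]) -> float:
    """Fraction of positions where the prediction matches the gold standard."""
    if len(prediction) != len(gold):
        raise ValueError("prediction and gold standard must have equal length")
    matches = sum(p == g for p, g in zip(prediction, gold))
    return matches / len(gold)


def leaderboard(submissions: Dict[str, List[int]],
                gold: List[int]) -> List[Tuple[str, float]]:
    """Score every submission and return (team, score) pairs, best first."""
    scored = [(team, accuracy(pred, gold)) for team, pred in submissions.items()]
    # Sort by descending score; ties are broken alphabetically by team name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical gold standard, visible to the organizers only.
GOLD = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical submissions from three made-up teams.
SUBMISSIONS = {
    "team_a": [1, 0, 1, 0, 0, 0, 1, 0],  # 7 of 8 labels correct
    "team_b": [1, 1, 1, 1, 0, 1, 1, 0],  # 6 of 8 labels correct
    "team_c": [0, 0, 1, 1, 1, 0, 0, 0],  # 5 of 8 labels correct
}

if __name__ == "__main__":
    for rank, (team, score) in enumerate(leaderboard(SUBMISSIONS, GOLD), start=1):
        print(f"{rank}. {team}: {score:.3f}")
```

In this sketch the participants never see `GOLD`; they only see their score and rank, which is the feedback a leaderboard is meant to provide.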

Training set

In general, this is the portion of the data used to train (fit) a computational model. In a Challenge, this is the data given to the participants to build their models. It normally encompasses most of the data.

Cross-validation set

The subsets of the training data that a participant holds out in turn during cross-validation, a procedure in which model parameters are adjusted according to how well the model predicts each held-out subset.

Test set

The subset of data that is separate from the training set and the cross-validation set (that is, data that participants never have access to). The test set is used for the final assessment of the predictive power of the models.
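The relationship between the training set, the cross-validation subsets and the test set can be illustrated with a short Python sketch. The function names, the 20% test fraction and the five folds are assumptions chosen for illustration, not values prescribed by any Challenge.

```python
import random
from typing import List, Sequence, Tuple


def split_challenge_data(samples: Sequence[int], test_fraction: float,
                         seed: int) -> Tuple[List[int], List[int]]:
    """Organizer side: shuffle once, then hold out a hidden test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test samples stay with the organizers; the rest are released.
    return shuffled[n_test:], shuffled[:n_test]


def cross_validation_folds(training: Sequence[int],
                           k: int) -> List[Tuple[List[int], List[int]]]:
    """Participant side: k (fit, validate) partitions of the training set."""
    folds = [list(training[i::k]) for i in range(k)]
    partitions = []
    for i in range(k):
        validate = folds[i]
        # Fit on every fold except the one currently held out.
        fit = [s for j, fold in enumerate(folds) if j != i for s in fold]
        partitions.append((fit, validate))
    return partitions


if __name__ == "__main__":
    training, test = split_challenge_data(range(100), test_fraction=0.2, seed=0)
    print(f"training: {len(training)} samples, hidden test: {len(test)} samples")
    for fit, validate in cross_validation_folds(training, k=5):
        print(f"fit on {len(fit)}, validate on {len(validate)}")
```

The design point is that cross-validation only ever touches the released training data, so the test set remains untouched until the final assessment.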

Wisdom of crowds

The collective wisdom that emerges when the solutions to a problem that are proposed by a large pool of people are aggregated. The aggregate solution is often better than the best individual solution.
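One simple way to aggregate solutions is to average the predictions submitted by different teams. The Python sketch below uses made-up numbers, hypothetical team names and mean absolute error as an assumed scoring metric; the toy data are constructed so that individual errors partly cancel, which is the situation in which aggregation helps. It is an illustration, not the aggregation scheme of any particular Challenge.

```python
from statistics import mean
from typing import Dict, List


def mean_absolute_error(prediction: List[float], truth: List[float]) -> float:
    """Average absolute difference between predicted and true values."""
    return mean(abs(p - t) for p, t in zip(prediction, truth))


def aggregate(predictions: Dict[str, List[float]]) -> List[float]:
    """Crowd prediction: per-sample mean over all teams' predictions."""
    return [mean(values) for values in zip(*predictions.values())]


# Hypothetical true values (the gold standard) for four samples.
TRUTH = [2.0, 4.0, 6.0, 8.0]

# Hypothetical predictions from three made-up teams.
PREDICTIONS = {
    "team_a": [3.0, 3.0, 7.0, 7.0],
    "team_b": [1.0, 5.0, 5.0, 9.0],
    "team_c": [2.5, 4.5, 6.5, 7.0],
}

if __name__ == "__main__":
    for team, prediction in PREDICTIONS.items():
        print(f"{team}: MAE = {mean_absolute_error(prediction, TRUTH):.3f}")
    crowd = aggregate(PREDICTIONS)
    print(f"aggregate: MAE = {mean_absolute_error(crowd, TRUTH):.3f}")
```

With these toy numbers the averaged prediction has a lower error than any single team. That outcome depends on the teams making partly independent errors; when all methods share the same bias, averaging cannot remove it.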

Hackathons

Events in which specialists in a topic, normally related to computation, get together to work on a specific problem.

Cite this article

Saez-Rodriguez, J., Costello, J., Friend, S. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat Rev Genet 17, 470–486 (2016). https://doi.org/10.1038/nrg.2016.69
