Prediction of human population responses to toxic compounds by a collaborative competition

Journal name: Nature Biotechnology
Volume: 33
Pages: 933–940
DOI: 10.1038/nbt.3299

Abstract

The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results from a community-based DREAM challenge to predict toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytotoxicity of 156 compounds in 884 lymphoblastoid cell lines for which genotype and transcriptional data are available as part of the Tox21 1000 Genomes Project. The challenge participants developed algorithms to predict interindividual variability of toxic response from genomic profiles and population-level cytotoxicity data from structural attributes of the compounds. 179 submitted predictions were evaluated against an experimental data set to which participants were blinded. Individual cytotoxicity predictions were better than random, with modest correlations (Pearson's r < 0.28), consistent with complex trait genomic prediction. In contrast, correlations for predictions of population-level response to different compounds were higher (r < 0.66). The results highlight the possibility of predicting health risks associated with unknown compounds, although risk estimation accuracy remains suboptimal.
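
The correlations quoted above are, according to the Figure 2 caption, computed compound by compound and then averaged across compounds. A minimal sketch of that scoring scheme is given below. It is an illustration only, not the organizers' scoring code: the function names, the dictionary layout and the toy values are assumptions, and treating a constant vector as scoring zero is a choice made here for the sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption for this sketch: a constant vector carries no ranking
        # information, so it is scored as zero rather than raising an error.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_per_compound_pearson(predicted, measured):
    """Average the per-compound Pearson correlation across compounds.

    Both arguments map a compound name to a list of values, one per cell
    line, with cell lines in the same order in both mappings.
    """
    scores = [pearson(predicted[c], measured[c]) for c in measured]
    return sum(scores) / len(scores)


# Hypothetical toy data: two compounds, three cell lines each.
measured = {"compound_A": [1.0, 2.0, 3.0], "compound_B": [1.0, 2.0, 3.0]}
predicted = {"compound_A": [2.0, 4.0, 6.0], "compound_B": [3.0, 2.0, 1.0]}
score = mean_per_compound_pearson(predicted, measured)
```

In the toy data one compound is predicted in the right order and the other in reverse, so the two per-compound correlations cancel in the average. This shows why a modest mean can hide large differences between compounds.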

Figures

  1. The NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge overview.

    The cytotoxicity data used in the challenge consist of the EC10 data generated for 884 lymphoblastoid cell lines in response to 156 common environmental compounds. Participants were provided with a training set of cytotoxicity data for 620 cell lines and 106 compounds along with genotype data for all cell lines, RNA-seq data for 337 cell lines and chemical attributes for all compounds. The challenge was divided into two independent subchallenges: in subchallenge 1, participants were asked to predict EC10 values for a separate test set of 264 cell lines in response to the 106 compounds (only 91 toxic compounds were used for final scoring); in subchallenge 2, they were asked to predict population parameters (in terms of median EC10 values and 5th (q05) to 95th (q95) interquantile distance) for a separate test set of 50 compounds.

  2. Significance of predictions.

    (a–d) Submissions are compared with the null hypothesis for subchallenge 1 (a,b) and subchallenge 2 (c,d). For each metric used for scoring (Pearson correlation (a) and pCi (b) for subchallenge 1, and Pearson correlation (c) and Spearman correlation (d) for subchallenge 2), performances shown for submissions are computed compound by compound and then averaged across compounds. The null hypothesis is generated for random predictions computed by random sampling, compound by compound, from the training set. (e,f) Performance of individual predictions (first boxplot, in red) is compared with performances of randomly aggregated predictions (wisdom of the crowds, in green) and with the aggregation of all predictions (last black bar). Performances are shown in terms of average Pearson correlation computed between predicted and measured values separately for each compound. Predictions were aggregated by averaging them. To aggregate only independent predictions, only one submission for each team was considered as the average of all predictions submitted by the team.

  3. Performances of predictions.

    (a,b) Predictions were compared to the gold standard based on Pearson correlation for subchallenge 1 (a) and subchallenge 2 (b). The heatmap in a illustrates performances of all predictions for all compounds used for evaluation; predictions are ranked as in the final leaderboard and compounds are clustered. Pearson correlation values are saturated at −0.2 and 0.2. The heatmap in b illustrates performances of all ranked predictions for predicted median and interquantile range (q95–q05).

  4. Advantages of using RNA-seq data.

    (a,b) Performances of predictions for cell lines for which RNA-seq data were available were compared against performances of predictions for cell lines for which RNA-seq data were not available. Pearson correlation and pCi were computed for each compound; the comparison shows that predictions for cell lines for which RNA-seq data were available are significantly better (paired t-test, P << 10^−10). All predictions are included in the analysis regardless of the actual use of the RNA-seq data.

  5. Best performing method subchallenge 1 and subchallenge 2.

    The prediction procedure of the best performing team of subchallenge 1. (a) Workflow of prediction for subchallenge 1. (b) Heatmap of number of cell lines in each category of “genetic cluster” (1–10, x axis) and geographic subpopulation (y axis). (c) Modeling workflow used by team QBRC for Toxicogenetics Challenge subchallenge 2. The model starts from deriving potential toxicity-related features by comparing response data and chemical descriptor profiles (step 1) and classifying compounds based on their toxicity responses (step 2). Then, group-specific models are built based on group-specific chemical features and the entire training set (step 3). Finally, the toxicity of a new compound is calculated as a weighted average of the predicted toxicities from each group-specific model (step 4). (d) In step 3, differentially distributed features and all training samples are used to develop group-specific models. (e) In step 4, model applicability domain and the similarities between the new compound and the compound group are used to determine the weights for each group-specific model.

  6. Overview of methods and data used to solve the challenges.

    Overview of the input data, data reduction techniques, prediction algorithms and model validation techniques used by participants to solve the challenge. Participants were asked to fill out a survey in order to be included in this publication as part of the NIEHS-NCATS-UNC DREAM Toxicogenetics challenge consortium; only data for teams that filled out the survey are shown here. Each row corresponds to a submission, and they are ordered based on the final rank for subchallenge 1 and subchallenge 2, respectively. Data originate from 75 filled-out surveys for subchallenge 1 (of 99 submissions) and 51 filled-out surveys for subchallenge 2 (of 80 submissions). This corresponds to 21 (of 34) teams for subchallenge 1 and 12 (of 23) for subchallenge 2.
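
The Figure 2 caption describes a "wisdom of the crowds" aggregation: each team's submissions are first averaged into a single prediction, and the team-level predictions are then averaged. The sketch below illustrates that two-step averaging under stated assumptions. The function names, the (team, vector) input layout and the toy numbers are hypothetical and are not taken from the challenge code.

```python
from collections import defaultdict


def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def aggregate_crowd(submissions):
    """Aggregate predictions so that every team counts exactly once.

    `submissions` is a list of (team_name, prediction_vector) pairs.
    Step 1 collapses each team's submissions to their mean; step 2
    averages the resulting team-level predictions.
    """
    by_team = defaultdict(list)
    for team, vector in submissions:
        by_team[team].append(vector)
    team_means = [average_vectors(vectors) for vectors in by_team.values()]
    return average_vectors(team_means)


# Hypothetical toy input: team_a submitted twice, team_b once.
submissions = [
    ("team_a", [1.0, 2.0]),
    ("team_a", [3.0, 4.0]),
    ("team_b", [4.0, 6.0]),
]
crowd = aggregate_crowd(submissions)
```

Averaging within teams first keeps a team with many near-identical submissions from dominating the crowd prediction. A plain mean over all three toy vectors would weight team_a twice as heavily as team_b.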

Change history

Corrected online 01 October 2015
In the version of this article initially published, in the HTML only, all authors' names were incorrectly included in the main author list, and several authors' names were repeated. The authors have now added 12 additional authors to the list of “Other participants in the NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration,” including Alok Jaiswal, Antti Poso, Himanshu Chheda, Ismeet Kaur, Jing Tang, John-Patrick Mpindi, Krister Wennerberg, Natalio Krasnogor, Samuel Kaski, Tero Aittokallio, Petteri Hintsanen and Suleiman Ali Khan. Names in this list that were also in the list of “top-performing teams” have been deleted. In addition, affiliation number 3 (Univ. Texas) has been added to Xiaowei Zhan’s name in the main author list. The errors have been corrected in the HTML and PDF versions of the article.

Author information

  1. Present address: Joint Research Center for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Aachen, Germany.

    • Julio Saez-Rodriguez
  2. These authors contributed equally to this work.

    • Federica Eduati,
    • Lara M Mangravite,
    • Tao Wang &
    • Hao Tang

Affiliations

  1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK.

    • Federica Eduati,
    • Michael P Menden &
    • Julio Saez-Rodriguez
  2. Sage Bionetworks, Seattle, Washington, USA.

    • Lara M Mangravite,
    • J Christopher Bare,
    • Thea Norman,
    • Mike Kellen &
    • Stephen Friend
  3. Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA.

    • Tao Wang,
    • Hao Tang,
    • Jichen Yang,
    • Xiaowei Zhan,
    • Rui Zhong,
    • Guanghua Xiao &
    • Yang Xie
  4. The Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA.

    • Hao Tang &
    • Yang Xie
  5. Division of Preclinical Innovation, National Institutes of Health Chemical Genomics Center, National Center for Advancing Translational Sciences, Rockville, Maryland, USA.

    • Ruili Huang,
    • Menghang Xia &
    • Anton Simeonov
  6. Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

    • Xiaowei Zhan
  7. Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Nour Abdo,
    • Oksana Kosyk &
    • Ivan Rusyn
  8. Department of Public Health, Faculty of Medicine, Jordan University of Science and Technology, Irbid, Jordan.

    • Nour Abdo
  9. National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA.

    • Allen Dearry &
    • Raymond R Tice
  10. North Carolina State University, Bioinformatics Research Center, Department of Statistics and Biological Sciences, Raleigh, North Carolina, USA.

    • Fred A Wright
  11. IBM T.J. Watson Research Center, IBM, Yorktown Heights, New York, USA.

    • Gustavo Stolovitzky
  12. Institute of Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland.

    • Tero Aittokallio,
    • Himanshu Chheda,
    • Petteri Hintsanen,
    • Alok Jaiswal,
    • John-Patrick Mpindi,
    • Jing Tang &
    • Krister Wennerberg
  13. Department of Mathematics and Computer Science, University of Catania, Catania, Italy.

    • Salvatore Alaimo &
    • Misael Mongiovì
  14. Computational Genomics Department, Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain.

    • Alicia Amadoz,
    • Joaquin Dopazo,
    • Rosa D Hernansaiz &
    • Patricia Sebastian-Leon
  15. Helsinki Institute for Information Technology, Department of Information and Computer Science, Aalto University, Espoo, Finland.

    • Muhammad Ammad-ud-din,
    • Samuel Kaski &
    • Suleiman Ali Khan
  16. Machine Learning and Computational Biology Research Group, Max Planck Institutes for Developmental Biology and for Intelligent Systems, Tübingen, Germany.

    • Chloé-Agathe Azencott,
    • Karsten Borgwardt,
    • Dominik Grimm,
    • Felipe Llinares López &
    • Carl Johann Simon-Gabriel
  17. Interdisciplinary Computing and Complex BioSystems (ICOS) research group, Newcastle University, Newcastle upon Tyne, UK.

    • Jaume Bacardit,
    • Natalio Krasnogor &
    • Nicola Lazzarini
  18. miRcore, Ann Arbor, Michigan, USA.

    • Pelham Barron,
    • Jungsoo Chang,
    • Marianne Cowherd &
    • Inhan Lee
  19. Mines ParisTech, Centre for Computational Biology, Fontainebleau, France.

    • Elsa Bernard,
    • Yunlong Jiao,
    • Erwan Scornet,
    • Veronique Stoven,
    • Jean-Philippe Vert &
    • Thomas Walter
  20. Institut Curie, Paris, France.

    • Elsa Bernard,
    • Yunlong Jiao,
    • Erwan Scornet,
    • Veronique Stoven,
    • Jean-Philippe Vert &
    • Thomas Walter
  21. INSERM U900, Paris, France.

    • Elsa Bernard,
    • Yunlong Jiao,
    • Erwan Scornet,
    • Veronique Stoven,
    • Jean-Philippe Vert &
    • Thomas Walter
  22. BIOTEC, Technical University of Dresden, Dresden, Germany.

    • Andreas Beyer,
    • Eleni G Christodoulou,
    • Mathieu Clément-Ziza,
    • Michael Kuhn &
    • Susanne Reinhardt
  23. CECAD, University of Cologne, Cologne, Germany.

    • Andreas Beyer &
    • Mathieu Clément-Ziza
  24. Center of Quantitative Biology, Peking University, Beijing, China.

    • Shao Bin &
    • Dai Ziwei
  25. Max Planck Institute for Molecular Genetics, Berlin, Germany.

    • Alena van Bömmel,
    • Brian Caffrey,
    • Matthias Heinig,
    • Matthew Huska,
    • Alessandro Mammana,
    • Juliane Perner &
    • Martin Vingron
  26. Battelle, Columbus, Ohio, USA.

    • April M Brys,
    • Joel D Elhard,
    • David A Friedenberg,
    • Jenni W Gorospe,
    • Courtney A Granville,
    • Andrea L Peabody,
    • Carol G Riffle &
    • Aaron J Sander
  27. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.

    • Jeffrey Chang,
    • Trevor Cohen,
    • Yunguo Gong,
    • Liang-Chin Huang,
    • Jingchun Sun,
    • Yonghui Wu,
    • Hua Xu &
    • W Jim Zheng
  28. Department of Integrative Biology and Pharmacology, The University of Texas Health Science Center at Houston, Houston, Texas, USA.

    • Jeffrey Chang
  29. Division of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Edinburgh, UK.

    • Sofie Demeyer,
    • Tom Michoel,
    • Konrad Rawlik &
    • Albert Tenesa
  30. LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.

    • Andre O Falcao &
    • Ana L Teixeira
  31. Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy.

    • Alfredo Ferro,
    • Rosalba Giugno &
    • Alfredo Pulvirenti
  32. Max-Delbrück Center for Molecular Medicine, Berlin, Germany.

    • Matthias Heinig
  33. Institute of Bioinformatics, Johannes Kepler University, Linz, Austria.

    • Sepp Hochreiter,
    • Günter Klambauer &
    • Andreas Mayr
  34. Adobe, San Jose, California, USA.

    • Ismeet Kaur
  35. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland.

    • Miron Bartosz Kursa &
    • Zofia Wiśniewska
  36. CSIR-Institute of Genomics & Integrative Biology, New Delhi, India.

    • Rintu Kutum
  37. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada.

    • Michael K K Leung
  38. National Cancer Centre Singapore, Singapore.

    • Weng Khong Lim
  39. Stepping Stone Genomics, McLean, Virginia, USA.

    • Charlie Liu
  40. Systems Biology Centre, University of Warwick, Coventry, UK.

    • Jonathan D Moore &
    • Richard S Savage
  41. Vital Connect, Inc., Campbell, California, USA.

    • Ravi Narasimhan
  42. Molecular and Cellular Imaging Center, Ohio State University, Columbus, Ohio, USA.

    • Stephen O Opiyo
  43. Mount Sinai, New York, New York, USA.

    • Gaurav Pandey,
    • Douglas Ruderfer &
    • Sean Whalen
  44. School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland.

    • Antti Poso
  45. Warwick Medical School, University of Warwick, Coventry, UK.

    • Richard S Savage
  46. University Pierre et Marie Curie, Paris, France.

    • Erwan Scornet
  47. School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

    • Roded Sharan
  48. Centro de Química e Bioquímica, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.

    • Ana L Teixeira
  49. Gladstone Institutes, San Francisco, California, USA.

    • Sean Whalen
  50. National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

    • Shihua Zhang &
    • Junfei Zhao

Consortia

  1. The NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration

    • Federica Eduati,
    • Lara M Mangravite,
    • Tao Wang,
    • Hao Tang,
    • J Christopher Bare,
    • Ruili Huang,
    • Thea Norman,
    • Mike Kellen,
    • Michael P Menden,
    • Jichen Yang,
    • Xiaowei Zhan,
    • Rui Zhong,
    • Guanghua Xiao,
    • Menghang Xia,
    • Nour Abdo,
    • Oksana Kosyk,
    • Stephen Friend,
    • Allen Dearry,
    • Anton Simeonov,
    • Raymond R Tice,
    • Ivan Rusyn,
    • Fred A Wright,
    • Gustavo Stolovitzky,
    • Yang Xie,
    • Julio Saez-Rodriguez,
    • Tero Aittokallio,
    • Salvatore Alaimo,
    • Alicia Amadoz,
    • Muhammad Ammad-ud-din,
    • Chloé-Agathe Azencott,
    • Jaume Bacardit,
    • Pelham Barron,
    • Elsa Bernard,
    • Andreas Beyer,
    • Shao Bin,
    • Alena van Bömmel,
    • Karsten Borgwardt,
    • April M Brys,
    • Brian Caffrey,
    • Jeffrey Chang,
    • Jungsoo Chang,
    • Himanshu Chheda,
    • Eleni G Christodoulou,
    • Mathieu Clément-Ziza,
    • Trevor Cohen,
    • Marianne Cowherd,
    • Sofie Demeyer,
    • Joaquin Dopazo,
    • Joel D Elhard,
    • Andre O Falcao,
    • Alfredo Ferro,
    • David A Friedenberg,
    • Rosalba Giugno,
    • Yunguo Gong,
    • Jenni W Gorospe,
    • Courtney A Granville,
    • Dominik Grimm,
    • Matthias Heinig,
    • Rosa D Hernansaiz,
    • Petteri Hintsanen,
    • Sepp Hochreiter,
    • Liang-Chin Huang,
    • Matthew Huska,
    • Alok Jaiswal,
    • Yunlong Jiao,
    • Samuel Kaski,
    • Ismeet Kaur,
    • Suleiman Ali Khan,
    • Günter Klambauer,
    • Natalio Krasnogor,
    • Michael Kuhn,
    • Miron Bartosz Kursa,
    • Rintu Kutum,
    • Nicola Lazzarini,
    • Inhan Lee,
    • Michael K K Leung,
    • Weng Khong Lim,
    • Charlie Liu,
    • Felipe Llinares López,
    • Alessandro Mammana,
    • Andreas Mayr,
    • Tom Michoel,
    • Misael Mongiovì,
    • Jonathan D Moore,
    • John-Patrick Mpindi,
    • Ravi Narasimhan,
    • Stephen O Opiyo,
    • Gaurav Pandey,
    • Andrea L Peabody,
    • Juliane Perner,
    • Antti Poso,
    • Alfredo Pulvirenti,
    • Konrad Rawlik,
    • Susanne Reinhardt,
    • Carol G Riffle,
    • Douglas Ruderfer,
    • Aaron J Sander,
    • Richard S Savage,
    • Erwan Scornet,
    • Patricia Sebastian-Leon,
    • Roded Sharan,
    • Carl Johann Simon-Gabriel,
    • Veronique Stoven,
    • Jingchun Sun,
    • Jing Tang,
    • Ana L Teixeira,
    • Albert Tenesa,
    • Jean-Philippe Vert,
    • Martin Vingron,
    • Thomas Walter,
    • Krister Wennerberg,
    • Sean Whalen,
    • Zofia Wiśniewska,
    • Yonghui Wu,
    • Hua Xu,
    • Shihua Zhang,
    • Junfei Zhao,
    • W Jim Zheng &
    • Dai Ziwei
  2. Challenge organizers:
    Federica Eduati, Lara M Mangravite, J Christopher Bare, Thea Norman, Mike Kellen, Michael P Menden, Stephen Friend, Gustavo Stolovitzky & Julio Saez-Rodriguez

    Data producers:
    NIEHS: Allen Dearry & Raymond R Tice
    NCATS: Ruili Huang, Menghang Xia & Anton Simeonov
    UNC: Nour Abdo, Oksana Kosyk, Ivan Rusyn & Fred A Wright

    Top-performing teams:
    Subchallenge 1: Tao Wang, Hao Tang, Xiaowei Zhan, Jichen Yang, Rui Zhong, Guanghua Xiao & Yang Xie
    Subchallenge 2: Hao Tang, Jichen Yang, Tao Wang, Guanghua Xiao & Yang Xie

    Other participants in the NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration:
    Tero Aittokallio, Salvatore Alaimo, Alicia Amadoz, Muhammad Ammad-ud-din, Chloé-Agathe Azencott, Jaume Bacardit, Pelham Barron, Elsa Bernard, Andreas Beyer, Shao Bin, Alena van Bömmel, Karsten Borgwardt, April M Brys, Brian Caffrey, Jeffrey Chang, Jungsoo Chang, Himanshu Chheda, Eleni G Christodoulou, Mathieu Clément-Ziza, Trevor Cohen, Marianne Cowherd, Sofie Demeyer, Joaquin Dopazo, Joel D Elhard, Andre O Falcao, Alfredo Ferro, David A Friedenberg, Rosalba Giugno, Yunguo Gong, Jenni W Gorospe, Courtney A Granville, Dominik Grimm, Matthias Heinig, Rosa D Hernansaiz, Petteri Hintsanen, Sepp Hochreiter, Liang-Chin Huang, Matthew Huska, Alok Jaiswal, Yunlong Jiao, Samuel Kaski, Ismeet Kaur, Suleiman Ali Khan, Günter Klambauer, Natalio Krasnogor, Michael Kuhn, Miron Bartosz Kursa, Rintu Kutum, Nicola Lazzarini, Inhan Lee, Michael K K Leung, Weng Khong Lim, Charlie Liu, Felipe Llinares López, Alessandro Mammana, Andreas Mayr, Tom Michoel, Misael Mongiovì, Jonathan D Moore, John-Patrick Mpindi, Ravi Narasimhan, Stephen O Opiyo, Gaurav Pandey, Andrea L Peabody, Juliane Perner, Antti Poso, Alfredo Pulvirenti, Konrad Rawlik, Susanne Reinhardt, Carol G Riffle, Douglas Ruderfer, Aaron J Sander, Richard S Savage, Erwan Scornet, Patricia Sebastian-Leon, Roded Sharan, Carl Johann Simon-Gabriel, Veronique Stoven, Jingchun Sun, Jing Tang, Ana L Teixeira, Albert Tenesa, Jean-Philippe Vert, Martin Vingron, Thomas Walter, Krister Wennerberg, Sean Whalen, Zofia Wiśniewska, Yonghui Wu, Hua Xu, Shihua Zhang, Junfei Zhao, W Jim Zheng, Dai Ziwei

Contributions

F.E. designed the analyses, scored predictions, performed computational analyses of challenge outcomes and wrote the manuscript. L.M.M. led project design and implementation including data collection from participants and participated in data analysis and manuscript development. J.C.B. and M.K. implemented the leaderboard and final scoring of predictions, collection of code, methods and outcomes, and participated in writing the manuscript supplement. T.N. and S.F. participated in project design and development. A.D., R.T., R.H., M.X., A.S., N.A., O.K., I.R. and F.A.W. generated and processed the data, and contributed to project design and interpretation of results. I.R., F.A.W. and R.T. participated in writing the manuscript. M.P.M. contributed to development and implementation of methodologies to score predictions. T.W., H.T., X.Z., J.Y., R.Z., G.X. and Y.X. (led by T.W., H.T. and Y.X.) participated in the challenge as modelers, developing the model with the best predictive performance, participated in analysis of challenge outcomes and participated in writing the manuscript. J.S.-R. and G.S. were responsible for overall design, development and management of project and participated in writing the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,838 KB)

    Supplementary Figures 1–14

  2. Supplementary Tables (1,225 KB)

    Supplementary Tables 1–7

  3. Supplementary Information (458 KB)

    Predictive Models

Zip files

  1. Supplementary Code (1,294 KB)
