The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results of a community-based DREAM challenge to predict the toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytotoxicity of 156 compounds in 884 lymphoblastoid cell lines for which genotype and transcriptional data are available as part of the Tox21 1000 Genomes Project. The challenge participants developed algorithms to predict two things: interindividual variability of toxic response from genomic profiles, and population-level cytotoxicity data from structural attributes of the compounds. A total of 179 submitted predictions were evaluated against an experimental data set to which participants were blinded. Individual cytotoxicity predictions were better than random, with modest correlations (Pearson's r < 0.28), consistent with complex trait genomic prediction. In contrast, correlations for predictions of population-level response to different compounds were higher (r < 0.66). The results highlight the possibility of predicting health risks associated with unknown compounds, although risk estimation accuracy remains suboptimal.
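The abstract reports prediction quality as Pearson's r between predicted and measured responses. As a minimal sketch of how such a score can be computed (not the challenge's actual scoring pipeline, which is not described here), the function below implements the standard Pearson correlation formula. The function name `pearson_r` and the toy values are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    # Sums of squared deviations for each sequence.
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Made-up toy values for illustration only (hypothetical, not from the study).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.4, 3.8, 5.1]
score = pearson_r(predicted, measured)
```

A score near 1 indicates that predictions rank and scale with the measurements, a score near 0 indicates no linear relationship, and a constant prediction vector has no defined correlation, which is why the sketch raises an error in that case.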