Dark chemical matter as a promising starting point for drug lead discovery

Abstract

High-throughput screening (HTS) is an integral part of early drug discovery. Herein, we focused on those small molecules in a screening collection that have never shown biological activity despite having been exhaustively tested in HTS assays. These compounds are referred to as 'dark chemical matter' (DCM). We quantified DCM, validated it in quality control experiments, described its physicochemical properties and mapped it into chemical space. Through analysis of prospective reporter-gene assay, gene expression and yeast chemogenomics experiments, we evaluated the potential of DCM to show biological activity in future screens. We demonstrated that, despite the apparent lack of activity, occasionally these compounds can result in potent hits with unique activity and clean safety profiles, which makes them valuable starting points for lead optimization efforts. Among the identified DCM hits was a new antifungal chemotype with strong activity against the pathogen Cryptococcus neoformans but little activity at targets relevant to human safety.
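The operational definition above (compounds tested in many HTS assays but active in none) can be expressed as a simple filter. This is a minimal sketch that assumes per-compound assay outcomes have already been reduced to active/inactive calls. The `min_assays` cutoff, the data layout and the toy compound and assay IDs are hypothetical; the threshold actually used in the study is not stated in this summary.

```python
from typing import Dict, Set


def find_dark_compounds(
    results: Dict[str, Dict[str, bool]],  # compound ID -> {assay ID: active?}
    min_assays: int,  # hypothetical cutoff for "exhaustively tested"
) -> Set[str]:
    """Return compounds tested in at least `min_assays` assays and active in none."""
    dark = set()
    for compound, outcomes in results.items():
        tested_enough = len(outcomes) >= min_assays
        never_active = not any(outcomes.values())
        if tested_enough and never_active:
            dark.add(compound)
    return dark


# Toy data, invented for illustration only.
results = {
    "cpd-A": {"aid1": False, "aid2": False, "aid3": False},  # inactive everywhere
    "cpd-B": {"aid1": False, "aid2": True, "aid3": False},   # one hit, so not dark
    "cpd-C": {"aid1": False},                                # too few assays
}
print(sorted(find_dark_compounds(results, min_assays=3)))  # ['cpd-A']
```

Requiring a minimum number of assays keeps rarely tested compounds, such as `cpd-C`, from being labeled dark merely for lack of data.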

Figure 1: Dark matter definition and characterization.
Figure 2: Dark matter in chemical space.
Figure 3: Hit rates and selectivity.
Figure 4: Prospective experiments.
Figure 5: Identification of Hem14 (protoporphyrinogen oxidase) as target for compound 1 and derivatives.

Accession codes

Accessions

Protein Data Bank

References

  1. Macarron, R. et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188–195 (2011).

  2. Austin, C.P., Brady, L.S., Insel, T.R. & Collins, F.S. NIH Molecular Libraries Initiative. Science 306, 1138–1139 (2004).

  3. Dobson, C.M. Chemical space and biology. Nature 432, 824–828 (2004).

  4. Krier, M., Bret, G. & Rognan, D. Assessing the scaffold diversity of screening libraries. J. Chem. Inf. Model. 46, 512–524 (2006).

  5. Chuprina, A., Lukin, O., Demoiseaux, R., Buzko, A. & Shivanyuk, A. Drug- and lead-likeness, target class, and molecular diversity analysis of 7.9 million commercially available organic compounds provided by 29 suppliers. J. Chem. Inf. Model. 50, 470–479 (2010).

  6. Bickerton, G.R., Paolini, G.V., Besnard, J., Muresan, S. & Hopkins, A.L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  7. Lipinski, C.A., Lombardo, F., Dominy, B.W. & Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).

  8. Petrone, P.M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).

  9. Petrone, P.M. et al. Biodiversity of small molecules—a new perspective in screening set selection. Drug Discov. Today 18, 674–680 (2013).

  10. Wawer, M.J. et al. Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling. Proc. Natl. Acad. Sci. USA 111, 10911–10916 (2014).

  11. Wang, Y. et al. PubChem's BioAssay Database. Nucleic Acids Res. 40, D400–D412 (2012).

  12. Wang, Y. et al. PubChem BioAssay: 2014 update. Nucleic Acids Res. 42, D1075–D1082 (2014).

  13. Oprea, T.I. et al. A crowdsourcing evaluation of the NIH chemical probes. Nat. Chem. Biol. 5, 441–447 (2009).

  14. Durstenfeld, R. Algorithm 235: Random permutation. Commun. ACM 7, 420 (1964).

  15. Nissink, J.W.M. & Blackburn, S. Quantification of frequent-hitter behavior based on historical high-throughput screening data. Future Med. Chem. 6, 1113–1126 (2014).

  16. Kenseth, J.R. & Coldiron, S.J. High-throughput characterization and quality control of small-molecule combinatorial libraries. Curr. Opin. Chem. Biol. 8, 418–423 (2004).

  17. Gleeson, M.P., Hersey, A., Montanari, D. & Overington, J. Probing the links between in vitro potency, ADMET and physicochemical parameters. Nat. Rev. Drug Discov. 10, 197–208 (2011).

  18. Azzaoui, K. et al. Modeling promiscuity based on in vitro safety pharmacology profiling data. ChemMedChem 2, 874–880 (2007).

  19. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  20. Stumpfe, D., Hu, Y., Dimova, D. & Bajorath, J. Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J. Med. Chem. 57, 18–28 (2014).

  21. Dimova, D., Hu, Y. & Bajorath, J. Matched molecular pair analysis of small molecule microarray data identifies promiscuity cliffs and reveals molecular origins of extreme compound promiscuity. J. Med. Chem. 55, 10220–10228 (2012).

  22. Breinbauer, R., Manger, M., Scheck, M. & Waldmann, H. Natural product guided compound library development. Curr. Med. Chem. 9, 2129–2145 (2002).

  23. King, F.J. et al. Pathway reporter assays reveal small molecule mechanisms of action. J. Assoc. Lab. Autom. 14, 374–382 (2009).

  24. Nigsch, F. et al. Determination of minimal transcriptional signatures of compounds for target prediction. EURASIP J. Bioinform. Syst. Biol. 2012, 2 (2012).

  25. Hoepfner, D. et al. High-resolution chemical dissection of a model eukaryote reveals targets, pathways and gene functions. Microbiol. Res. 169, 107–120 (2014).

  26. Glerum, D.M., Shtanko, A., Tzagoloff, A., Gorman, N. & Sinclair, P.R. Cloning and identification of HEM14, the yeast gene for mitochondrial protoporphyrinogen oxidase. Yeast 12, 1421–1425 (1996).

  27. Lee, A.Y. et al. Mapping the cellular response to small molecules using chemogenomic fitness signatures. Science 344, 208–211 (2014).

  28. Camadro, J.M., Matringe, M., Scalla, R. & Labbe, P. Kinetic studies on protoporphyrinogen oxidase inhibition by diphenyl ether herbicides. Biochem. J. 277, 17–21 (1991).

  29. Qin, X. et al. Structural insight into human variegate porphyria disease. FASEB J. 25, 653–664 (2011).

  30. Hamon, J. et al. In vitro safety pharmacology profiling: what else beyond hERG? Future Med. Chem. 1, 645–665 (2009).

  31. Watkins, R.E. et al. The human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity. Science 292, 2329–2333 (2001).

  32. Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).

  33. Rose, P.W. et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 43, D345–D356 (2015).

  34. Pletnev, I. et al. InChIKey collision resistance: an experimental testing. J. Cheminform. 4, 39 (2012).

  35. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  36. Bemis, G.W. & Murcko, M.A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).

  37. Yan, B. et al. Quality control in combinatorial chemistry: determination of the quantity, purity, and quantitative purity of compounds in combinatorial libraries. J. Comb. Chem. 5, 547–559 (2003).

  38. Gaugaz, F.Z. et al. The impact of cyclopropane configuration on the biological activity of cyclopropyl-epothilones. ChemMedChem 9, 2227–2232 (2014).

  39. Clinical and Laboratory Standards Institute. Reference method for broth dilution antifungal susceptibility testing of filamentous fungi (approved standard) 2nd edn., M38-A2 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, USA, 2008).

  40. Clinical and Laboratory Standards Institute. Reference method for broth dilution antifungal susceptibility testing of yeast (approved standard) 3rd edn., M27-A3 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, USA, 2008).

Acknowledgements

A.M.W. and G.L.G. were presidential postdoctoral fellows supported by the Education Office of the Novartis Institutes for BioMedical Research. The authors thank M. Schirle, R. Nutiu, S. Reiling and E. Gregori-Puigjané for valuable discussions; T. Aust, O. Galuba and R. Riedl for support with the HIP and follow-up experiments; M. Popov and F. Nigsch for help with data mining; P. Selzer for the cell permeability model; G. Wendel, B. Burakowska and L. Koppes for help with compound management; and R. Guha, J. Bittker and J. Braisted for help with BARD.

Author information

Contributions

A.M.W., E.L., J.W.D. and M.G. conceived the study with contributions from A.S., I.M.W. and C.N.P. A.M.W. carried out the large-scale computational analyses of the Novartis and PubChem HTS assay results. G.L.G. performed the gene expression experiments. F.J.K. directed and analyzed the reporter-gene assay experiments. D.H. directed and analyzed the S. cerevisiae growth inhibition and chemogenomics experiments. C.S. performed S. cerevisiae experiments. J.M.P. and M.L.G. conducted the quality control experiments. J.T. and V.P. designed and performed the antifungal panel experiments. S.C. did safety profiling experiments. P.K. and A.C.-C. supervised the profiling of natural products against the cancer cell line panel. A.M.W., E.L., D.H., J.W.D. and M.G. wrote the manuscript with contributions from all authors. All authors read and discussed the manuscript.

Corresponding authors

Correspondence to Anne Mai Wassermann or Meir Glick.

Ethics declarations

Competing interests

As employees of Novartis, the authors do have a perceived financial conflict of interest.

Supplementary information

Supplementary Text and Figures

Supplementary Results, Supplementary Tables 1–12, Supplementary Note 1 and Supplementary Figures 1–14. (PDF 1946 kb)

Supplementary Data Set 1

PubChem assay identifiers. All PubChem bioassays used in the analysis are reported. If two assay identifiers are listed in the same row, the corresponding PubChem bioassays have been combined because they reported different readouts from the same experiment. (XLS 101 kb)

Supplementary Data Set 2

Compound structures. The file reports InChI keys and SMILES strings for all dark compounds identified in the PubChem data set and for a subset (10,355 structures) of the dark compounds in the Novartis data set (for intellectual property reasons, not all Novartis structures can be made available). For each compound, the field “set” reports whether the compound was identified as dark chemical matter in the PubChem data set, the Novartis data set or both. (XLSX 7000 kb)
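The “set” field amounts to a membership test of each structure against the two DCM collections. Below is a minimal sketch that keys compounds on their InChIKey, assuming InChIKey collisions are negligible. The label strings and the placeholder keys are hypothetical and may not match the values actually used in the file.

```python
from typing import Dict, Set


def label_dcm_sets(pubchem: Set[str], novartis: Set[str]) -> Dict[str, str]:
    """Map each InChIKey to the data set(s) in which it was classified as dark.

    The labels "PubChem", "Novartis" and "both" are illustrative assumptions.
    """
    labels = {}
    for key in pubchem | novartis:  # union: every key that is dark somewhere
        if key in pubchem and key in novartis:
            labels[key] = "both"
        elif key in pubchem:
            labels[key] = "PubChem"
        else:
            labels[key] = "Novartis"
    return labels


# Placeholder keys, not real InChIKeys.
print(sorted(label_dcm_sets({"KEY-1", "KEY-2"}, {"KEY-2", "KEY-3"}).items()))
# [('KEY-1', 'PubChem'), ('KEY-2', 'both'), ('KEY-3', 'Novartis')]
```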

Supplementary Data Set 3

Quality control results. For 623 compound structures identified as dark chemical matter in the Novartis data set, results from our quality control experiments are reported. Purity, identity, concentration, and comments on how to interpret the observed data for special cases (e.g., highly fluorinated compounds) are given. Compounds are represented by InChI keys and SMILES strings. (XLSX 54 kb)

Supplementary Data Set 4

DCM scaffolds. The data set lists 95 scaffolds that were significantly enriched in the PubChem DCM set. Scaffolds are reported as SMILES strings. For each scaffold, numbers of PubChem DCM and ACT compounds that it represents are reported. (XLSX 12 kb)
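Given per-scaffold counts of PubChem DCM and ACT compounds, enrichment can be assessed with a contingency-table test. This summary does not state which test was used. The sketch below assumes a one-sided Fisher's exact test, computed as a hypergeometric upper tail, and the function name is hypothetical. Because many scaffolds are tested at once, a multiple-testing correction would typically be applied to the resulting p-values.

```python
from math import comb


def enrichment_p_value(scaffold_dcm: int, scaffold_act: int,
                       total_dcm: int, total_act: int) -> float:
    """One-sided Fisher's exact test for over-representation of DCM.

    Population: all DCM + ACT compounds; "successes": DCM compounds;
    draws: compounds represented by the scaffold. Returns P(X >= scaffold_dcm)
    under the hypergeometric null of no association.
    """
    population = total_dcm + total_act
    draws = scaffold_dcm + scaffold_act
    upper = min(draws, total_dcm)
    tail = sum(comb(total_dcm, k) * comb(total_act, draws - k)
               for k in range(scaffold_dcm, upper + 1))
    return tail / comb(population, draws)


# Toy counts: a scaffold found in 5 DCM and 0 ACT compounds, out of 10 of each.
print(round(enrichment_p_value(5, 0, 10, 10), 4))  # 0.0163
```

Exact integer arithmetic through `math.comb` avoids floating-point underflow in the intermediate binomial coefficients; only the final ratio is converted to a float.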

Supplementary Data Set 5

Dark chemical matter Bayes classifier. The naive Bayes model trained on the PubChem data set is attached as a Pipeline Pilot component (XML file). This component returns a dark matter score for each molecular data record passed to it. (XML 2227 kb)
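The internals of the attached Pipeline Pilot component are not described here. The sketch below therefore shows one common formulation for fingerprint-based classifiers, a Laplacian-corrected naive Bayes score, as an illustrative assumption rather than a reproduction of the attached model. Compounds are represented as sets of integer feature IDs (for example, hashed circular-fingerprint features), DCM is treated as the positive class, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def train_weights(data: Iterable[Tuple[Set[int], bool]]) -> Dict[int, float]:
    """Per-feature weights from (feature set, is_dark) training pairs.

    Laplacian-corrected weight: log((A_F + 1) / (T_F * p_base + 1)), where
    A_F = dark compounds containing F, T_F = all compounds containing F and
    p_base = overall fraction of dark compounds in the training data.
    """
    pairs = list(data)
    n_dark = sum(1 for _, is_dark in pairs if is_dark)
    p_base = n_dark / len(pairs)
    total: Counter = Counter()
    dark: Counter = Counter()
    for features, is_dark in pairs:
        total.update(features)
        if is_dark:
            dark.update(features)
    return {f: math.log((dark[f] + 1) / (total[f] * p_base + 1)) for f in total}


def dark_score(features: Set[int], weights: Dict[int, float]) -> float:
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training set: features 1-3 occur only in dark compounds, 4-6 only in actives.
training = [({1, 2}, True), ({1, 3}, True), ({4, 5}, False), ({4, 6}, False)]
weights = train_weights(training)
print(dark_score({1, 2}, weights) > 0 > dark_score({4, 5}, weights))  # True
```

The +1 terms shrink the weights of rarely observed features toward zero, so a feature seen once in a dark compound does not dominate the score.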

Supplementary Data Set 6

Reporter gene assay results. For 322 active (“ACT”) and 337 dark (“DCM”) compounds, we make activity readouts from the reporter gene assay panel available. Each row in the data table reports normalized activities for one compound across the 41 RGAs given in Supplementary Table 10. Activities were obtained 24 hours after compound treatment. If a compound has been tested in replicates, the reported activity value is the average of the normalized activities obtained for the different replicates. For details on compound activity normalization see the main text and references provided therein. (XLSX 274 kb)
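The replicate handling described for this data set (averaging the normalized activities of replicate measurements for each compound and assay) can be sketched as follows. The normalization itself is described in the main text and is not reproduced. The record layout, the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_replicates(
    records: Iterable[Tuple[str, str, float]],  # (compound, assay, normalized activity)
) -> Dict[str, Dict[str, float]]:
    """Collapse replicate measurements to one mean value per compound and assay."""
    grouped = defaultdict(lambda: defaultdict(list))
    for compound, assay, activity in records:
        grouped[compound][assay].append(activity)
    return {
        compound: {assay: mean(values) for assay, values in assays.items()}
        for compound, assays in grouped.items()
    }


# Toy records, invented for illustration; "c1" was measured twice in "rga1".
records = [("c1", "rga1", 0.25), ("c1", "rga1", 0.75),
           ("c1", "rga2", -1.0), ("c2", "rga1", 0.0)]
print(average_replicates(records))
# {'c1': {'rga1': 0.5, 'rga2': -1.0}, 'c2': {'rga1': 0.0}}
```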

Supplementary Data Set 7

Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 7 reports gene expression changes after compound treatment with a final compound concentration of 1 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 516 kb)

Supplementary Data Set 8

Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 8 reports gene expression changes after compound treatment with a final compound concentration of 10 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 518 kb)

Supplementary Data Set 9

Yeast growth inhibition compound list. The data set lists 178 dark compounds that were tested in yeast growth inhibition experiments. Only compound 1, reported in the manuscript, showed activity in confirmation experiments; all other compounds are considered inactive. Compounds are reported as InChI keys and SMILES strings. (XLSX 18 kb)

About this article

Cite this article

Wassermann, A., Lounkine, E., Hoepfner, D. et al. Dark chemical matter as a promising starting point for drug lead discovery. Nat Chem Biol 11, 958–966 (2015). https://doi.org/10.1038/nchembio.1936

  • DOI: https://doi.org/10.1038/nchembio.1936