

Dark chemical matter as a promising starting point for drug lead discovery



High-throughput screening (HTS) is an integral part of early drug discovery. Herein, we focused on those small molecules in a screening collection that have never shown biological activity despite having been exhaustively tested in HTS assays. These compounds are referred to as 'dark chemical matter' (DCM). We quantified DCM, validated it in quality control experiments, described its physicochemical properties and mapped it into chemical space. Through analysis of prospective reporter-gene assay, gene expression and yeast chemogenomics experiments, we evaluated the potential of DCM to show biological activity in future screens. We demonstrated that, despite the apparent lack of activity, occasionally these compounds can result in potent hits with unique activity and clean safety profiles, which makes them valuable starting points for lead optimization efforts. Among the identified DCM hits was a new antifungal chemotype with strong activity against the pathogen Cryptococcus neoformans but little activity at targets relevant to human safety.
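
The DCM definition above reduces to a filter over each compound's screening history: tested in many assays, active in none. The following is a minimal sketch that assumes hypothetical per-compound counts and an illustrative `min_assays` threshold standing in for "exhaustively tested". The cutoffs the study actually used are defined in the main text, not here.

```python
from dataclasses import dataclass


@dataclass
class ScreeningRecord:
    """Hypothetical per-compound summary of HTS history."""
    compound_id: str
    assays_tested: int  # distinct HTS assays the compound was tested in
    assays_active: int  # assays in which it was called a hit


def dark_chemical_matter(records, min_assays=100):
    """Return IDs of compounds tested in >= min_assays assays and never active.

    min_assays is an illustrative assumption, not the published cutoff.
    """
    return [
        r.compound_id
        for r in records
        if r.assays_tested >= min_assays and r.assays_active == 0
    ]


records = [
    ScreeningRecord("cpd-A", assays_tested=250, assays_active=0),  # dark
    ScreeningRecord("cpd-B", assays_tested=250, assays_active=3),  # active
    ScreeningRecord("cpd-C", assays_tested=12, assays_active=0),   # too few assays
]
dark = dark_chemical_matter(records)
```

The second condition matters as much as the first: a compound with no hits but only a handful of tests (like `cpd-C`) is merely under-tested, not dark.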


Figure 1: Dark matter definition and characterization.
Figure 2: Dark matter in chemical space.
Figure 3: Hit rates and selectivity.
Figure 4: Prospective experiments.
Figure 5: Identification of Hem14 (protoporphyrinogen oxidase) as target for compound 1 and derivatives.

Accession codes


Protein Data Bank


  1. Macarron, R. et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188–195 (2011).
  2. Austin, C.P., Brady, L.S., Insel, T.R. & Collins, F.S. NIH Molecular Libraries Initiative. Science 306, 1138–1139 (2004).
  3. Dobson, C.M. Chemical space and biology. Nature 432, 824–828 (2004).
  4. Krier, M., Bret, G. & Rognan, D. Assessing the scaffold diversity of screening libraries. J. Chem. Inf. Model. 46, 512–524 (2006).
  5. Chuprina, A., Lukin, O., Demoiseaux, R., Buzko, A. & Shivanyuk, A. Drug- and lead-likeness, target class, and molecular diversity analysis of 7.9 million commercially available organic compounds provided by 29 suppliers. J. Chem. Inf. Model. 50, 470–479 (2010).
  6. Bickerton, G.R., Paolini, G.V., Besnard, J., Muresan, S. & Hopkins, A.L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
  7. Lipinski, C.A., Lombardo, F., Dominy, B.W. & Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).
  8. Petrone, P.M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).
  9. Petrone, P.M. et al. Biodiversity of small molecules—a new perspective in screening set selection. Drug Discov. Today 18, 674–680 (2013).
  10. Wawer, M.J. et al. Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling. Proc. Natl. Acad. Sci. USA 111, 10911–10916 (2014).
  11. Wang, Y. et al. PubChem's BioAssay Database. Nucleic Acids Res. 40, D400–D412 (2012).
  12. Wang, Y. et al. PubChem BioAssay: 2014 update. Nucleic Acids Res. 42, D1075–D1082 (2014).
  13. Oprea, T.I. et al. A crowdsourcing evaluation of the NIH chemical probes. Nat. Chem. Biol. 5, 441–447 (2009).
  14. Durstenfeld, R. Algorithm 235: Random permutation. Commun. ACM 7, 420 (1964).
  15. Nissink, J.W.M. & Blackburn, S. Quantification of frequent-hitter behavior based on historical high-throughput screening data. Future Med. Chem. 6, 1113–1126 (2014).
  16. Kenseth, J.R. & Coldiron, S.J. High-throughput characterization and quality control of small-molecule combinatorial libraries. Curr. Opin. Chem. Biol. 8, 418–423 (2004).
  17. Gleeson, M.P., Hersey, A., Montanari, D. & Overington, J. Probing the links between in vitro potency, ADMET and physicochemical parameters. Nat. Rev. Drug Discov. 10, 197–208 (2011).
  18. Azzaoui, K. et al. Modeling promiscuity based on in vitro safety pharmacology profiling data. ChemMedChem 2, 874–880 (2007).
  19. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
  20. Stumpfe, D., Hu, Y., Dimova, D. & Bajorath, J. Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J. Med. Chem. 57, 18–28 (2014).
  21. Dimova, D., Hu, Y. & Bajorath, J. Matched molecular pair analysis of small molecule microarray data identifies promiscuity cliffs and reveals molecular origins of extreme compound promiscuity. J. Med. Chem. 55, 10220–10228 (2012).
  22. Breinbauer, R., Manger, M., Scheck, M. & Waldmann, H. Natural product guided compound library development. Curr. Med. Chem. 9, 2129–2145 (2002).
  23. King, F.J. et al. Pathway reporter assays reveal small molecule mechanisms of action. J. Assoc. Lab. Autom. 14, 374–382 (2009).
  24. Nigsch, F. et al. Determination of minimal transcriptional signatures of compounds for target prediction. EURASIP J. Bioinform. Syst. Biol. 2012, 2 (2012).
  25. Hoepfner, D. et al. High-resolution chemical dissection of a model eukaryote reveals targets, pathways and gene functions. Microbiol. Res. 169, 107–120 (2014).
  26. Glerum, D.M., Shtanko, A., Tzagoloff, A., Gorman, N. & Sinclair, P.R. Cloning and identification of HEM14, the yeast gene for mitochondrial protoporphyrinogen oxidase. Yeast 12, 1421–1425 (1996).
  27. Lee, A.Y. et al. Mapping the cellular response to small molecules using chemogenomic fitness signatures. Science 344, 208–211 (2014).
  28. Camadro, J.M., Matringe, M., Scalla, R. & Labbe, P. Kinetic studies on protoporphyrinogen oxidase inhibition by diphenyl ether herbicides. Biochem. J. 277, 17–21 (1991).
  29. Qin, X. et al. Structural insight into human variegate porphyria disease. FASEB J. 25, 653–664 (2011).
  30. Hamon, J. et al. In vitro safety pharmacology profiling: what else beyond hERG? Future Med. Chem. 1, 645–665 (2009).
  31. Watkins, R.E. et al. The human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity. Science 292, 2329–2333 (2001).
  32. Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
  33. Rose, P.W. et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 43, D345–D356 (2015).
  34. Pletnev, I. et al. InChIKey collision resistance: an experimental testing. J. Cheminform. 4, 39 (2012).
  35. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
  36. Bemis, G.W. & Murcko, M.A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
  37. Yan, B. et al. Quality control in combinatorial chemistry: determination of the quantity, purity, and quantitative purity of compounds in combinatorial libraries. J. Comb. Chem. 5, 547–559 (2003).
  38. Gaugaz, F.Z. et al. The impact of cyclopropane configuration on the biological activity of cyclopropyl-epothilones. ChemMedChem 9, 2227–2232 (2014).
  39. Clinical and Laboratory Standards Institute. Reference method for broth dilution antifungal susceptibility testing of filamentous fungi (approved standard) 2nd edn., M38-A2 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, USA, 2008).
  40. Clinical and Laboratory Standards Institute. Reference method for broth dilution antifungal susceptibility testing of yeast (approved standard) 3rd edn., M27-A3 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, USA, 2008).



A.M.W. and G.L.G. were presidential postdoctoral fellows supported by the Education Office of the Novartis Institutes for BioMedical Research. The authors thank M. Schirle, R. Nutiu, S. Reiling and E. Gregori-Puigjané for valuable discussions; T. Aust, O. Galuba and R. Riedl for support with the HIP and follow-up experiments; M. Popov and F. Nigsch for help with data mining; P. Selzer for the cell permeability model; G. Wendel, B. Burakowska and L. Koppes for help with compound management; and R. Guha, J. Bittker and J. Braisted for help with BARD.

Author information




A.M.W., E.L., J.W.D. and M.G. conceived the study with contributions from A.S., I.M.W. and C.N.P. A.M.W. carried out the large-scale computational analyses of the Novartis and PubChem HTS assay results. G.L.G. performed the gene expression experiments. F.J.K. directed and analyzed the reporter-gene assay experiments. D.H. directed and analyzed the S. cerevisiae growth inhibition and chemogenomics experiments. C.S. performed S. cerevisiae experiments. J.M.P. and M.L.G. conducted the quality control experiments. J.T. and V.P. designed and performed the antifungal panel experiments. S.C. did safety profiling experiments. P.K. and A.C.-C. supervised the profiling of natural products against the cancer cell line panel. A.M.W., E.L., D.H., J.W.D. and M.G. wrote the manuscript with contributions from all authors, who read and discussed the manuscript.

Corresponding authors

Correspondence to Anne Mai Wassermann or Meir Glick.

Ethics declarations

Competing interests

As employees of Novartis, the authors do have a perceived financial conflict of interest.

Supplementary information

Supplementary Text and Figures

Supplementary Results, Supplementary Tables 1–12, Supplementary Note 1 and Supplementary Figures 1–14. (PDF 1946 kb)

Supplementary Data Set 1

PubChem assay identifiers. All PubChem bioassays used in the analysis are reported. If two assay identifiers are listed in the same row, the corresponding PubChem bioassays have been combined because they reported different readouts from the same experiment. (XLS 101 kb)
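
Combining paired PubChem assay identifiers can be sketched as merging per-compound outcomes under a single key. The AIDs below are hypothetical. The rule that a compound counts as active in the merged assay if any of its readouts calls it active is our assumption, not a documented property of the data set.

```python
def combine_readouts(outcomes_by_aid, groups):
    """Merge PubChem AIDs that report different readouts of one experiment.

    outcomes_by_aid: {aid: {compound_id: is_active}}
    groups: list of tuples of AIDs; each tuple becomes one combined assay.
    Assumption: active in the combined assay if active in any readout.
    """
    combined = {}
    for group in groups:
        key = "+".join(str(aid) for aid in group)
        merged = {}
        for aid in group:
            for cpd, active in outcomes_by_aid.get(aid, {}).items():
                merged[cpd] = merged.get(cpd, False) or active
        combined[key] = merged
    return combined


outcomes = {
    1001: {"c1": False, "c2": True},               # hypothetical readout 1
    1002: {"c1": False, "c2": False, "c3": True},  # hypothetical readout 2
    2001: {"c1": True},                            # stand-alone assay
}
merged = combine_readouts(outcomes, [(1001, 1002), (2001,)])
```

Treating the paired readouts as one assay keeps a single experiment from being counted twice toward a compound's number of tests.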

Supplementary Data Set 2

Compound structures. The file reports InChI keys and SMILES strings for all dark compounds identified in the PubChem data set and for a subset (10,355 structures) of the dark compounds in the Novartis data set (for intellectual property reasons, not all structures can be made available). For each compound, the field “set” reports whether the compound was identified as dark chemical matter in the PubChem data set, the Novartis data set or both. (XLSX 7000 kb)
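
The “set” field is essentially a lookup of each InChI key in the two DCM collections. This is a minimal sketch. The label strings and the placeholder keys are hypothetical, and in practice real InChI keys would be used.

```python
def assign_set_labels(pubchem_keys, novartis_keys):
    """Label each DCM InChI key by the data set(s) in which it was dark.

    pubchem_keys / novartis_keys: sets of InChI keys of DCM compounds.
    The label strings are illustrative, not the exact values in the file.
    """
    labels = {}
    for key in pubchem_keys | novartis_keys:
        in_pubchem = key in pubchem_keys
        in_novartis = key in novartis_keys
        if in_pubchem and in_novartis:
            labels[key] = "both"
        elif in_pubchem:
            labels[key] = "PubChem"
        else:
            labels[key] = "Novartis"
    return labels


labels = assign_set_labels({"KEY-1", "KEY-2"}, {"KEY-2", "KEY-3"})
```

Matching on InChI keys rather than SMILES strings avoids relying on any one toolkit's canonical SMILES.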

Supplementary Data Set 3

Quality control results. For 623 compound structures identified as dark chemical matter in the Novartis data set, results from our quality control experiments are reported. Purity, identity, concentration, and comments about how to interpret the observed data for special cases (e.g. highly fluorinated compounds) are given. Compounds are represented by InChI keys and SMILES strings. (XLSX 54 kb)

Supplementary Data Set 4

DCM scaffolds. The data set lists 95 scaffolds that were significantly enriched in the PubChem DCM set. Scaffolds are reported as SMILES strings. For each scaffold, the numbers of PubChem DCM and ACT compounds it represents are reported. (XLSX 12 kb)
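
Scaffold enrichment of this kind is commonly assessed with a one-sided Fisher exact test on a 2×2 table (scaffold present or absent, DCM or ACT). The sketch below uses only the standard library to compute that hypergeometric upper-tail p-value from the per-scaffold DCM and ACT counts. It assumes scaffolds (for example, Bemis–Murcko frameworks) have already been assigned. The test and any multiple-testing correction actually used to select the 95 scaffolds are described elsewhere and may differ.

```python
from math import comb


def enrichment_p_value(dcm_with, act_with, dcm_total, act_total):
    """One-sided Fisher exact p-value for over-representation of a scaffold in DCM.

    dcm_with / act_with: DCM and ACT compounds represented by the scaffold.
    dcm_total / act_total: sizes of the full DCM and ACT sets.
    Returns P(X >= dcm_with), X hypergeometric over all compounds.
    """
    population = dcm_total + act_total
    draws = dcm_with + act_with  # all compounds carrying the scaffold
    upper = min(draws, dcm_total)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(dcm_total, k) * comb(act_total, draws - k)
        for k in range(dcm_with, upper + 1)
    )
    return tail / comb(population, draws)


# Scaffold seen in 5 DCM and 0 ACT compounds, with equal-sized sets of 10.
p_enriched = enrichment_p_value(5, 0, 10, 10)
```

With many scaffolds tested at once, p-values like this would normally be corrected for multiple testing before any are called significant.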

Supplementary Data Set 5

Dark chemical matter Bayes classifier. We provide the naive Bayes model trained on the PubChem data set as a Pipeline Pilot component (XML file). This component returns a dark matter score for each molecular data record passed to it. (XML 2227 kb)
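
The internals of the attached Pipeline Pilot component are not reproduced here. As a rough illustration of how a naive Bayes "dark matter score" can arise, the sketch below trains a Laplacian-corrected naive Bayes on sets of fingerprint feature IDs (for example, extended-connectivity fingerprint bits). It then scores a compound as the sum of its feature weights. This is an assumption-laden sketch, not the published model, and the feature IDs and training data are hypothetical.

```python
from collections import Counter
from math import log


def train_laplacian_nb(feature_sets, is_dark):
    """Per-feature log weights for a Laplacian-corrected naive Bayes.

    feature_sets: list of sets of fingerprint feature IDs.
    is_dark: list of bools, True for DCM compounds.
    """
    p = sum(is_dark) / len(feature_sets)  # baseline fraction of dark compounds
    seen, dark = Counter(), Counter()
    for feats, label in zip(feature_sets, is_dark):
        for f in feats:
            seen[f] += 1
            if label:
                dark[f] += 1
    # Corrected estimate relative to baseline: (A_f + 1) / (T_f * p + 1)
    return {f: log((dark[f] + 1) / (seen[f] * p + 1)) for f in seen}


def dark_matter_score(weights, features):
    """Sum feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


train = [{"a", "b"}, {"a"}, {"c"}, {"c", "b"}]
labels = [True, True, False, False]
weights = train_laplacian_nb(train, labels)
```

The correction shrinks weights for rarely seen features toward zero, so a single occurrence in a dark compound does not dominate the score.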

Supplementary Data Set 6

Reporter gene assay results. For 322 active (“ACT”) and 337 dark (“DCM”) compounds, we provide activity readouts from the reporter gene assay (RGA) panel. Each row in the data table reports normalized activities for one compound across the 41 RGAs listed in Supplementary Table 10. Activities were measured 24 hours after compound treatment. If a compound was tested in replicate, the reported value is the average of the normalized activities across replicates. For details on activity normalization, see the main text and the references therein. (XLSX 274 kb)
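
The table layout described above, with one row per compound, one column per RGA and replicate values averaged, can be sketched as follows. The inputs are assumed to be already normalized, and the assay names are hypothetical placeholders for the 41 RGAs in Supplementary Table 10.

```python
from collections import defaultdict
from statistics import mean


def activity_table(replicates):
    """Average replicate normalized activities per (compound, assay) pair.

    replicates: iterable of (compound_id, assay_name, normalized_activity).
    Returns {compound_id: {assay_name: mean_activity}}.
    """
    values = defaultdict(list)
    for cpd, assay, activity in replicates:
        values[(cpd, assay)].append(activity)
    table = defaultdict(dict)
    for (cpd, assay), acts in values.items():
        table[cpd][assay] = mean(acts)
    return dict(table)


rows = activity_table([
    ("c1", "RGA-01", 0.5),
    ("c1", "RGA-01", 1.5),   # replicate of the same measurement
    ("c1", "RGA-02", -1.0),
    ("c2", "RGA-01", 0.0),
])
```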

Supplementary Data Set 7

Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 7 reports gene expression changes after compound treatment with a final compound concentration of 1 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 516 kb)

Supplementary Data Set 8

Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 8 reports gene expression changes after compound treatment with a final compound concentration of 10 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 518 kb)

Supplementary Data Set 9

Yeast growth inhibition compound list. The data set lists 178 dark compounds that were tested in yeast growth inhibition experiments. Only compound 1, reported in the manuscript, showed activity in confirmation experiments; all other compounds are considered inactive. Compounds are reported as InChI keys and SMILES strings. (XLSX 18 kb)


About this article


Cite this article

Wassermann, A., Lounkine, E., Hoepfner, D. et al. Dark chemical matter as a promising starting point for drug lead discovery. Nat Chem Biol 11, 958–966 (2015).

