Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach

Nature Materials
Virtual screening is becoming a ground-breaking tool for molecular discovery due to the exponential growth of available computer time and constant improvement of simulation and machine learning techniques. We report an integrated organic functional material design process that incorporates theoretical insight, quantum chemistry, cheminformatics, machine learning, industrial expertise, organic synthesis, molecular characterization, device fabrication and optoelectronic testing. After exploring a search space of 1.6 million molecules and screening over 400,000 of them using time-dependent density functional theory, we identified thousands of promising novel organic light-emitting diode molecules across the visible spectrum. Our team collaboratively selected the best candidates from this set. The experimentally determined external quantum efficiencies for these synthesized candidates were as large as 22%.

    Figure 1: Discovery pipeline.

    a, Diagram of the collaborative discovery approach: the search space decreases by over five orders of magnitude as the screening progresses. The cubes represent the size of the chemical space considered at each stage of the process. The distinct screening stages, from left to right, involve different theoretical and computational approaches as well as experimental input and testing. b, Dependency tree for the quantum chemistry calculations used in this study. The calculations labelled backbone were performed for all analysed molecules; lead compounds were also characterized using the methods labelled emission, and the benchmarking calculations were used to assess predictive power.

    Figure 2: Experiment–theory calibration.

    a, TD-DFT/B3LYP/6-31G(d) vertical absorption against photoluminescence emission maximum in toluene solution. Data in red represent measurements reported in the literature, while data in blue correspond to compounds synthesized in this screening work. The lines indicate a linear fit to the literature data (solid) with 95% confidence bounds (dashed). b, Experiment–theory comparison of the TD-DFT/B3LYP/6-31G(d) singlet–triplet gap against experimental values determined by the thermal activation energy method in frozen toluene solution. The lines indicate a linear fit to the literature data (solid) with 95% confidence bounds (dashed).
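The calibration in Figure 2 rests on a straight-line fit with 95% confidence bounds. The Python sketch below shows one way to compute an ordinary least-squares fit and a confidence band for the mean response; it is not the authors' code. It assumes the dashed lines are a band on the fitted mean (not a prediction interval for individual molecules), and the `t_crit` default of 1.96 is a large-sample normal approximation; for a small literature set, the Student-t quantile for n − 2 degrees of freedom should be passed instead. The function name `linear_fit_with_band` is hypothetical.

```python
import math
from statistics import fmean


def linear_fit_with_band(xs, ys, t_crit=1.96):
    """Ordinary least-squares fit y = a + b*x plus a confidence band on the fitted mean.

    t_crit = 1.96 is a large-sample normal approximation (assumption); for small
    samples supply the Student-t quantile for n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate the residual variance")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error, using n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    def band(x0):
        """Return (lower, fitted, upper) for the mean response at x0."""
        y0 = intercept + slope * x0
        half_width = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
        return y0 - half_width, y0, y0 + half_width

    return slope, intercept, band
```

A fitted line of this kind can map a raw TD-DFT value onto the experimental scale, with `band(x0)` giving the uncertainty at that point; the band is narrowest at the mean of the calibration data and widens towards its edges. Which quantity is regressed on which is a choice the caption does not specify.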

    Figure 3: Effectiveness of machine learning.

    a, Fraction of molecules in the test set correctly ranked in the top 5%, as a function of the amount of training data. b, Root mean square error (RMSE) in log(kTADF) as a function of the amount of training data. c, Linear model predictions against the TD-DFT-derived data with the largest training set; R² = 0.80. d, Neural network predictions against the TD-DFT-derived data with the largest training set; R² = 0.94.
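Figure 3 scores the models with a top-5% ranking measure, RMSE and R². The Python sketch below computes these metrics, assuming that the ranking score means the fraction of the true top 5% (by TD-DFT-derived log kTADF) that the model also places in its own top 5%. That reading of the caption and the helper names are assumptions, not the authors' definitions.

```python
import math


def top_fraction_recall(y_true, y_pred, frac=0.05):
    """Fraction of the true top-`frac` items that also appear in the predicted top `frac`.

    'Top' means largest value (e.g. largest log kTADF). Python's sort is stable,
    so ties keep their original index order.
    """
    n = len(y_true)
    k = max(1, round(frac * n))  # size of the top set, at least one molecule
    true_top = set(sorted(range(n), key=lambda i: y_true[i], reverse=True)[:k])
    pred_top = set(sorted(range(n), key=lambda i: y_pred[i], reverse=True)[:k])
    return len(true_top & pred_top) / k


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against reference values."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The ranking score rewards getting the best candidates right rather than fitting the whole distribution, which is what matters when only the top of the list is passed on to the next, more expensive screening stage.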

    Figure 4: Candidate statistics and voting tool.

    a, Number of screened molecules as a function of singlet–triplet splitting (ΔEST) and oscillator strength (f). Contour lines represent the estimated kTADF (μs−1), assuming S1 at 3.0 eV. b, Number of screened molecules as a function of kTADF and S1 energy. The vertical dashed line corresponds to kTADF = 1 μs−1. c, ΔEST and f for decision batches A–N, colour-coded by generation of the candidate library. The box-and-whisker plots represent the statistical distribution of predictions for all the candidates in each batch. The bottom and top of each box are the first and third quartiles, and the band inside the box is the median. The whiskers extending vertically from the boxes indicate the maximum and minimum of the range. d, Screenshot from the interactive web tool for molecular voting.
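The contour lines in Figure 4a convert ΔEST and f into an estimated kTADF, but the caption does not give the expression. The Python sketch below assumes one common simplified model, which may differ from the paper's own estimate: a radiative rate from the Einstein relation k_r ≈ 0.667 f ν̃² s−1 (ν̃ in cm−1, refractive-index correction ignored), and kTADF ≈ (1/3) k_r exp(−ΔEST / k_B T), where the factor 1/3 weights the single singlet state against the three triplet sublevels. The temperature of 300 K and the function names are likewise assumptions.

```python
import math

K_B_EV = 8.617333262e-5      # Boltzmann constant in eV/K
EV_TO_WAVENUMBER = 8065.544  # 1 eV expressed in cm^-1


def radiative_rate(f, s1_ev):
    """Einstein A coefficient in s^-1 from oscillator strength f and S1 energy in eV.

    Assumed gas-phase form k_r ~ 0.667 * f * nu^2 with nu in cm^-1.
    """
    nu = s1_ev * EV_TO_WAVENUMBER
    return 0.667 * f * nu ** 2


def k_tadf(delta_est_ev, f, s1_ev=3.0, temperature=300.0):
    """Assumed Boltzmann model: kTADF ~ (1/3) * k_r * exp(-dEST / k_B T), returned in us^-1."""
    k_r = radiative_rate(f, s1_ev)
    boltzmann = math.exp(-delta_est_ev / (K_B_EV * temperature))
    rate_per_s = k_r * boltzmann / 3.0
    return rate_per_s * 1e-6  # convert s^-1 to us^-1
```

Under these assumptions, `k_tadf(0.1, 0.1)` (ΔEST = 0.1 eV, f = 0.1, S1 = 3.0 eV) comes out at roughly 0.27 μs−1. The model makes the trade-off in panel a explicit: the rate grows linearly with f but decays exponentially with ΔEST, so a small gap matters more than a large oscillator strength.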

    Figure 5: Lead candidates and optoelectronic characterization.

    a, Promising molecules that were synthesized and tested. Compound abbreviations consist of the first letter of the batch of origin followed by a running index. b, Device structure and energy band diagram of the lead candidates; energies are in eV and thicknesses in nm. c, Electroluminescence spectra. d, External quantum efficiency as a function of current density. e, Current density and luminance as a function of applied voltage.


  1. Department of Chemistry and Chemical Biology, 12 Oxford Street, Harvard University, Cambridge, Massachusetts 02138, USA

    • Rafael Gómez-Bombarelli,
    • Jorge Aguilera-Iparraguirre,
    • Timothy D. Hirzel,
    • Martin A. Blood-Forsythe &
    • Alán Aspuru-Guzik
  2. John A. Paulson School of Engineering and Applied Sciences, 33 Oxford Street, Harvard University, Cambridge, Massachusetts 02138, USA

    • David Duvenaud,
    • Dougal Maclaurin &
    • Ryan P. Adams
  3. Samsung Research America, 255 Main Street, Suite 702, Cambridge, Massachusetts 02142, USA

    • Hyun Sik Chae &
    • Seong Ik Hong
  4. Department of Electrical Engineering and Computer Science, 77 Massachusetts Avenue, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Markus Einzinger,
    • Tony Wu &
    • Marc Baldo
  5. Department of Materials Science and Engineering, 77 Massachusetts Avenue, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Dong-Gwang Ha
  6. Department of Chemistry, 77 Massachusetts Avenue, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Georgios Markopoulos &
    • Wenliang Huang
  7. Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Korea

    • Soonok Jeon,
    • Hosuk Kang,
    • Hiroshi Miyazaki,
    • Masaki Numata &
    • Sunghan Kim


A.A.-G., M.B. and R.P.A. conceived the project. T.D.H. designed and wrote the custom computer code for molecular screening, with contributions from R.G.-B. and J.A.-I. R.G.-B. and J.A.-I. designed the molecules, with contributions from A.A.-G., H.M., M.N. and H.S.C. R.G.-B. and J.A.-I. performed calculations and analysed theoretical predictions. M.A.B.-F. carried out the experimental calibration of the theoretical methods. D.M., D.D. and R.P.A. applied machine learning to the computational predictions. H.S.C. and G.M. assessed the synthetic feasibility of the molecular candidates, with contributions from W.H., S.J., H.M., M.N. and S.K. R.G.-B., J.A.-I., T.D.H., H.S.C., M.A.B.-F., G.M., D.M., D.D., S.H., S.J., H.M., M.N., S.K., R.P.A., M.B. and A.A.-G. selected the molecules for characterization. S.J. synthesized J1, J2 and L1. S.J., H.S.C., T.W., D.-G.H. and M.E. collected and analysed spectroscopic data. D.-G.H., M.E. and T.W. manufactured and tested devices for F1, J1, J2 and L1, with contributions from H.K. R.G.-B., J.A.-I. and T.D.H. wrote the first version of the manuscript. All authors contributed to the discussion, writing and editing of the manuscript. A.A.-G. and R.P.A. supervised the computational chemistry study. R.P.A. and A.A.-G. supervised the machine learning approach. M.B. supervised the device fabrication.

Competing financial interests

The authors declare no competing financial interests.

