Introduction

Amorphous molecular solids are materials composed of organic molecules with a disordered molecular packing. They are widely used in electronic and opto-electronic applications due to their flexibility, light-weight, solution-processability and tunability through versatile organic synthesis1,2,3,4,5. For example, they serve as charge carrier transport materials6,7 and host materials8,9 in organic light-emitting diodes (OLEDs) and perovskite solar cells.

In addition to the electronic properties of these materials, the glass transition temperature, Tg, is especially important as it determines the morphological stability and hence the lifetime of electronic devices10. The search for high-Tg compounds is therefore crucial for all electronic applications. In this context, computer simulation protocols capable of reliable Tg-prediction are invaluable. Various efforts have been made in the past to achieve accurate Tg-prediction using statistical approaches, such as quantitative structure-property relationships11,12,13,14 and machine learning15,16,17, as well as the direct extraction from molecular dynamics (MD) simulations18,19,20,21. In the latter approach, reliable-Tg prediction has been achieved for a particular material through tailor-made forcefield parameters22. A systematic study of a dataset of conjugated small molecules using a deterministic approach is, however, lacking. This is presumably due to (1) the considerable computational cost and human effort involved in forcefield parameterization and molecular dynamics simulations and (2) the difficulty of finding a forcefield parameterization protocol that works for a dataset of molecules composed of different molecular building blocks.

In our recent work23,24, we developed a multiscale computational protocol that successfully predicts relevant physical properties of OLED host materials, including solid-state ionization energy, density of states of charge carriers and the charge carrier mobility. Despite this success, the predicted Tg values are significantly off the experimental values. Since all torsional parameters were reparametrized with Lennard–Jones parameters from the OPLS database and Merz–Singh–Kollman partial charges25,26, we conclude that the non-bonded terms24,27,28 should also be adjusted.

Here, we propose a computational methodology for reliable Tg-prediction of disordered organic molecular solids involving two parts: (1) fitting protocol for a density–temperature plot and (2) non-bonded forcefield parameterization via atoms-in-molecule electron density partitioning29,30. This methodology is tested on common organic semiconducting OLED hosts and hole-transporting materials, the chemical structures of which are shown in Fig. 1. The computed Tg values are in good agreement with the experimental Tg from our measurements and literature31,32,33,34,35,36,37,38,39, with a mean absolute error (MAE) of 20.5 °C, excluding two outliers.

Fig. 1: Molecular structures of organic molecular solids.
figure 1

Chemical structures of 26 OLED host materials and charge transport materials investigated in this work.

Results

Conventional bilinear fit

Conventionally, the simulated Tg is obtained via a bilinear fit, as an intersection of two lines in a density-temperature (ρ–T) plot, as shown in Fig. 2a. Ideally, two plateaus representing two linear regions should be observed in a slope–temperature plot. However, this is often not the case for data extracted from MD trajectories, as shown in Fig. 2b and Supplementary Fig. 1. For BCP, for example, a transition region II between regions I and III could be identified based on the change in the slope, as shown in Fig. 2b. The non-constant slope observed in Region I and III and the existence of Region IV may be because there is more than one mechanism responsible for the volume compression during the cooling step, and the cooling rate (100 K ns−1) adopted in the simulations is too high. However, this non-ideal behavior persists even in the simulation with a 10 times slower cooling rate (10 K ns−1, see Supplementary Fig. 2). Since MD simulations with a cooling rate of 10 K ns−1 and all-atom forcefields for 3000-molecule system (150,000–450,000 atoms) is already too costly for the computational screening, we aim at extracting Tg from a non-ideal ρ–T plot with a 100 K ns−1 cooling rate.

Fig. 2: Derivation of Tg via a bilinear fit.
figure 2

a A schematic illustration of a density–temperature and a slope–temperature plot characterizing a glass transition process. b Slope–temperature plot of BCP molecule. Each blue point on the plot represents a slope from a linear regression within the temperature range of [T, T + 150]. Four regions and corresponding edge points, A, B, C, D, are specified. c Fitted Tg using four different combinations of fitting ranges.

The ambiguity in choosing the fitting ranges for Region I and Region III leads to a human bias, which is also common in the fitting process of charge carrier mobility40. In Fig. 2b, for example, we could choose any fitting range lying between point A and point B for Region I. Similarly, we could choose any fitting range between point C and point D for Region III. To quantify this variation, we show Tg calculated using four fitting ranges for Region I (A, B) and Region III (C, D) in Fig. 2c. For BCP this leads to variations from 51.2 °C to 147.7 °C, depending on the fitting ranges.

Determination of fitting ranges from a R 2 –T plot

To explore the best linear fitting ranges for Region I and III, we plotted the R2 vs temperature (R2–T plot) with different sizes of the fitting range for BCP, as shown in Fig. 3. Starting from the range of 150 K, a clear valley region (Region II) that separates Region I from III can be observed. The decrease of R2 in the valley region is a direct consequence of including a more non-linear transition region into the linear regression. A straightforward way of determining two optimal fitting ranges is then to pick two “hill tops”, corresponding to the maximum R2 values, on the opposite sides of the valley. This protocol is generic, as is illustrated for all compounds in Supplementary Fig. 3, indicating that the R2–T plot does provide fitting ranges unambiguously.

Fig. 3: The effect of fitting ranges on R2–T plot.
figure 3

The R2–T plot for BCP compound with six different fitting ranges. Each blue point in the plot represents an R2 value fitted with a range [T, T+a], where a = 50, 100, 150, 200, 250, and 300 K.

The effect of the fitting ranges on Tg is shown in Supplementary Fig. 4. Overall, the variation of Tg due to the fitting range is much smaller than that observed in Fig. 2c. In most cases, the calculated Tg slightly rises with increasing fitting range. In some cases; however, the fitted Tg values exhibit complicated dependence on the fitting range (such as BTST and MTDATA). This originates from the synergistic effect of the aforementioned non-ideal ρ–T curve and the density fluctuation (see Supplementary Discussion for more details). This indicates that we cannot determine the optimal fitting range for each compound unbiasedly. Nevertheless, we can treat the fitting range as a parameter to be determined. Overall, the Tg calculated with the fitting ranges of 150 K and 200 K show better agreement with experimental values, as shown in Supplementary Fig. 4. Therefore, the fitting range of 200 K will be used for later comparison. The ρ–T plots and calculated Tg of the 26 compounds determined using the 200 K fitting range are shown in Supplementary Fig. 5.

Effect of VdW parameters

With the proposed fitting procedure at hand, we now study the impact of the forcefield on the predicted glass transition temperature. It has been shown that electronic properties, such as ionization energy, density of states and charge carrier mobility, of amorphous films simulated using the Merz–Singh–Kollman partial charges and OPLS VdW parameters are in good agreement with experimental values17. However, the Tg simulated using this forcefield, poorly agree with experiments, as shown in Fig. 4a and Supplementary Table 1. Assuming that the disagreement is due to non-bonded parameters, we reparametrized all non-bonded parameters for each molecule employing electron density-based DDEC6 method18,19. The predictions are also summarized in Fig. 4a. Compared to the MAE of 59.1 °C using the MK-based forcefield, DDEC6 gives a smaller MAE of 20.5 °C and a much higher R2 (0.61 vs 0.87). The reported MAE excludes Tg of MTDATA and 2-TNATA, which are clear outliers for both methods (see Supplementary Table 1), and have a similar molecular structure, namely starburst dendrimers of triphenylamine (TPA)41. Since these two compounds are mostly composed of TPA blocks, the reason for the observed discrepancy may be the lack of non-local correlations in dihedral potentials of the TPA moiety, absent in both forcefields42. To correctly describe this behavior, an improved forcefield model that takes this correlation into account, either by a traditional fitting or a machine-learning approach39, is required.

Fig. 4: Comparison between experimental and computational Tg values.
figure 4

The experimental and simulated Tg of 24 compounds (removing outliers MTDATA and 2-TNATA). a Simulated Tg calculated using different MD data (OPLS-MK vs DDEC6; see Methods) and the same fitting protocol (fitting size of 200 K). b Simulated Tg calculated using the same MD data (DDEC6) and different fitting protocols (Patrone et al. vs this work).

In addition, we compare our results with those obtained from the protocol proposed by Patrone et al., which also features a reduction of human analysis variation43. The Tg is determined as the intersection of high- and the low-temperature asymptotes of a hyperbolic fit of the whole range of the data. As shown in Fig. 4b, Tg calculated using Patrone’s protocol gives reasonably high R2 and a slope close to one. If the transferability of the fitted linear model is good, we can also predict Tg using this model by adding the intercept on top of the simulated Tg values. Overall, our protocol performs better in terms of both MAE, 64.7 vs 20.5 °C and R2, 0.75 vs 0.87.

Efficiency of our computational protocol

Since our goal is to set up an automated prescreening procedure, we have analyzed the average computational cost and human effort required for obtaining the Tg for one compound. The specification of CPUs used for each computation is listed in the Methods section. Out of 6 h of human effort per compound, ca 70% is spent on generating the molecular topology and parameterizing bonded and non-bonded parameters. This motivates us to further develop an automatic/semi-automatic generator for these files. As for the computational cost, the total time spent on computing the Tg using the DDEC protocol amounts to ~850,000 cpu hours, as shown in Fig. 5. Around 50% of the resources are used in compressing and heating the as-generated low-density morphology. Nearly 30 and 20% of cpu hours are utilized for the MD cooling step and QM constrained dihedral scanning, respectively. The step for computing DDEC charge and LJ parameters is only 0.1% of the total computational cost. It is clear that further optimization of the protocol to achieve better efficiency is necessary for a large-scale screening purpose. One potential direction is to reduce the computational cost of compressing and heating in MD simulations by using a smaller system and denser initial morphology.

Fig. 5: Summary of the computational cost of our Tg protocol.
figure 5

Pie chart of the computational costs (in terms of cpu hours) of different steps summed over all compounds in our protocol.

Discussion

In summary, we propose a computational methodology for reliable Tg prediction, including a fitting procedure and an MD simulation protocol using DDEC6 non-bonded parameters. Our fitting procedure reduces the human analysis variation (~90 °C for BCP) resulting from different chosen fitting ranges for linear regression. In combination, the proposed fitting procedure ensures the reliability of Tg, while the DDEC6 forcefield parameters improve Tg predictions. The obtained values agree well with experimental values, with MAE = 20.5 °C, acceptable for in silico prescreening of organic hosts. The overestimation of Tg for two starburst TPA-based dendrimers encourages further forcefield development, as the TPA moiety is ubiquitous in organic semiconductors44. Furthermore, we generated OPLS-DDEC forcefield parameters for 26 compounds, which are available in the open-access git repository. These forcefield parameters, which give reasonable Tg, could also be useful for multiscale simulations of complicated guest-host or interfacial systems in OLEDs and in silico deposition protocols. Finally, given that the molecular fragments included in this molecular dataset are the most common building blocks of organic semiconductors, we expect our protocol could have wide applicability in this field.

Experimental section

For determination of the glass transition temperature (Tg) at Merck KGaA, Darmstadt, Germany, we used differential scanning calorimetry (DSC) analyzing samples of 10–15 mg in DSC 204/1/G Phönix from Netsch. Samples were heated by 5 °C min−1 up to 370 °C then cooled by 20 °C min−1 to 0 °C and finally heated again by 20 °C min−1 to 370 °C, where Tg was determined by the kink in heat flow vs temperature using the temperature corresponding to half the drop in heat flux. Only for BCP and TMBT, this protocol did not yield a significant kink. TMBT was expected to be the lowest Tg material from simulation, so we tried other protocols to measure Tg. We finally used a 5 mg TMBT sample in DSC Discovery from TA Instruments in nitrogen atmosphere and first heated by 20 °C min−1 up to 320 °C, then for cooling quenched the sample by liquid nitrogen and finally heated by 20 °C min−1 up to 320 °C, where the Tg was observed. Other protocols we tried for TMBT without the cooling quench did not lead to the observation of Tg.

Methods

Computational section

Our protocol for Tg prediction involves three steps: (i) forcefield parameterization, (ii) classical MD simulations and (iii) fitting procedure, which are described as following.

(i) Forcefield parameterization: All bonded parameters apart from proper and improper dihedrals were taken from OPLS-AA forcefield45,46 and our previous work24. The non-bonded parameters, atomic partial charges and Lennard-Jones parameters, were derived following the protocol proposed by Cole et al.29. In short, the overlapping atomic electron densities were obtained via the density-derived electrostatic and chemical (DDEC6) electron density partitioning scheme30. The atomic partial charges can then be obtained by integrating the corresponding atomic electron densities over the whole space. Additionally, the two parameters, A and B, in Lennard-Jones potential are then derived using Tkatchenko−Scheffler (TS) scheme47, where the radius of the free atom in a vacuum \(({{{{\mathrm{R}}}}_{{{\mathrm{i}}}}^{{{\mathrm{free}}}})}\) is taken from ref. 29. The electron density was obtained using Gaussian1648 at ωB97X-D32/6–311 G(d,p) level and the DDEC6 computations were performed using Chargemol of version 09_26_201730. All molecules considered in this work were partitioned into several rigid fragments following the same procedure as our previous work24. After non-bonded parameters were set, the dihedral potentials that connect these rigid fragments, which are usually missing in the OPLS-AA database, were parameterized using the constrained optimization scanning performed at ωB97X-D3/def2-TZVP level using ORCA 4.2.149. For more details of the parameterization of dihedral potentials, please refer to ref. 50.

(ii) Classical MD simulations: All classical MD simulations were performed using GROMACS of version 2020.351,52. For the long-range electrostatic interactions, the particle mesh Ewald (PME) method was employed with a 0.12 nm Fourier spacing. A cutoff of 13 Å was applied to all non-bonded interactions. The temperature and pressure control were accomplished using velocity rescaling with a stochastic term53T = 0.5 ps) and an isotropic coupling for the pressure from a Berendsen barostat (P0 = 1 bar, χ = 4.5 × 10−5 bar−1, and τP = 0.5 ps).

For each compound, 3000 molecules were initially randomly placed in a simulation box with a low target density around 50–150 kg m−3 using Packmol54. The whole system was then heated up from 100 K to 300 K at a rate of 0.67 K ps−1. It was then equilibrated at 300 K until the density reached a steady value. This step helps to prevent the system from exploding due to a high heating rate. Finally, the system was heated up from 300 K to 800 K at a rate of 0.5 K ps−1, followed by an equilibration at 800 K for 10 ns. The equilibration time is long enough to ensure a steady density of the system for all compounds discussed here. Finally, the system underwent a linear cooling procedure from 800 K to 0 K at a 100 K ns−1 cooling rate, where the density and temperature data were extracted for Tg fitting procedure.

The effect of cooling protocol (continuous vs stepwise) was also investigated. The cooling profiles for the stepwise cooling are shown in Supplementary Fig. 6. The density-temperature curves of mCP obtained using different conditions are very similar, as shown in Supplementary Fig. 7. For this reason, we put our main focus on other factors, such as non-bonded parameters and the fitting protocol. All Tg values shown in the main text are extracted from MD simulations with a continuous cooling protocol.

Furthermore, the effects of system size, initial configurations, barostats, cooling rates and density fluctuation were investigated for mCP (see Supplementary Discussion). We also evaluated our R2-based fitting protocol by considering the effect of the ideal/non-ideal ρ–T curve and the density fluctuation on our protocol. To conclude, our current computational workflow is useful for practical Tg prediction, reaching a balance of accuracy and the computational cost.

(iii) Fitting protocol: The simulated Tg is obtained via a bilinear fit, as an intersection of two lines in a density–temperature (ρ−T) plot. In order to obtain the optimal fitting range in a less biased way, we propose the following fitting strategy. We plotted the R2 value as a function of temperature T for a series of linear regressions with fitting ranges [T, T+a], where a is the fitting size (Fig. 3). In this plot, we can identify a clear valley region (Region II) that separates Region I from III (Fig. 2). The fitting ranges for two linear regressions are chosen as the two “hill tops” (maximum R2 value) on the opposite sides of the valley region. The effect of the fitting size is described in detail in the main text. The MK data shown in Fig. 1b are taken from our previous work24, where the atomic charges are derived using Merz–Singh–Kollman scheme and VdW parameters are taken from OPLS forcefield database.

Finally, all QM computations (GAUSSIAN16 and ORCA) were performed using 20 CPUs of Intel Xeon Silver 4210 @ 2.2 GHz. All MD simulations were performed using (4 nodes; 4*40 = 160CPUs) of Intel Skylake 6148 @ 2.4 GHz.