Glass transition temperature prediction of disordered molecular solids

Glass transition temperature, Tg, is the key quantity for assessing morphological stability and molecular ordering of films of organic semiconductors. A reliable prediction of Tg from the chemical structure is, however, challenging, as it is sensitive to both molecular interactions and analysis of the heating or cooling process. By combining a fitting protocol with an automated workflow for forcefield parameterization, we predict Tg with a mean absolute error of ~20 °C for a set of organic compounds with Tg in the 50–230 °C range. Our study establishes a reliable and automated prescreening procedure for the design of amorphous organic semiconductors, essential for the optimization and development of organic light-emitting diodes.


INTRODUCTION
Amorphous molecular solids are materials composed of organic molecules with a disordered molecular packing. They are widely used in electronic and opto-electronic applications due to their flexibility, light-weight, solution-processability and tunability through versatile organic synthesis [1][2][3][4][5] . For example, they serve as charge carrier transport materials 6,7 and host materials 8,9 in organic light-emitting diodes (OLEDs) and perovskite solar cells.
In addition to the electronic properties of these materials, the glass transition temperature, T g , is especially important as it determines the morphological stability and hence the lifetime of electronic devices 10 . The search for high-T g compounds is therefore crucial for all electronic applications. In this context, computer simulation protocols capable of reliable T g -prediction are invaluable. Various efforts have been made in the past to achieve accurate T g -prediction using statistical approaches, such as quantitative structure-property relationships [11][12][13][14] and machine learning [15][16][17] , as well as the direct extraction from molecular dynamics (MD) simulations [18][19][20][21] . In the latter approach, reliable-T g prediction has been achieved for a particular material through tailor-made forcefield parameters 22 . A systematic study of a dataset of conjugated small molecules using a deterministic approach is, however, lacking. This is presumably due to (1) the considerable computational cost and human effort involved in forcefield parameterization and molecular dynamics simulations and (2) the difficulty of finding a forcefield parameterization protocol that works for a dataset of molecules composed of different molecular building blocks.
In our recent work 23,24 , we developed a multiscale computational protocol that successfully predicts relevant physical properties of OLED host materials, including solid-state ionization energy, density of states of charge carriers and the charge carrier mobility. Despite this success, the predicted T g values are significantly off the experimental values. Since all torsional parameters were reparametrized with Lennard-Jones parameters from the OPLS database and Merz-Singh-Kollman partial charges 25,26 , we conclude that the non-bonded terms 24,27,28 should also be adjusted.
Here, we propose a computational methodology for reliable T gprediction of disordered organic molecular solids involving two parts: (1) fitting protocol for a density-temperature plot and (2) non-bonded forcefield parameterization via atoms-in-molecule electron density partitioning 29,30 . This methodology is tested on common organic semiconducting OLED hosts and holetransporting materials, the chemical structures of which are shown in Fig. 1. The computed T g values are in good agreement with the experimental T g from our measurements and literature [31][32][33][34][35][36][37][38][39] , with a mean absolute error (MAE) of 20.5°C, excluding two outliers.

Conventional bilinear fit
Conventionally, the simulated T g is obtained via a bilinear fit, as an intersection of two lines in a density-temperature (ρ-T) plot, as shown in Fig. 2a. Ideally, two plateaus representing two linear regions should be observed in a slope-temperature plot. However, this is often not the case for data extracted from MD trajectories, as shown in Fig. 2b and Supplementary Fig. 1. For BCP, for example, a transition region II between regions I and III could be identified based on the change in the slope, as shown in Fig. 2b. The non-constant slope observed in Region I and III and the existence of Region IV may be because there is more than one mechanism responsible for the volume compression during the cooling step, and the cooling rate (100 K ns −1 ) adopted in the simulations is too high. However, this non-ideal behavior persists even in the simulation with a 10 times slower cooling rate (10 K ns −1 , see Supplementary Fig. 2). Since MD simulations with a cooling rate of 10 K ns −1 and all-atom forcefields for 3000-molecule system (150,000-450,000 atoms) is already too costly for the computational screening, we aim at extracting T g from a non-ideal ρ-T plot with a 100 K ns −1 cooling rate.
The ambiguity in choosing the fitting ranges for Region I and Region III leads to a human bias, which is also common in the fitting process of charge carrier mobility 40 . In Fig. 2b, for example, we could choose any fitting range lying between point A and point B for Region I. Similarly, we could choose any fitting range between point C and point D for Region III. To quantify this variation, we show T g calculated using four fitting ranges for Region I (A, B) and Region III (C, D) in Fig. 2c. For BCP this leads to variations from 51.2°C to 147.7°C, depending on the fitting ranges.
Determination of fitting ranges from a R 2 -T plot To explore the best linear fitting ranges for Region I and III, we plotted the R 2 vs temperature (R 2 -T plot) with different sizes of the fitting range for BCP, as shown in Fig. 3. Starting from the range of

K, a clear valley region (Region II) that separates Region I from
III can be observed. The decrease of R 2 in the valley region is a direct consequence of including a more non-linear transition region into the linear regression. A straightforward way of determining two optimal fitting ranges is then to pick two "hill tops", corresponding to the maximum R 2 values, on the opposite sides of the valley. This protocol is generic, as is illustrated for all compounds in Supplementary Fig. 3, indicating that the R 2 -T plot does provide fitting ranges unambiguously.
The effect of the fitting ranges on T g is shown in Supplementary  Fig. 4. Overall, the variation of T g due to the fitting range is much smaller than that observed in Fig. 2c. In most cases, the calculated T g slightly rises with increasing fitting range. In some cases; however, the fitted T g values exhibit complicated dependence on the fitting range (such as BTST and MTDATA). This originates from the synergistic effect of the aforementioned non-ideal ρ-T curve and the density fluctuation (see Supplementary Discussion for more details). This indicates that we cannot determine the optimal fitting range for each compound unbiasedly. Nevertheless, we can treat the fitting range as a parameter to be determined. Overall, the T g calculated with the fitting ranges of 150 K and 200 K show better agreement with experimental values, as shown in Supplementary Fig. 4. Therefore, the fitting range of 200 K will be used for later comparison. The ρ-T plots and calculated T g of the 26 compounds determined using the 200 K fitting range are shown in Supplementary Fig. 5.

Effect of VdW parameters
With the proposed fitting procedure at hand, we now study the impact of the forcefield on the predicted glass transition temperature. It has been shown that electronic properties, such as ionization energy, density of states and charge carrier mobility, of amorphous films simulated using the Merz-Singh-Kollman partial charges and OPLS VdW parameters are in good agreement with experimental values 17 . However, the T g simulated using this forcefield, poorly agree with experiments, as shown in Fig. 4a and Supplementary Table 1. Assuming that the disagreement is due to non-bonded parameters, we reparametrized all non-bonded parameters for each molecule employing electron density-based DDEC6 method 18,19 . The predictions are also summarized in Fig.  4a. Compared to the MAE of 59.1°C using the MK-based forcefield, DDEC6 gives a smaller MAE of 20.5°C and a much higher R 2 (0.61 vs 0.87). The reported MAE excludes T g of MTDATA and 2-TNATA, which are clear outliers for both methods (see Supplementary  Table 1), and have a similar molecular structure, namely starburst dendrimers of triphenylamine (TPA) 41 . Since these two compounds are mostly composed of TPA blocks, the reason for the observed discrepancy may be the lack of non-local correlations in dihedral potentials of the TPA moiety, absent in both forcefields 42 . To correctly describe this behavior, an improved forcefield model that takes this correlation into account, either by a traditional fitting or a machine-learning approach 39 , is required.
In addition, we compare our results with those obtained from the protocol proposed by Patrone et al., which also features a reduction of human analysis variation 43 . The T g is determined as the intersection of high-and the low-temperature asymptotes of a hyperbolic fit of the whole range of the data. As shown in Fig. 4b, T g calculated using Patrone's protocol gives reasonably high R 2 and a slope close to one. If the transferability of the fitted linear model is good, we can also predict T g using this model by adding the intercept on top of the simulated T g values. Overall, our protocol performs better in terms of both MAE, 64.7 vs 20.5°C and R 2 , 0.75 vs 0.87.

Efficiency of our computational protocol
Since our goal is to set up an automated prescreening procedure, we have analyzed the average computational cost and human effort required for obtaining the T g for one compound. The specification of CPUs used for each computation is listed in the Methods section. Out of 6 h of human effort per compound, ca 70% is spent on generating the molecular topology and parameterizing bonded and non-bonded parameters. This motivates us to further develop an automatic/semi-automatic generator for these files. As for the computational cost, the total time spent on computing the T g using the DDEC protocol amounts tõ 850,000 cpu hours, as shown in Fig. 5. Around 50% of the resources are used in compressing and heating the as-generated low-density morphology. Nearly 30 and 20% of cpu hours are utilized for the MD cooling step and QM constrained dihedral scanning, respectively. The step for computing DDEC charge and LJ parameters is only 0.1% of the total computational cost. It is clear that further optimization of the protocol to achieve better efficiency is necessary for a large-scale screening purpose. One potential direction is to reduce the computational cost of compressing and heating in MD simulations by using a smaller system and denser initial morphology.

DISCUSSION
In summary, we propose a computational methodology for reliable T g prediction, including a fitting procedure and an MD simulation protocol using DDEC6 non-bonded parameters. Our fitting procedure reduces the human analysis variation (~90°C for BCP) resulting from different chosen fitting ranges for linear regression. In combination, the proposed fitting procedure ensures the reliability of T g , while the DDEC6 forcefield parameters improve T g predictions. The obtained values agree well with experimental values, with MAE = 20.5°C, acceptable for in silico prescreening of organic hosts. The overestimation of T g for two starburst TPA-based dendrimers encourages further forcefield development, as the TPA moiety is ubiquitous in organic semiconductors 44 . Furthermore, we generated OPLS-DDEC forcefield parameters for 26 compounds, which are available in the open-access git repository. These forcefield parameters, which give reasonable T g , could also be useful for multiscale simulations of complicated guest-host or interfacial systems in OLEDs and in silico deposition protocols. Finally, given that the molecular fragments included in this molecular dataset are the most common building blocks of organic semiconductors, we expect our protocol could have wide applicability in this field.

EXPERIMENTAL SECTION
For determination of the glass transition temperature (T g ) at Merck KGaA, Darmstadt, Germany, we used differential scanning calorimetry (DSC) analyzing samples of 10-15 mg in DSC 204/1/ G Phönix from Netsch. Samples were heated by 5°C min −1 up to 370°C then cooled by 20°C min −1 to 0°C and finally heated again by 20°C min −1 to 370°C, where T g was determined by the kink in heat flow vs temperature using the temperature corresponding to half the drop in heat flux. Only for BCP and TMBT, this protocol did not yield a significant kink. TMBT was expected to be the lowest T g material from simulation, so we tried other protocols to measure T g . We finally used a 5 mg TMBT sample in DSC Discovery from TA Instruments in nitrogen atmosphere and first heated by 20°C min −1 up to 320°C, then for cooling quenched the sample by liquid nitrogen and finally heated by 20°C min −1 up to 320°C, where the T g was observed. Other protocols we tried for TMBT without the cooling quench did not lead to the observation of T g .

Computational section
Our protocol for T g prediction involves three steps: (i) forcefield parameterization, (ii) classical MD simulations and (iii) fitting procedure, which are described as following.
(i) Forcefield parameterization: All bonded parameters apart from proper and improper dihedrals were taken from OPLS-AA forcefield 45,46 and our previous work 24 . The non-bonded parameters, atomic partial charges and Lennard-Jones parameters, were derived following the protocol proposed by Cole et al. 29 . In short, the overlapping atomic electron densities were obtained via the density-derived electrostatic and chemical (DDEC6) electron density partitioning scheme 30 . The atomic partial charges can then be obtained by integrating the corresponding atomic electron densities over the whole space. Additionally, the two parameters, A and B, in Lennard-Jones potential are then derived using Tkatchenko−Scheffler (TS) scheme 47 , where the radius of the free atom in a vacuum ðR free i Þ is taken from ref. 29 . The electron density was obtained using Gaussian16 48 at ωB97X-D 32 /6-311 G(d,p) level and the DDEC6 computations were performed using Chargemol of version 09_26_2017 30 . All molecules considered in this work were partitioned into several rigid fragments following the same procedure as our previous work 24 . After non-bonded parameters were set, the dihedral potentials that connect these rigid fragments, which are usually missing in the OPLS-AA database, were parameterized using the constrained optimization scanning performed at ωB97X-D3/def2-TZVP level using ORCA 4.2.1 49 . For more details of the parameterization of dihedral potentials, please refer to ref. 50 .
(ii) Classical MD simulations: All classical MD simulations were performed using GROMACS of version 2020.3 51,52 . For the long-range electrostatic interactions, the particle mesh Ewald (PME) method was employed with a 0.12 nm Fourier spacing. A cutoff of 13 Å was applied to all non-bonded interactions. The temperature and pressure control were accomplished using velocity rescaling with a stochastic term 53 (τ T = 0.5 ps) and an isotropic coupling for the pressure from a Berendsen barostat (P 0 = 1 bar, χ = 4.5 × 10 −5 bar −1 , and τ P = 0.5 ps).
For each compound, 3000 molecules were initially randomly placed in a simulation box with a low target density around 50-150 kg m −3 using Packmol 54 . The whole system was then heated up from 100 K to 300 K at a rate of 0.67 K ps −1 . It was then equilibrated at 300 K until the density reached a steady value. This step helps to prevent the system from exploding due to a high heating rate. Finally, the system was heated up from 300 K to 800 K at a rate of 0.5 K ps −1 , followed by an equilibration at  800 K for 10 ns. The equilibration time is long enough to ensure a steady density of the system for all compounds discussed here. Finally, the system underwent a linear cooling procedure from 800 K to 0 K at a 100 K ns −1 cooling rate, where the density and temperature data were extracted for T g fitting procedure.
The effect of cooling protocol (continuous vs stepwise) was also investigated. The cooling profiles for the stepwise cooling are shown in Supplementary Fig. 6. The density-temperature curves of mCP obtained using different conditions are very similar, as shown in Supplementary Fig.  7. For this reason, we put our main focus on other factors, such as nonbonded parameters and the fitting protocol. All T g values shown in the main text are extracted from MD simulations with a continuous cooling protocol.
Furthermore, the effects of system size, initial configurations, barostats, cooling rates and density fluctuation were investigated for mCP (see Supplementary Discussion). We also evaluated our R 2 -based fitting protocol by considering the effect of the ideal/non-ideal ρ-T curve and the density fluctuation on our protocol. To conclude, our current computational workflow is useful for practical T g prediction, reaching a balance of accuracy and the computational cost.
(iii) Fitting protocol: The simulated T g is obtained via a bilinear fit, as an intersection of two lines in a density-temperature (ρ−T) plot. In order to obtain the optimal fitting range in a less biased way, we propose the following fitting strategy. We plotted the R 2 value as a function of temperature T for a series of linear regressions with fitting ranges [T, T+a], where a is the fitting size (Fig. 3). In this plot, we can identify a clear valley region (Region II) that separates Region I from III (Fig. 2). The fitting ranges for two linear regressions are chosen as the two "hill tops" (maximum R 2 value) on the opposite sides of the valley region. The effect of the fitting size is described in detail in the main text. The MK data shown in Fig. 1b are taken from our previous work 24 , where the atomic charges are derived using Merz-Singh-Kollman scheme and VdW parameters are taken from OPLS forcefield database.

DATA AVAILABILITY
The input files of simulations, optimized forcefield parameters, the densitytemperature data for all compounds are available in the Tg git repository, version 1.0: https://gitlab.mpcdf.mpg.de/materials/tg.