Introduction

Quantum mechanical methods with high energy accuracy, such as density functional theory (DFT), can optimize molecular input structures to a nearby local minimum, but calculating accurate reaction thermodynamics requires finding global minimum energy structures1,2. For simple molecules, expert intuition can identify a few minima to focus study on, but an alternative approach must be considered for more complex molecules or to eventually fulfil the dream of autonomous catalyst design3,4: the potential energy surface must be first surveyed with a computationally efficient method; then minima from this survey must be refined using slower, more accurate methods; finally, for molecules possessing low-frequency vibrational modes, those modes need to be treated appropriately to obtain accurate thermodynamic energies5,6,7. This multistep process has a prohibitively steep learning curve for many newcomers, and trained researchers spend significant time monitoring calculations and transferring data from one phase to the next. Various programs have been previously written to automate the workflow between computational chemistry engines8,9,10,11,12,13,14,15. We have constructed our own Python script, XTBDFT, to automate the workflow between (1) GFN-xTB-driven16 CREST2,12,17, an accurate and efficient meta-dynamics method for conformational analysis for systems, particularly those with transition-metal atoms and exotic functional groups, (2) conformer refinement with DFT, as implemented in NWChem18, and (3) GoodVibes, a script to apply quasi-harmonic treatment of low-frequency vibrational modes6 (Fig. 1). Notably all components of this automated workflow are open-source, allowing for widespread and affordable implementation. Herein we apply XTBDFT toward a practical topic: computational assessment of diphosphinoamine (PNP) ligand candidates.

Figure 1

XTBDFT flowchart with example input and output.

PNP ligands are extensively studied for Cr-catalyzed ethylene oligomerisation to valuable linear alpha olefins19,20,21,22. Recent academic23,24,25 and industrial26 studies continue to show the prominence of PNP ligands. However, reactor fouling, caused by insoluble polyethylene by-product, remains a major impediment to industrial-scale practice. PNP ligands can isomerize to iminobisphosphines (PPN) under catalytic conditions, which has been proposed to lead to increased polyethylene formation24. We sought to evaluate that hypothesis by calculating thermodynamic stabilities of PNPs against isomerization to PPN form (ΔGPPN, Fig. 2). Both PNP and PPN molecules possess many conformations, and analysis of the wrong conformer will result in incorrect ΔGPPN. A previous DFT study of PNP ligand conformers found variation of up to 9.9 kcal/mol in calculated Gibbs free energy25, which would drastically impact any thermodynamic comparison with PPN isomer.

Figure 2

Diphosphinoamine (PNP) to iminobisphosphine (PPN) isomerization and illustrative potential energy surfaces.

In this report, we utilize XTBDFT to automate global optimization and calculation of ΔGPPN of several known PNP/PPN compounds (1–33, Fig. 3), and observe strong agreement with experimental observations. Then a strong inverse relation between PNP stability and polyethylene formation during ethylene oligomerization catalysis is observed. Finally, this method is applied to screen novel PNP ligand candidates, saving significant time by ruling out candidates with non-trivial synthetic routes and poor expected catalytic performance.

Figure 3

PNP ligands 1–33 or PPN isomers (1′–33′, not shown) that have been reported in the literature. Black structures are calculated to have positive ΔGPPN; red structures have negative ΔGPPN.

Methods

Initial guess geometries for PNP and PPN molecules were generated with MolView27. These molecules have numerous hindered degrees of rotation, and standard geometry optimization algorithms may yield a relatively high-energy conformer. A previous DFT study of relatively simple (i.e. high substitutional symmetry) PNP molecules found that geometric local minima could differ by 9.9 kcal/mol in Gibbs free energy25. Human-directed generation of conformer ensembles for computational analysis is time-intensive and inconsistent. To ensure identification of a low-energy conformer, an efficient computational method must be used to generate and crudely rank a broad conformer ensemble. Conformer sampling algorithms employed in Tinker28, OpenBabel29, and various other molecular modelling programs rely on heavily parameterized classical force fields or semiempirical methods that may not accurately reproduce geometries or energies of exotic molecules or transition-metal complexes, and in some cases researchers have to resort to substituting unparameterized elements with more common elements30. We instead use the meta-dynamics package Conformer Rotamer Ensemble Sampling Tool (CREST2,17) driven by the semi-empirical density functional tight binding theory GFN2-xTB, which, unlike previously developed semi-empirical methods, has been broadly parameterized for all elements H–Rn and benchmarked against diverse chemical databases16, to automate generation of an ensemble of conformers within 6 kcal/mol of the minimum energy conformer. As an added benefit of using CREST for conformation searching, its built-in conformer symmetry analysis identifies rotamers that are chemically identical, greatly reducing the size of the conformer ensemble to be processed by higher levels of theory. In one example, CREST identified 35 unique conformers of (Ph2P)2NPh (9) within 6 kcal/mol of the lowest energy conformer. For comparison, the conformer searching procedure built into Spartan ‘18 (powered by Merck Molecular Force Field31) returns 225 conformers.
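The energy-window selection described above can be illustrated with a minimal Python sketch. It is not taken from XTBDFT: the function names are hypothetical, and it assumes a multi-structure XYZ ensemble with no blank lines between frames in which the comment line of each frame starts with the total energy in hartree.

```python
HARTREE_TO_KCAL = 627.509  # approximate conversion factor, kcal/mol per hartree


def read_ensemble(text):
    """Split multi-structure XYZ text into (energy_hartree, frame_lines) tuples.

    Assumption: the first token of each frame's comment line is the energy.
    """
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        natoms = int(lines[i].split()[0])          # line 1 of a frame: atom count
        energy = float(lines[i + 1].split()[0])    # line 2 of a frame: comment with energy
        frames.append((energy, lines[i:i + natoms + 2]))
        i += natoms + 2
    return frames


def within_window(frames, window_kcal=6.0):
    """Return (relative_energy_kcal, frame_lines) pairs inside the window, lowest first."""
    e_min = min(energy for energy, _ in frames)
    kept = []
    for energy, frame in frames:
        rel = (energy - e_min) * HARTREE_TO_KCAL
        if rel <= window_kcal:
            kept.append((rel, frame))
    return sorted(kept, key=lambda pair: pair[0])
```

Each retained frame can then be written out as the starting geometry for the next, more expensive level of theory.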

Further geometry optimizations of the conformer ensemble were performed with density functional theory (DFT) as implemented in NWChem18,32 (version 6.8) with the B3LYP functional33,34, def2-SV(P) basis set35, Weigend Coulomb-fitting auxiliary basis set36, Grimme DFT-D3 dispersion corrections37, NWChem medium integration grid, and loose geometric convergence criteria (using NWChem input parameters of gmax 0.002; grms 0.0003; xrms 1; xmax 1). The lowest energy conformer at this intermediate level of theory was then further geometry optimized with tighter convergence criteria (gmax 0.0001; grms 0.00003; xrms 0.0006; xmax 0.001), and thermochemical corrections were calculated, with the def2-SVP basis set35 and the NWChem fine integration grid. Finally, a high-level single-point electronic energy evaluation was performed using the def2-TZVP basis set35.
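As an illustration of how the two sets of convergence criteria quoted above could be emitted programmatically, the following sketch builds an NWChem `driver` block from each set. The numerical values are those given in the text; the helper name `driver_block` and the dictionary names are hypothetical and are not claimed to match the XTBDFT source.

```python
# Convergence criteria as quoted in the text; stored as strings so the
# generated input reproduces them verbatim.
LOOSE = {"gmax": "0.002", "grms": "0.0003", "xrms": "1", "xmax": "1"}
TIGHT = {"gmax": "0.0001", "grms": "0.00003", "xrms": "0.0006", "xmax": "0.001"}


def driver_block(criteria):
    """Format geometry-convergence criteria as an NWChem driver block."""
    body = "\n".join(f"  {key} {value}" for key, value in criteria.items())
    return f"driver\n{body}\nend"


if __name__ == "__main__":
    print(driver_block(LOOSE))   # ensemble-wide loose optimization
    print(driver_block(TIGHT))   # final optimization of the lowest conformer
```

Keeping the criteria in named dictionaries makes it straightforward to swap the loose settings for the tight ones when the lowest-energy conformer is promoted to the final optimization.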

Because of the hindered degrees of rotation, several optimized geometries possess low-frequency vibrations, which are inaccurately treated by the harmonic oscillator approximation for thermodynamics. A correction for these vibrations was applied by quasi-harmonically raising all frequencies below 100 cm−1 using the GoodVibes script6. The GoodVibes code was modified to allow parsing of NWChem output files, and this customization has been merged into the publicly available GoodVibes release.
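The effect of raising low frequencies can be sketched with the standard harmonic-oscillator vibrational entropy. This is a minimal illustration of the idea described above, not the GoodVibes implementation; the function names are hypothetical, 298.15 K is an assumed temperature, and only real (positive) frequencies in cm−1 are handled.

```python
import math

R_CAL = 1.987204   # gas constant, cal/(mol K)
C2 = 1.438777      # second radiation constant hc/k, cm K


def raise_low_frequencies(freqs_cm, cutoff=100.0):
    """Replace every frequency below the cutoff by the cutoff value."""
    return [max(f, cutoff) for f in freqs_cm]


def vib_entropy(freqs_cm, temperature=298.15):
    """Harmonic vibrational entropy in cal/(mol K) for positive frequencies."""
    total = 0.0
    for f in freqs_cm:
        x = C2 * f / temperature                      # dimensionless h*nu/(k*T)
        total += x / math.expm1(x) - math.log1p(-math.exp(-x))
    return R_CAL * total


if __name__ == "__main__":
    raw = [20.0, 85.0, 450.0]                         # illustrative frequencies only
    print(vib_entropy(raw), vib_entropy(raise_low_frequencies(raw)))
```

Because the harmonic entropy diverges as a frequency approaches zero, raising the soft modes to the cutoff lowers the computed entropy and removes much of the sensitivity of the free energy to small errors in those modes.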

Taking advantage of the programmable nature of CREST, NWChem, and GoodVibes input files, a Python wrapper script was written to automate all the steps starting from an initial guess geometry (.xyz file). This script (XTBDFT) communicates data between the various programs, tracks calculations, and automatically triggers the next calculation in the procedure. While code to interface CREST and GFN-xTB with DFT programs such as Orca and Turbomole has been previously published12,13, those programs are not as freely licensed and distributed as NWChem. NWChem is open-source and freely licensable for all users, which has facilitated its implementation in massively distributed cloud-computing solutions38,39,40. While molecular dynamics has been interfaced with NWChem11, the underlying molecular mechanics force fields are not accurately parameterized for transition metal atoms or exotic functional groups. We believe the portable and lightweight nature of this wrapper script, along with the generous licensing terms of the underlying free chemistry engines, will allow for widespread adoption and modification among the chemistry and molecular machine-learning communities. The current version of XTBDFT is available online41.

Results

Composite procedure for identifying lowest-energy conformer

To computationally predict the thermodynamics of PNP to PPN isomerization, the minimum-energy conformation of each isomer must be obtained. Conventional geometry optimization algorithms employed by DFT software may identify a relatively high-lying local minimum, especially in systems containing multiple hindered degrees of rotation. Thus, we developed a composite procedure to identify and evaluate a low-energy conformation, consisting of: (a) the semi-empirical quantum chemical meta-dynamics package CREST to generate and crudely rank a diverse ensemble of conformers and (b) a quick, low level of DFT to re-optimize and re-rank the lowest energy conformers. The composite procedure is flexible in the choices of thermochemical recipe for determining the lowest-energy conformation.

As a prototypical case, we considered the conformer ensembles (ΔEXTB < 6 kcal/mol vs. minimum-energy conformer) of (Ph2P)2NiPr (2) and its PPN isomer 2′. Free energy corrections, GXTB(RRHO), were calculated using GFN2-xTB vibrational calculations on the GFN2-xTB-optimized structures. A variety of electronic energies of increasing computational cost, E0–E3 (Table 1), were obtained for each conformer. For analysis, we consider relative energies:

$$\Delta {\text{E}}_{n} = {\text{ E}}_{n} {-}{\text{ E}}_{n}^{0} ,$$
(1)

where En0 is the electronic energy of the minimum conformer in the initial CREST search. ΔE3 is strongly correlated with ΔG3 (Fig. 4) for conformer ensembles of both 2 (R2 = 0.996) and 2′ (R2 = 0.978); thus vibrational calculations were not employed in identifying the lowest-energy conformers for other compounds in this study.
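The relative energies of Eq. (1) and the squared correlation coefficient used to compare two energy metrics can be computed as in the sketch below. The function names are hypothetical, and the sketch assumes energies in hartree with the CREST-ranked minimum conformer stored at index 0.

```python
HARTREE_TO_KCAL = 627.509  # approximate conversion factor, kcal/mol per hartree


def relative_energies(energies_hartree, reference_index=0):
    """Energies relative to the reference conformer, in kcal/mol (Eq. 1).

    Values may be negative when a higher level of theory re-orders the ensemble.
    """
    ref = energies_hartree[reference_index]
    return [(e - ref) * HARTREE_TO_KCAL for e in energies_hartree]


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

A negative relative energy at a higher level of theory flags a conformer that lies below the CREST minimum, which is exactly the re-ranking the composite procedure is designed to catch.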

Table 1 Thermochemical recipes.
Figure 4

Comparison of relative free energies of conformers (ΔG3) versus relative electronic energies of conformers (ΔE3) for (a) 2 and (b) 2′.

Further computational savings could be achieved by using a lower level of theory for geometry optimization and electronic energy evaluation. For the conformer ensembles of 2 and 2′, we plotted the ΔE3 of each conformer versus its ΔEXTB and ΔE0–ΔE2 (Fig. 5). The energy calculations ΔE1 and ΔE2 were highly correlated with ΔE3 (R2 > 0.94), indicating that the computational expense of expanding the basis set from def2-SV(P) to def2-SVP for single-point energy evaluations is not required. In contrast, the more affordable energy calculation ΔE0 was accurate for the conformer ensemble of 2 (R2 = 0.98), but not for that of 2′ (R2 = 0.75). The most affordable energy calculation, ΔEXTB, was inaccurate for ordering conformer ensembles of both 2 and 2′. GFN-xTB was developed to produce accurate geometries, frequencies, and non-covalent interactions with remarkable computational efficiency16, but for accurate relative energy calculations in this study, higher-level DFT calculations are necessary to determine global minimum energy conformations. Thus, ΔE1 was chosen as an economical yet accurate computational metric by which to identify the lowest energy conformer.

Figure 5

Comparison of thermochemical recipes against ΔE3 for conformer ensembles of 2 and 2′ (all units are kcal/mol).

PNP-to-PPN isomerization energy (ΔGPPN) of known compounds

The above procedure identified the lowest energy conformer for a wide range of PNP compounds and their PPN isomers. The lowest energy conformer was further optimized with B3LYP-D3/def2-SVP, and electronically evaluated using B3LYP-D3/def2-TZVP (E4). Quasi-harmonic thermochemical corrections were applied to obtain free energies for the lowest energy PNP and PPN conformers, G4(PNP) and G4(PPN). The free energy of PNP-to-PPN isomerization was then calculated:

$$\Delta {\text{G}}_{{{\text{PPN}}}} = {\text{ G}}_{4} \left( {{\text{PPN}}} \right) \, {-}{\text{ G}}_{4} \left( {{\text{PNP}}} \right),$$
(2)

The ΔGPPN values of 1–33 are tabulated in Table 2 and found to match experimental observations (see “Discussion” section).
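Equation (2) amounts to simple arithmetic once the composite free energies are available. The sketch below assumes that each G4 is the sum of the high-level single-point energy and the quasi-harmonic free energy correction, both in hartree; the function names are hypothetical and any numbers passed in are placeholders, not results from this study.

```python
HARTREE_TO_KCAL = 627.509  # approximate conversion factor, kcal/mol per hartree


def free_energy(e_single_point, g_correction):
    """Composite free energy (hartree): single-point energy plus thermal correction."""
    return e_single_point + g_correction


def delta_g_ppn(g_pnp, g_ppn):
    """Free energy of PNP-to-PPN isomerization in kcal/mol (Eq. 2)."""
    return (g_ppn - g_pnp) * HARTREE_TO_KCAL


def favoured_isomer(dg_ppn):
    """A positive isomerization free energy means the PNP form is more stable."""
    return "PNP" if dg_ppn > 0 else "PPN"
```

With this sign convention, a positive ΔGPPN corresponds to the black structures in Fig. 3 and a negative value to the red ones.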

Table 2 Calculated PNP-to-PPN isomerization energy (ΔGPPN , kcal/mol).

Screening novel compounds

Having benchmarked our composite computational method against reported experimental observations, we applied this method to predict the synthetic accessibility of novel PNP ligand candidates. From a wide pool of screened candidates, several were predicted to be thermodynamically stable against isomerization to PPN (ΔGPPN > 1 kcal/mol); synthesis and catalytic testing of these candidates is underway and will be reported in a future publication. Some examples of ligand candidates with ΔGPPN < 1 kcal/mol are shown in Fig. 6 (34–44) and listed in Table 2.

Figure 6

Novel ligand candidates with ΔGPPN < 1 kcal/mol. Black structures are calculated to have positive ΔGPPN; red structures have negative ΔGPPN.

Discussion

To assess the accuracy of our computational method, we compared ΔGPPN for 1–33 to previously reported experimental observations. PNP/PPN isomers can be kinetically trapped during synthesis from lithiated aminophosphine and chlorophosphine; however, in some cases the kinetic products can be observed converting to the thermodynamic product in the presence of excess chlorophosphine, which acts as an isomerization catalyst42,43. Alternatively, synthesis from primary amine, two equivalents of chlorophosphine, and triethylamine in dichloromethane solvent is proposed to yield the thermodynamic product because PNP-PPN isomerization is catalyzed by triethylammonium chloride42. We thus expected the calculated ΔGPPN values (Table 2) to correspond with previously reported experimental observations.

The PNPs derived from alkylamines (1–4) have been isolated via the triethylamine route22,42,44, and accordingly have positive calculated ΔGPPN. N-trimethylsilylamine-derived PNP 5 has only been synthesized with the lithiation route45,46, even in studies where other PNP compounds were made via the triethylamine route46; this observation is consistent with the calculated ΔGPPN of −2.6 kcal/mol. With the bulkier N-triisopropylsilylamine, 6 has never been isolated. Using the same synthetic procedure as for 5, only the PPN isomer 6′ was observed, which is consistent with the much more negative ΔGPPN of −10.4 kcal/mol. With bis(dicyclohexylphosphino)amines, the ΔGPPN values match previously isolated isomers 7 and 8′43. For aniline-based molecules, ΔGPPN agrees with experimentally observed isomers from triethylamine syntheses: 9, 10′, 11, 12′47, and 13′47. Depending on the synthetic conditions, different PNP/PPN product mixtures have been reported for 14 and 1547, and accordingly these compounds have relatively small but negative ΔGPPN. The N-pyridyl compounds 16–18 were all isolated from triethylamine syntheses, and they all exhibit positive ΔGPPN. Their reported isomerization to PPN species upon protonation48 is matched by the negative ΔGPPN for 19–21. Tris(diphenylphosphino)amine 22 has been synthesized in only 8% yield from lithiated bis(diphenylphosphino)amine and chlorophosphine49, although others have reported exclusive formation of the PPNP isomer 22′45. ΔGPPN is small (0.6 kcal/mol), reflecting the accessibility of both isomers.

23-(n-C17H35) and 24-(n-C17H35) have both been isolated from triethylamine-based syntheses25. While ΔGPPN of 23-CH3 has a small negative value, that of 24-CH3 is quite positive and large, leading to the novel observation: electron-withdrawing P-substituents favour the PNP isomer. We expect this to be a useful design strategy for novel PNP ligands.

Conversely, N-substitution follows the opposite pattern, as shown by NO2-substituted 25 and OMe-substituted 26 with ΔGPPN of −0.9 and 4.7 kcal/mol, respectively. The negative ΔGPPN values of 23-CH3 and 25 conflict with the reportedly isolated PNP compounds, and perhaps indicate that the error of this computational thermodynamic method is ca. 1 kcal/mol. Compounds 27–32 have all been isolated in the PNP form from triethylamine syntheses, and accordingly they all have ΔGPPN > 0. These similarly sized compounds clearly show that ΔGPPN is higher with N-alkyl substitution than with N-aryl substitution.

As a strategy to disfavour PPN isomerization, the N-substituent can be covalently tethered to a P-substituent. Synthesis of 33 was reported to yield no detectable amounts of 33′50, and accordingly 33 has a ΔGPPN of 6.1 kcal/mol (cf. 3.5 kcal/mol for 9).

Summarizing the results for our computational method, with |ΔGPPN| > 0.9 kcal/mol the predicted PNP/PPN isomer matches experimentally reported isomers synthesized with triethylamine. With |ΔGPPN| ≤ 0.9 kcal/mol, the experimentally isolated isomers depend on exact experimental conditions. Compare this accuracy with conventional DFT geometry optimization to a single nearest minimum: for PNP molecules, conformers 9.9 kcal/mol higher than the minimum energy conformer have been identified previously25.

Having established the agreement of ΔGPPN with experimental observations, we sought to examine its relation to polyethylene by-product formation during Cr-catalyzed ethylene oligomerization24. These catalytic reactions are known to be highly sensitive to air, moisture, temperature, and various experimental parameters, and so we selected experimental data previously published by Sasol that were all collected under similar conditions44,46,51. There is a notable correlation between ΔGPPN and lower polyethylene formation (Table 3). As a simplified model, if PNP and PPN isomers are in thermodynamic equilibrium ([PPN]0/[PNP]0 = exp(−ΔGPPN/RT)), initial PNP concentration is directly correlated with ethylene oligomerization productivity (oligomerization productivity = a[PNP]0), initial PPN concentration is directly correlated with polymerization productivity (polymerization productivity = b[PPN]0), and a and b are constant across the range of PNP and PPN compounds, then the following relation should be observed:

$$\ln \, \left( {{\text{polyethylene}}\,{\text{wt}}\% } \right) \, = \, \ln \, \left( {b{/}a} \right) \, {-} \, \Delta {\text{G}}_{{{\text{PPN}}}} {\text{/ RT}}$$
(3)

In agreement, there is an inverse linear relationship between the logarithm of polyethylene wt% and ΔGPPN (Fig. 7), supporting the proposal that polyethylene formation proceeds through PPN-derived catalytic species. Scatter results from a and b varying across the range of ligands (experimental catalytic activities, listed in Table 3, span two orders of magnitude between the various ligands). However, there is still remarkable correlation using this simplified model (R2 = 0.72), indicating that PNP stability against isomerization to PPN is a useful design criterion for novel ligands.
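To give a sense of scale for the simplified model behind Eq. (3), the sketch below evaluates the equilibrium [PPN]0/[PNP]0 ratio and the predicted logarithm of the polyethylene fraction for a given ΔGPPN. The temperature of 298.15 K is an assumption made only for illustration (the text does not state the temperature used in the analysis), ln(b/a) is left as a free parameter, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol K)


def ppn_to_pnp_ratio(dg_ppn, temperature=298.15):
    """Equilibrium [PPN]/[PNP] ratio for an isomerization free energy in kcal/mol."""
    return math.exp(-dg_ppn / (R_KCAL * temperature))


def ln_polyethylene(dg_ppn, ln_b_over_a=0.0, temperature=298.15):
    """Right-hand side of Eq. (3): ln(b/a) - dG_PPN / (R T)."""
    return ln_b_over_a - dg_ppn / (R_KCAL * temperature)


if __name__ == "__main__":
    for dg in (0.0, 1.0, 6.0):
        print(dg, ppn_to_pnp_ratio(dg))
```

Under these assumptions, each additional 1 kcal/mol of PNP stability reduces the equilibrium PPN fraction by roughly a factor of five, which is why the model predicts a straight line with slope −1/RT on a logarithmic plot.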

Table 3 Reported polyethylene production (wt.%) versus ΔGPPN (kcal/mol).
Figure 7

Reported polyethylene by-product production versus ΔGPPN in graphical form.

As a qualitative summary of Fig. 7, the best-performing (meaning, in this case, the least polyethylene-producing) PNP ligands are those with ΔGPPN > 6 kcal/mol, and we are heeding that in the design and development of novel PNP-based catalysts. While unstable PNP ligands have been pre-coordinated to Cr to resist PPN isomerization46, we have chosen to pursue only the ligands with ΔGPPN > 1 kcal/mol. Candidates 34–44, ruled out by this criterion, are shown in Fig. 6, and their ΔGPPN values are tabulated in Table 2. Significant researcher time will be saved by ruling out these candidates with non-trivial synthetic routes and poor expected catalytic performance.

Machine learning from quantum chemical computational models is a powerful tool that has often been brought to bear on ethylene oligomerization catalysts52,53,54,55,56. The scriptable and automated nature of our composite computational procedure is poised to contribute to catalyst discovery driven by machine learning and artificial intelligence. Unlike conventionally used molecular mechanics-based conformational searching algorithms, the extended tight binding theory used in our procedure is parameterized out-of-the-box for transition metals; an area of ongoing research is the application of XTBDFT toward the analysis of conformationally complex organometallic intermediates and transition states, perhaps using more computationally expensive DFT or coupled cluster methods.

Conclusions

We have developed XTBDFT, an automated workflow to efficiently screen and evaluate conformationally complex molecules. We have applied this composite method to known PNP/PPN compounds to determine their relative thermodynamic stability and shown excellent agreement with the experimentally observed isomers. Furthermore, we show that thermodynamic stability of PNP ligands against isomerization to PPN is strongly correlated with lower undesired polyethylene formation. This procedure can be applied generally to other conformationally complex systems. This research leverages entirely open-source software, which we envision can be utilized by the greater computational chemistry and machine learning communities.