Computer-assisted catalyst development via automated modelling of conformationally complex molecules: application to diphosphinoamine ligands

Simulation of conformationally complicated molecules requires multiple levels of theory to obtain accurate thermodynamics, requiring significant researcher time to implement. We automate this workflow using all open-source code (XTBDFT) and apply it toward a practical challenge: diphosphinoamine (PNP) ligands used for ethylene tetramerization catalysis may isomerize (with deleterious effects) to iminobisphosphines (PPNs), and a computational method to evaluate PNP ligand candidates would save significant experimental effort. We use XTBDFT to calculate the thermodynamic stability of a wide range of conformationally complex PNP ligands against isomeriation to PPN (ΔGPPN), and establish a strong correlation between ΔGPPN and catalyst performance. Finally, we apply our method to screen novel PNP candidates, saving significant time by ruling out candidates with non-trivial synthetic routes and poor expected catalytic performance.

). Both PNP and PPN molecules possess many conformations, and analysis of the wrong conformer will result in incorrect ΔG PPN . A previous DFT study of PNP ligand conformers found variation of up to 9.9 kcal/mol in calculated Gibbs free energy 25 , which would drastically impact any thermodynamic comparison with PPN isomer.
In this report, we utilize XTBDFT to automate global optimization and calculation of ΔG PPN of several known PNP/PPN compounds (1-33, Fig. 3), and observe strong agreement with experimental observations. Then a strong inverse relation between PNP stability and polyethylene formation during ethylene oligomerization catalysis is observed. Finally, this method is applied to screen novel PNP ligand candidates, saving significant time by ruling out candidates with non-trivial synthetic routes and poor expected catalytic performance.

Methods
Initial guess geometries for PNP and PPN molecules were conveniently generated with MolView 27 . These molecules have numerous hindered degrees of rotation, and standard geometry optimization algorithms may yield a relatively high-energy conformer. A previous DFT study of relatively simple (i.e. high substitutional symmetry) PNP molecules found that geometric local minima could differ by 9.9 kcal/mol in Gibbs free energy 25 . Humandirected generation of conformer ensembles for computational analysis is time-intensive and inconsistent. To ensure identification of a low-energy conformer, an efficient computational method must be utilized to generate and crudely rank a broad conformer ensemble. Conformer sampling algorithms employed in Tinker 28 , OpenBabel 29 , and various other molecular modelling programs rely on heavily parameterized classical force fields or semiempirical methods that may not accurately reproduce geometries or energies of exotic molecules or transition-metal complexes, and in some cases researchers have to resort to substituting unparameterized elements with more common elements 30 . We instead use the meta-dynamics package Conformer Rotamer Ensemble Sampling Tool (CREST 2,17 ) driven by the semi-empirical density functional tight binding theory GFN2-xTB, which, unlike previously developed semi-empirical methods, has been broadly parameterized for all elements H-Rd and benchmarked against diverse chemical databases 16 , to automate generation of an ensemble of conformers within 2.0 kcal/mol of the minimum energy conformer. As an added benefit of using CREST for conformation searching, its built-in conformer symmetry analysis identifies rotamers that are chemically identical, greatly reducing the size of the conformer ensemble to be processed by higher levels of theory. In one example, CREST identified 35 unique conformers of (Ph 2 P) 2 NPh (9) within 6 kcal/mol of the lowest energy conformer. For comparison, the conformer searching procedure built into Spartan '18 (powered by Merck Molecular Force Field 31 ) returns 225 conformers.
Further geometry optimizations of the conformer ensemble were performed with density functional theory (DFT) as implemented in NWChem 18,32 (version 6.8) with the B3LYP functional 33,34 , def2-SV(P) basis set 35 , Weigend Coulomb-fitting auxiliary basis set 36 , Grimme DFT-D3 dispersion corrections 37 , NWChem medium integration grid, and loose geometric convergence criteria (using NWChem input parameters of gmax 0.002; grms 0.0003; xrms 1; xmax 1). The lowest energy conformer at this intermediate level of theory was then further geometry optimized with tighter convergence criteria (gmax 0.0001; grms 0.00003; xrms 0.0006; xmax 0.001), and thermochemical corrections were calculated, with the def2-SVP basis set 35 and the NWChem fine integration grid. Finally, a high-level single-point electronic energy evaluation was performed using the def2-TZVP basis set 35 .
Because of the hindered degrees of rotation, several optimized geometries possess low-magnitude frequency vibrations, which are inaccurately treated by the harmonic oscillator approximation for thermodynamics. A correction for these vibrations was applied by quasi-harmonically raising all frequencies below 100 cm −1 using the GoodVibes script 6 . Goodvibes code was modified to allow parsing of NWChem output files, and this customization has been merged into the publicly available Goodvibes release.
Taking advantage of the programmable nature of CREST, NWChem, and GoodVibes input files, a Python wrapper script was written to automate all the steps starting from an initial guess geometry (.xyz file). This script (XTBDFT) communicates data between the various programs, tracks calculations, and automatically triggers the next calculation in the procedure. While code to interface CREST and GFN-xTB with DFT programs www.nature.com/scientificreports/ such as Orca and Turbomole has been previously published 12,13 , those programs are not as freely licensed and distributed as NWChem. NWChem is open-source and freely licensable for all users, which has facilitated its implementation in massively distributed cloud-computing solutions [38][39][40] . While molecular dynamics has been interfaced with NWChem 11 , the underlying molecular mechanics forcefields are not accurately parameterized for transition metal atoms or exotic functional groups. We believe the portable and lightweight nature of this wrapper script, along with the generous licensing terms of the underlying free chemistry engines, will allow for wide-spread adoption and modification among the chemistry and molecular machine-learning communities. The current version of XTBDFT is available online 41 .

Results
Composite procedure for identifying lowest-energy conformer. To computationally predict the thermodynamics of PNP to PPN isomerization, the minimum-energy conformation of each isomer must be obtained. Conventional geometry optimization algorithms employed by DFT software may identify a relatively high-lying local minimum, especially in systems containing multiple hindered degrees of rotation. Thus, we developed a composite procedure to identify and evaluate a low-energy conformation, consisting of: (a) the semi-empirical quantum chemical meta-dynamics package CREST to generate and crudely rank a diverse ensemble of conformers and (b) a quick, low level of DFT to re-optimize and re-rank the lowest energy conformers. The composite procedure is flexible in the choices of thermochemical recipe for determining the lowestenergy conformation. As a prototypical case, we considered the conformer ensembles (ΔE XTB < 6 kcal/mol vs. minimum-energy conformer) of (Ph 2 P) 2 N i Pr (2) and its PPN isomer 2′. Free energy corrections, G xtb (RRHO), were calculated using GFN2-xTB vibrational calculations on the GFN2-xTB-optimized structures. A variety of electronic energies were obtained for each conformer, E 0-3 (Table 1), of increasing computational cost. For analysis, we consider relative energies: where E n 0 is the electronic energy of the minimum conformer in the initial CREST search. ΔE 3 is strongly correlated with ΔG 3 (Fig. 4) for conformer ensembles of both 2 (R 2 = 0.996) and 2′ (R 2 = 0.978); thus vibrational calculations were not employed in identifying the lowest-energy conformers for other compounds in this study.
Further computational savings could be achieved by using a lower level of theory for geometry optimization and electronic energy evaluation. For the conformer ensembles of 2 and 2′, we plotted the ΔE 3 of each conformer versus its ΔE XTB and ΔE 0-2 (Fig. 5). The energy calculations ΔE 1 and ΔE 2 were highly correlated with ΔE 3 (R 2 > 0.94), indicating that the computational expense of expanding the basis set from def2-SV(P) to def2-SVP for single-point energy evaluations is not required. In contrast, the more affordable energy calculation ΔE 0 was accurate for the conformer ensemble of 2 (R 2 = 0.98), but not for that of 2′ (R 2 = 0.75). The most affordable energy calculation, ΔE xtb , was inaccurate for ordering conformer ensembles of both 2 and 2′. GFN-xTB was developed to produce accurate geometries, frequencies, and non-covalent interactions with remarkable computational efficiency 16 , but for accurate relative energy calculations in this study, higher-level DFT calculations are necessary to determine global minimum energy conformations. Thus, ΔE 1 was chosen as an economical yet accurate computational metric by which to identify the lowest energy conformer.
PNP-to-PPN isomerization energy (ΔG PPN ) of known compounds. The above procedure identified the lowest energy conformer for a wide range of PNP compounds and their PPN isomers. The lowest energy conformer was further optimized with B3LYP-D3/def2-SVP, and electronically evaluated using B3LYP-D3/def2-TZVP (E 4 ). Quasi-harmonic thermochemical corrections were applied to obtain free energies for the lowest energy PNP and PPN conformers, G 4 (PNP) and G 4 (PPN). The free energy of PNP-to-PPN isomerization was then calculated: The ΔG PPN of 1-33 are tabulated in Table 2 and found to match experimental observations (see "Discussion" section).

Discussion
To assess the accuracy of our computational method, we compared ΔG PPN for 1-33 to previously reported experimental observations. PNP/PPN isomers can be kinetically trapped during synthesis from lithiated aminophosphine and chlorophosphine; however, in some cases the kinetic products can be observed converting to the thermodynamic product in the presence of excess chlorophosphine, which acts as an isomerization catalyst 42,43 . Alternatively, synthesis from primary amine, two equivalents of chlorophosphine, and triethylamine in dichloromethane solvent is proposed to yield the thermodynamic product because PNP-PPN isomerization is catalyzed by triethylammonium chloride 42 . We thus expected the calculated ΔG PPN values (Table 1) to correspond with previously reported experimental observations. The PNPs derived from alkylamines (1-4) have been isolated via the triethylamine route 22,42,44 , and accordingly have positive calculated ΔG PPN . N-trimethylsilylamine-derived PNP 5 has only been synthesized with the lithiation route 45,46 , even in studies where other PNP compounds were made via the triethylamine route 46 ; this observation is consistent with the calculated ΔG PPN of − 2.6 kcal/mol. With bulker N-triisopropylsilylamine, 6 has never been isolated. Using the same synthetic procedure as for 5, only the PPN isomer 6′ was observed, which is consistent with the much more negative ΔG PPN of − 10.4 kcal/mol. With bis(dicyclohexylphosphino)amines, the ΔG PPN values match previously isolated isomers 7 and 8′ 43 . For aniline-based molecules, ΔG PPN agrees with experimentally observed isomers from triethylamine syntheses: 9, 10′, 11, 12′ 47 , and 13′ 47 . Depending on the synthetic conditions, different PNP/PPN product mixtures have been reported for 14 and 15 47 , and accordingly these compounds have relatively small but negative ΔG PPN . The N-pyridyl compounds 16-18 were all isolated from triethylamine syntheses, and they all exhibit positive ΔG PPN . Their reported isomerization to PPN species upon protonation 48 is matched by the negative ΔG PPN for 19-21. Tris(diphenylphosphino)amine 22 has been synthesized in only 8% yield from lithiated bis(diphenylphosphino)amine and chlorophosphine 49 , although other reports have reported exclusive formation of the PPNP isomer 22′ 45 . ΔG PPN is small (0.6 kcal/mol), reflecting the accessibility of both isomers. 25 . While ΔG PPN of 23-CH 3 has a small negative value, that of 24-CH 3 is quite positive and large, leading to the novel observation: electron-withdrawing P-substituents favour the PNP isomer. We expect this to be a useful design strategy for novel PNP ligands.

23-(n-C 17 H 35 ) and 24-(n-C 17 H 35 ) have both been isolated from triethylamine-based syntheses
Conversely, N-substitution follows the opposite pattern, as shown by NO 2 -substituted 25 and OMe-substituted 26 with ΔG PPN of − 0.9 and 4.7 kcal/mol, respectively. The negative ΔG PPN of 23-CH 3 and 25 conflicts with the reportedly isolated PNP compounds, and perhaps indicate that the error of this computational thermodynamic method is ca. 1 kcal/mol. Compounds 27-32 have all been isolated in the PNP form from triethylamine syntheses, and accordingly they all have ΔG PPN > 0. These similarly sized compounds clearly show that ΔG PPN is higher with N-alkyl substitution instead of N-aryl substitution.
As a strategy to disfavour PPN isomerization, the N-substituent can be covalently tethered to a P-substituent. Synthesis of 33 was reported to yield no detectable amounts of 33′ 50 , and accordingly 33 has a ΔG PPN of 6.1 kcal/ mol (c.f. 3.5 kcal/mol for 9).
Summarizing the results for our computational method, with |ΔG PPN |> 0.9 the predicted PNP/PPN isomer matches experimentally reported isomers synthesized with triethylamine. With |ΔG PPN |≤ 0.9 kcal/mol, the experimentally isolated isomers depend on exact experimental conditions. Compare this accuracy with conventional DFT geometry optimization to a single nearest minimum: for PNP molecules, conformers 9.9 kcal/mol higher than the minimum energy conformer have been identified previously 25 .
Having established the agreement of ΔG PPN with experimental observations, we sought to examine its relation to polyethylene by-product formation during Cr-catalyzed ethylene oligomerization 24 . These catalytic reactions are known to be highly sensitive to air, moisture, temperature, and various experimental parameters, and so we selected experimental data previously published by Sasol that were all collected under similar conditions 44,46,51 . There is a notable correlation between ΔG PPN and lower polyethylene formation ( Table 3). As a simplified model, if PNP and PPN isomers are in thermodynamic equilibrium ([PPN] 0 /[PNP] 0 = e −ΔGPPN/RT ), initial PNP concentration is directly correlated with ethylene oligomerization productivity (oligomerization productivity = a[PNP] 0 ), initial PPN concentration is directly correlated with polymerization productivity (polymerization productivity = b[PPN] 0 ), and a and b are constant across the range of PNP and PPN compounds, then the following relation should be observed: In agreement, there is an inverse linear relationship between the logarithm of polyethylene wt% and ΔG PPN (Fig. 7), supporting the proposal that polyethylene formation proceeds through PPN-derived catalytic species. Scatter results from a and b varying across the range of ligands (experimental catalytic activities, listed in Table 3, span two orders of magnitude between the various ligands). However, there is still remarkable correlation using this simplified model (R 2 = 0.72), indicating that PNP stability against isomerization to PPN is a useful design criterion for novel ligands. www.nature.com/scientificreports/ As a qualitative summary of Fig. 7, the best-performing (meaning, in this case, the least polyethylene-producing) PNP ligands are those with ΔG PPN > 6 kcal/mol, and we are heeding that in design and development of novel PNP-based catalysts. While unstable PNP ligands have been pre-coordinated to Cr to resist PPN isomerization 46 , we have chosen to only pursue the ligands with ΔG PPN > 1 kcal/mol. Candidates 34-44, ruled out by this criterion, are shown in Fig. 6, and their ΔG PPN values are tabulated in Table 1. Significant researcher time will be saved by ruling out these candidates with non-trivial synthetic routes and poor expected catalytic performance.
Machine learning from quantum chemical computational models is a powerful tool that has often been brought to bear on ethylene oligomerization catalysts [52][53][54][55][56] . The scriptable and automated nature of our composite computational procedure is poised to contribute to catalyst discovery driven by machine learning and artificial intelligence. Unlike conventionally used molecular mechanics-based conformational searching algorithms, the extended tight binding theory used in our procedure is parameterized out-of-the-box for transition metals; an area of ongoing research is the application of XTBDFT toward the analysis of conformationally complex organometallic intermediates and transition states, perhaps using more computationally expensive DFT or coupled cluster methods.

Conclusions
We have developed XTBDFT, an automated workflow to efficiently screen and evaluate conformationally complex molecules. We have applied this composite method to known PNP/PPN compounds to determine their relative thermodynamic stability and shown excellent agreement with the experimentally observed isomers. Furthermore, we show that thermodynamic stability of PNP ligands against isomerization to PPN is strongly correlated with lower undesired polyethylene formation. This procedure can be applied generally to other conformationally  www.nature.com/scientificreports/ Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.