Accelerating reliable multiscale quantum refinement of protein–drug systems enabled by machine learning

Yan, Zeyin; Wei, Dacong; Li, Xin; Chung, Lung Wa

doi:10.1038/s41467-024-48453-4

Article
Open access
Published: 16 May 2024

Nature Communications volume 15, Article number: 4181 (2024)

Abstract

Biomacromolecule structures are essential for drug development and biocatalysis. Quantum refinement (QR) methods, which employ reliable quantum mechanics (QM) methods in crystallographic refinement, showed promise in improving the structural quality or even correcting the structure of biomacromolecules. However, vast computational costs and complex quantum mechanics/molecular mechanics (QM/MM) setups limit QR applications. Here we incorporate robust machine learning potentials (MLPs) in multiscale ONIOM(QM:MM) schemes to describe the core parts (e.g., drugs/inhibitors), replacing the expensive QM method. Additionally, two levels of MLPs are combined for the first time to overcome MLP limitations. Our unique MLPs+ONIOM-based QR methods achieve QM-level accuracy with significantly higher efficiency. Furthermore, our refinements provide computational evidence for the existence of bonded and nonbonded forms of the Food and Drug Administration (FDA)-approved drug nirmatrelvir in one SARS-CoV-2 main protease structure. This study highlights that powerful MLPs accelerate QRs for reliable protein–drug complexes, promote broader QR applications and provide more atomistic insights into drug development.

Introduction

Accurate atomic structures of biomacromolecules are vital for molecular property prediction, binding pose estimation, as well as understanding ligand binding site recognition and biocatalysis. Additionally, this structural information plays an indispensable role in the rational development and design of new drugs with high potency and selectivity that specifically target the binding site^1,2,3. In this regard, X-ray diffraction (XRD) has long been one of the most powerful methods to determine the atomic structures of many biomacromolecules. Structural determination often relies on standard X-ray crystallographic refinement methods, in which the molecular mechanics (MM) force field is combined with experimental (XRD) data to derive reasonable chemical structures^4,5. However, the development of force fields (using limited parameters) to give reliable structures of diversified drug molecules has long been challenging due to the enormous variety of chemical space (with many element–element combinations) and complex electronic effects (such as conjugation/delocalization)⁶. Recent breakthroughs in the development of various artificial intelligence (AI) methods (e.g., AlphaFold or RoseTTAFold) can predict impressively reasonable atomic structures of some proteins^7,8. Unfortunately, these AI-based methods (such as AlphaFold) still cannot easily predict reliable biological structures containing cofactors or drugs/inhibitors, due to scarce experimental structural data^9,10.

On the other hand, the development of quantum refinement (QR, pioneered by Ryde)¹¹, which replaces the MM method with more accurate quantum mechanics (QM) methods, can overcome the challenges of reliably describing various drug structures¹². In fact, the QR method has been successfully applied to some protein–drug/inhibitor systems, such as acetylcholinesterase with the anti-Alzheimer drug donepezil¹³ and serine proteases with benzamidinium-based inhibitors¹⁴. These successful applications of QR to biomacromolecules have demonstrated many promising results in improving the structural quality and even giving the correct structures^{15,16,17,18,19}. Moreover, recent developments in QR combined with multiscale^{11,13,15,16,17,18,19,20,21,22}, linear-scale QM^23,24, fragmentation^25,26,27,28 and quantum-embedding²² methods by different groups further boost or improve the refinement process. Nevertheless, compared to the fast MM methods usually used in crystallographic refinements, the much higher computational costs and complex setup of QM/MM systems hinder the broad applications of QR to many biological systems.

Recently, actively developed AI methods, such as machine learning potentials (MLPs), have emerged as a promising alternative that can quickly predict energies and their gradients after being trained on QM energies and gradients from very large datasets of diverse atomic configurations. MLPs are remarkable for their robustness and for accuracy comparable to that of the high-level ab initio methods used to generate their training data^29,30,31,32. The ANI series, built on deep neural networks (DNNs), comprises powerful and successful MLPs (achieving density functional theory (DFT) or coupled-cluster (CC) accuracy) that have attracted much attention in recent years^33,34,35 due to their flexibility and transferability to a wide range of molecules and systems. However, MLPs are still less transferable than QM methods and are generally limited to closed-shell, neutral organic compounds. The Dral group elegantly developed a general-purpose artificial intelligence–quantum mechanical method 1 (AIQM1) to approach CC accuracy³⁶ by combining the Δ-machine learning (ML) strategy³⁷ and a semi-empirical (SE) method³⁸, using ANI-type neural network potentials (and datasets) and D4-dispersion³⁹ as corrections. The AIQM1 method displayed good accuracy even for challenging systems (such as ions and excited states). Additionally, several groups adopted ML to replace or correct the expensive QM method in ab initio QM/MM methods^40,41,42 to enhance accuracy and reduce the high computational cost of QM/MM molecular dynamics simulations^{43,44,45,46,47,48}. Inspired by these encouraging results for new MLPs, we envision that MLPs will open a new opportunity to develop a faster and more accurate QR method^49,50, since high-level CC methods are rarely used in QR applications due to their prohibitive computational costs²².

In this study, MLPs are incorporated for the first time as the high layer (Fig. 1a) in the multiscale QR method²², in which the expensive QM method(s) is replaced by the much faster ANI or AIQM1 method. Due to their high dependence on training data, the MLPs are limited to a few elements (e.g., AIQM1: C, H, O, N; ANI-2x: C, H, O, N, F, Cl, S) or specific systems (ANI: neutral systems). To refine drug/inhibitor molecules containing more elements and to overcome this limitation while maintaining the highest accuracy on the core drug/inhibitor structures, two different levels (CC- and DFT-quality) of MLPs (denoted as MLP-CC and MLP-DFT) were further combined for the first time through an extrapolative Our own N-layered Integrated molecular Orbital and molecular Mechanics (ONIOM) approach and introduced in our ONIOM QR schemes (i.e., ONIOM2(MLP:MM), ONIOM3(MLP:SE:MM), and especially the unprecedented ONIOM3(MLP-CC:MLP-DFT:MM) and ONIOM4(MLP-CC:MLP-DFT:SE:MM) schemes, see Table 1). The geometries of 50 different protein–drug/inhibitor systems were then refined and evaluated by the various above-mentioned ONIOM-based QR schemes (Fig. 1, Supplementary Figs. 1–4, and Table 1), along with X-ray experimental data. Our results demonstrate that these MLPs+ONIOM-based QR schemes can successfully reach QM-level accuracy with much higher efficiency, which should promote much broader applications of QR methods and be helpful for drug development/design. Moreover, our refinements offer computational evidence for the coexistence of the bonded and nonbonded forms of the FDA-approved drug nirmatrelvir in one crystal structure of SARS-CoV-2 main protease (M^Pro), which should be helpful for designing better SARS-CoV-2 drugs.

**Fig. 1: Schematic workflow and selected protein–drug/inhibitor systems.**

Table 1 Computational chemistry methods in our quantum refinements^a

Results

Drug/Inhibitor structures in the gas phase

To assess the performance of a few MLPs, geometry optimization of our selected 50 drugs/inhibitors (QR50 dataset, Figs. 1b, 2 and Supplementary Figs. 1–4; the optimized geometries are provided in Supplementary Data) in the gas phase was first performed using the ωB97X-D/6-31G(d), AIQM1, ANI-1ccx, ANI-2x, ANI-1x, and second-generation Geometry, Frequency, Noncovalent, eXtended Tight Binding (GFN2-xTB) methods. Compared to the reliable structures optimized by the (QM) ωB97X-D method, our structural analysis shows that the median absolute deviations (MADs) in the bond distances, angles and rotatable dihedrals of the drug/inhibitor molecules obtained by the MLP (AIQM1, ANI-2x) and SE (GFN2-xTB) methods are in the ranges of 0.005–0.008 Å, 0.6–0.9° and 11.2–16.1°, respectively (Table 2). In general, all MLPs led to drug/inhibitor structures similar to those optimized by the DFT method, with all the medians (white dots) and the peaks of the deviation distributions (maximum width) located close to zero (Fig. 2). However, the GFN2-xTB method underestimated bond distances, with a median bond deviation close to −0.01 Å.
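The deviation statistics quoted above can be illustrated with a minimal sketch. It assumes that MAD means the median of the absolute differences between a method's internal coordinates and the DFT reference values, and that dihedral differences are wrapped into [−180°, 180°) before taking absolute values; both are assumptions made here for illustration, and the function names are hypothetical rather than taken from the authors' scripts.

```python
import math
from statistics import median
from typing import Iterable, List


def wrap_dihedral(delta_deg: float) -> float:
    """Map an angular difference in degrees into the interval [-180, 180)."""
    return (delta_deg + 180.0) % 360.0 - 180.0


def deviations(values: Iterable[float], reference: Iterable[float],
               periodic: bool = False) -> List[float]:
    """Signed deviations (method minus reference); wrap them if periodic."""
    deltas = [v - r for v, r in zip(values, reference)]
    return [wrap_dihedral(d) for d in deltas] if periodic else deltas


def median_abs_deviation(values, reference, periodic: bool = False) -> float:
    """Median of |method - reference| over all compared coordinates."""
    return median(abs(d) for d in deviations(values, reference, periodic))


def root_mean_square(deltas: Iterable[float]) -> float:
    """Root-mean-square of a list of deviations."""
    deltas = list(deltas)
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))
```

Wrapping matters for rotatable dihedrals: a torsion of 179° compared with a reference of −179° differs by 2°, not 358°.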

**Fig. 2: Comparison of the optimized structure in the gas phase.**

Table 2 Comparison of the optimized structures in the gas phase for different datasets

To further evaluate the accuracy of the MLP methods for a wider range of functional groups⁵¹, a new computational benchmark dataset taken from PDBbind v2020 (denoted as PB20-QM, https://github.com/oscarchung-lab/PB20-QM) containing 12,963 drug/inhibitor molecules, as well as its two smaller sub-datasets (PB20-QM-8k: 8776 molecules containing C, H, O, N, F, S, and/or Cl elements; and PB20-QM-3k: 3156 molecules mainly containing C, H, O, and/or N elements), was also set up for geometry optimization by the different methods (Table 2). Compared to the reliable ωB97X-D method, our computational results show that the MADs in the bond distances, angles and rotatable dihedrals of the drug/inhibitor molecules obtained by the MLP (AIQM1, ANI-2x) and SE (GFN2-xTB) methods are in the ranges of 0.003–0.006 Å, 0.4–0.6°, and 26.7–32.0°, respectively (Table 2).

In addition, the drug/inhibitor molecules containing charged group(s) were generally found to have larger structural deviations (e.g., MAD) than the neutral drug/inhibitor molecules when optimized by all MLP and SE methods (Table 2), except for the dihedrals in PB20-QM-3k, possibly because such systems are scarce in this sub-dataset. For the neutral cases, these MLPs showed higher accuracy than GFN2-xTB, with smaller structural deviations in bonds (ΔMAD: 0.002–0.004 Å) and angles (ΔMAD: 0.1–0.2°) compared to the DFT method. The dihedral deviations for the AIQM1 and GFN2-xTB methods are generally comparable and smaller than those for the MLP ANI-2x. Even though the ANI-series MLPs were not primarily designed for charged systems, the ANI-2x method still showed small structural deviations (MAD: <0.009 Å (bond), <0.2° (angle)) for molecules containing charged group(s). In comparison, the CC-level MLP AIQM1 method gives superior results to the ANI-series MLPs. Therefore, these MLPs, particularly the CC-quality AIQM1 method, can give reliable structures for drug/inhibitor systems at much lower computational costs. Moreover, apart from the charge effect, the structural flexibility of the drug/inhibitor molecules is another key factor behind the larger RMSDs in the gas phase (Supplementary Table 36).

Quantum refinement of protein–drug/inhibitor systems

These MLPs were further introduced into ONIOM-based QR schemes to refine the structures of the 50 protein–drug/inhibitor systems (Fig. 1 and Supplementary Figs. 1–4; the refined geometries are provided in Supplementary Data). Compared to the X-ray crystal structures, the real-space difference density Z (RSZD) scores of these 50 drugs/inhibitors (except nirmatrelvir, acylated ceftazidime, and isatin, including its covalent linkages) after various QRs were reduced by 1.0–1.1 on average (Fig. 3a and Supplementary Table 25), showing structural improvement by the QRs. The most significant improvements were found for nirmatrelvir in one SARS-CoV-2 M^Pro system and isatin in the DJ-1 system, whose RSZD scores decreased substantially from 7.6 and 6.0 (X-ray crystal structures) to 0.5–1.0 and 0.9–2.1, respectively, owing to the consideration of two conformers (vide infra).

**Fig. 3: Comparison of the quantum refinement results.**

In agreement with the above-mentioned smaller RSZD scores after QRs, the electron density analysis also generally demonstrated a noticeable improvement of the electron density around the refined drug/inhibitor binding sites (Fig. 4 and their corresponding electron density maps given in Supplementary Information). For instance, the electron density maps for the imatinib, osimertinib, and CPI-0610 systems showed better density fitting after our QRs. Moreover, compared to the X-ray electron density maps, the discrepancy with the experimental observations was significantly reduced (i.e., fewer green and/or red contours) by our QRs. In addition, for the case of the imatinib-tyrosine kinase system, the electron density maps around the imatinib binding site (Fig. 4b) are almost identical for refinements using the schemes based on DFT (ONIOM3(DFT:SE:MM), M7) and MLPs (ONIOM3(ANI-2x:SE:MM) and ONIOM3(AIQM1:SE:MM), M8 and M10, respectively). This improved electron density surrounding the pyridine, pyrimidine and N-phenylbenzamide moieties may result from their changed dihedrals (black: from 179.9° (X-ray crystal structure) to 162.6°~164.9°; red: from 4.2° (X-ray crystal structure) to 44.3°~48.6°). Consequently, the negative RSZD score improved from −1.8 (X-ray crystal structure) to −0.1~0.0 after various QRs (Supplementary Fig. 19).

**Fig. 4: Our own N-layered Integrated molecular Orbital and molecular Mechanics (ONIOM) scheme with machine learning potentials (MLPs), and electron density maps.**

In addition, the computed strain energies (Fig. 3b) of these 50 drugs/inhibitors were considerably reduced, by 27.1~31.4 kcal·mol⁻¹ on average, after various QRs. The greatest reduction in strain energy was observed in the taurocholic acid-CmeR (from 239.8 to approximately 78.7–85.6 kcal·mol⁻¹) and darunavir-HIV-1 (from 117.3 to approximately 14.1–20.4 kcal·mol⁻¹) systems after our QRs. This strain can be attributed to the underestimated C–S bond (b₅: 0.18–0.22 Å shorter) of taurocholic acid and two C–C bonds (b₃ and b₄: 0.25~0.28 Å shorter) of darunavir in the X-ray structures (Fig. 1b, Supplementary Fig. 78, and Supplementary Table 18). Therefore, these computational results illustrate that all QR schemes (M1–M10) clearly improved the local drug/inhibitor binding sites compared to those in the X-ray crystal structures.

Moreover, our structural analysis (Fig. 3c–e) further revealed that the bond distances, angles, and rotatable dihedrals of the drugs/inhibitors after refinement by various QR schemes remained consistent with the most reliable ONIOM3(DFT:SE:MM) method (M7). Compared to the X-ray structures, darunavir, oseltamivir, and osimertinib structures refined using methods M1–M10 displayed the largest absolute bond deviation (b₁: 0.19~0.21 Å, b₂: 0.18~0.20 Å, b₃: 0.28~0.29 Å, b₄: 0.25~0.26 Å; Supplementary Table 18) among the other drug/inhibitor systems, which can account for the very high strain energies computed in these X-ray structures (96.3–117.3 kcal·mol⁻¹, see Supplementary Table 26). Additionally, the largest absolute angle and dihedral deviations (using methods M1–M10) were found in our refined nirmatrelvir (a₁: 14.9°~17.5°) and imatinib (d₁: 37.7°~49.9°) structures (Supplementary Table 18), respectively.

Combination of two MLPs through ONIOM

Since CC-level MLPs (ANI-1ccx and AIQM1) can only be applied to limited elements (H, C, N, O), these higher-level MLPs cannot be employed to describe broader drug/inhibitor systems with more elements (F, S, Cl). To overcome this limitation and describe more drug/inhibitor systems, two different levels of MLPs were combined for the first time through an extrapolative ONIOM scheme, in which the major core structures are described by the higher-level MLP-CC method (ANI-1ccx or AIQM1) and the remaining parts containing other elements are described by the lower-level MLP-DFT (ANI-2x) method (Fig. 4a). This new combination of two MLPs enables the unprecedented ONIOM3(MLP-CC:MLP-DFT:MM) (M4a and M5a) and ONIOM4(MLP-CC:MLP-DFT:SE:MM) (M9a and M10a) schemes for our QRs on the 20 selected protein–drug/inhibitor systems containing F, Cl or S element (Supplementary Fig. 3).

Pleasingly, our refined CPI-0610 structure in bromodomain obtained by these ONIOM4-based schemes can give similar electron density around the binding site to the reliable ONIOM3(DFT:SE:MM) M7 scheme, with reduced discrepancy (i.e., fewer red and/or green contours) from the experimental observations (Fig. 4c and Supplementary Fig. 66). This improved electron density in the CPI-0610 system after these ONIOM4(ANI-1ccx:ANI-2x:SE:MM)- and ONIOM4(AIQM1:ANI-2x:SE:MM)-based (M9a and M10a, respectively) QRs might result from the changed dihedrals (black: from −45.4° (X-ray crystal structure) to −36.5°~−35.6°; red: from −177.8° to −168.5°~−165.8°) and changed p-ClC₆H₄ angle (blue: from 117.7° (X-ray crystal structure) to 114.4°~114.5°). Likewise, their RSZD scores for the CPI-0610 system were reduced from 1.3 (X-ray crystal structure) to 0.7–0.8 after these ONIOM4-based QRs. Moreover, the RMSD for the CPI-0610 structure refined by the ONIOM4(MLP-CC:MLP-DFT:SE:MM) scheme with reference to ONIOM3(DFT:SE:MM) (M7) was very small (bonds < 0.009 Å, angles < 0.8°, dihedrals < 2.6°). Similarly satisfactory refined results were also found in the other 19 systems (e.g., RSZD scores reduced by 0.4–7.1, Supplementary Table 25). Consequently, these findings show that these unique ONIOM4(MLP-CC:MLP-DFT:SE:MM) and ONIOM3(MLP-CC:MLP-DFT:MM) schemes can improve protein–drug/inhibitor structures (comparable to the DFT method) with higher computational efficiency, suggesting that the combination of several levels of MLPs via the ONIOM approach offers a promising avenue for overcoming some limitations of MLPs and enhancing their advantages.

Combined MLP-CC/SE methods for broader drug molecules

Moreover, the higher-level AIQM1 and GFN2-xTB methods were further combined through the ONIOM scheme for our QRs on the three selected protein–drug/inhibitor systems (Supplementary Fig. 4), in which the major core of the drug/inhibitor structures and their remaining parts containing other elements (P or Br) are described by the AIQM1 and SE methods, respectively. The RMSD and MAD for these refined drug/inhibitor structures with reference to ONIOM3(DFT:SE:MM) (M7) were very small (RMSD: bonds < 0.011 Å, angles < 0.7°, dihedrals < 2.6°; MAD: bonds < 0.011 Å, angles < 0.5°, dihedrals < 2.3°). Moreover, our refined drug/inhibitor structures can give similar electron density around the binding site to the reliable M7 scheme (Supplementary Figs. 186, 191, and 229). Furthermore, their RSZD scores were reduced from 1.4–3.9 (X-ray crystal structure) to 1.3–2.5 after these QRs. Therefore, these results show that combining MLP-CC with the SE method can further resolve the limitations of MLPs and be applied to many more elements and molecules.
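The element-coverage logic described in the last two subsections can be summarized in a small sketch. The element sets follow the text (MLP-CC: H, C, N, O; MLP-DFT: additionally F, S, Cl); the function name, the returned labels, and the choice to fall back to the SE combination for any other element are illustrative assumptions, since the text only reports the SE combination for P- or Br-containing molecules.

```python
from typing import Iterable

# Element coverage as stated in the text.
MLP_CC_ELEMENTS = frozenset({"H", "C", "N", "O"})        # ANI-1ccx, AIQM1
MLP_DFT_ELEMENTS = MLP_CC_ELEMENTS | {"F", "S", "Cl"}    # ANI-2x


def choose_high_layer(elements: Iterable[str]) -> str:
    """Pick a description for the drug/inhibitor (hypothetical helper).

    - only H, C, N, O      -> a single CC-level MLP
    - additionally F/S/Cl  -> MLP-CC core plus MLP-DFT for the remainder
    - anything else        -> MLP core plus SE (xTB) for the remainder
                              (assumed fallback; the text shows P and Br)
    """
    present = set(elements)
    if present <= MLP_CC_ELEMENTS:
        return "MLP-CC"
    if present <= MLP_DFT_ELEMENTS:
        return "ONIOM2(MLP-CC:MLP-DFT)"
    return "ONIOM2(MLP:xTB)"
```

For example, a chlorine-containing molecule maps to the two-MLP combination, whereas a bromine-containing one maps to the MLP/SE combination.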

Correlation

Pleasingly, the use of these MLPs in place of the QM method successfully accelerated our QR processes with accuracy comparable to QM (e.g., RSZD score, strain energy and geometrical parameters). Additionally, the MLP-CC-based schemes (M4–M5 and M9–M10) gave the lowest RSZD scores among our ONIOM2- and ONIOM3-based QRs for 39 of the 50 selected systems (Supplementary Table 25), even lower than the DFT-based ONIOM3(DFT:SE:MM) M7 scheme. Figure 5 displays the correlation (R²) of the refined bond distances, angles, rotatable dihedrals, RSZD scores and strain energies obtained by the M8, M10, and M6R schemes (ONIOM3(ANI-2x:SE:MM), ONIOM3(AIQM1:SE:MM) and ONIOM2(SE:MM), respectively) with those obtained by the M7 scheme. The refined bond distances and angles obtained by the M8, M10 and M6R schemes were in very good agreement with those obtained by the M7 scheme for the drug/inhibitor systems containing both charged and neutral groups (bond distances: R² > 0.988; angles: R² > 0.962). In contrast, their rotatable dihedrals, RSZD scores, and strain energies showed a clear disparity between the systems containing neutral and charged groups.

**Fig. 5: Accuracy of different QR schemes.**

For neutral systems, the MLP-based (ONIOM3(ANI-2x:SE:MM) and ONIOM3(AIQM1:SE:MM), M8 and M10, respectively) and SE-based (ONIOM2(SE:MM), M6R) schemes gave RSZD scores (R² > 0.971) consistent with those of the DFT-based scheme (ONIOM3(DFT:SE:MM), M7). Likewise, the M8, M10 and M6R schemes also showed good agreement for the dihedrals (R² > 0.935). Moreover, the MLP-DFT-based (M8) and MLP-CC-based (M10) schemes gave strain energies similar to those of the DFT-based scheme (R²: 0.996 and 0.992, respectively), significantly better than those of the SE-based scheme (M6R, R²: 0.910). On the other hand, for the systems containing charged group(s), as ANI-type MLPs were not developed for charged systems, the M8 scheme gave the lowest R² values for the angles (0.962), dihedrals (0.924) and strain energy (0.869). However, all MLP-based schemes (M8–M10) still showed RSZD scores consistent with those of the DFT-based scheme (M7; R²: 0.995–0.996), as did the SE-based scheme (M6R, R²: 0.996). Promisingly, the AIQM1-based scheme (M10) showed high R² for the bonds (0.995) and dihedrals (0.974). Therefore, compared to the reliable DFT-based scheme, these MLP-based schemes displayed good performance (structures, RSZD scores and strain energy) in our QRs of neutral systems, and the AIQM1-based scheme can offer high accuracy for the challenging charged cases.
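As a minimal sketch of how such agreement values could be computed, the function below returns the squared Pearson correlation between a property refined with one scheme and the same property refined with the reference scheme. Whether the reported R² is the squared Pearson coefficient or the coefficient of determination of a fitted line is not stated in the text, so this choice is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross- and self-products.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` would hold, for instance, the bond distances refined with the reference scheme and `y` the corresponding distances from an MLP-based scheme, listed in the same order.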

Structural comparison in the gas phase and in proteins

Figure 6 shows RMSDs (with reference to the DFT method) of all computed drug/inhibitor structures in the gas phase and in proteins. The RMSDs of the optimized drug/inhibitor structures in the gas phase (0.40–0.74 Å for ANI-2x, AIQM1, and GFN2-xTB) were significantly higher than those of their corresponding structures in proteins refined by QRs (0.02–0.03 Å: M8, M10, and M6R relative to M7). These results can be attributed to the much higher structural flexibility in the gas phase, compared to the confined space of the protein-binding sites. Moreover, the RMSDs for the drugs/inhibitors containing charged group(s) were found to be slightly larger than those for the neutral drugs/inhibitors when ANI-2x was used, possibly due to the aforementioned limitation of ANI methods. Encouragingly, the higher-level AIQM1 method gave better optimized drug/inhibitor structures in the gas phase (charged: 0.49 Å; neutral: 0.59 Å) than the ANI-2x method (charged: 0.74 Å; neutral: 0.67 Å). Overall, these results demonstrate that QRs using MLPs (especially the more accurate CC-level AIQM1) can provide reliable drug/inhibitor structures in proteins (comparable to the DFT level), although larger structural errors were observed in the gas phase.

Efficiency

In terms of computational costs, QRs using these MLPs as the high level are significantly more efficient than those using the DFT method (Table 3). For instance, QRs of imatinib in spleen tyrosine kinase required approximately 144 CPU core-hours (Intel Xeon Gold 5218) for the DFT-based (ONIOM2(DFT:MM), M1) scheme, but only about 1.5–2.4 core-hours for the MLP-based (ONIOM2(ANI-2x:MM) and ONIOM2(AIQM1:MM), M3 and M5, respectively) and SE-based (ONIOM2(SE:MM), M6) schemes. Alternatively, QR using the robust M3 scheme was first performed, followed by additional refinement using M1 (M3→M1). This dual refinement approach required approximately 54.8 CPU core-hours to reach genuine DFT accuracy. For the case of nirmatrelvir in the SARS-CoV-2 M^Pro, QRs using the MLP- and SE-based schemes (M3, M5 and M6) required about 6.7–31.5 core-hours, far less than the DFT-based M1 scheme (3948.4 core-hours). The dual refinement approach (M3→M1) also significantly reduced the QR time (626.8 core-hours). These benchmark results demonstrate that QRs using MLPs can give reliable results at much lower computational costs, and that the computational time of a DFT-level refinement can also be reduced by first refining with MLPs and then with the DFT method as the high level.
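The core-hour figures quoted above translate into speed-up factors by simple division, as the short sketch below shows. The numbers are copied from the text; the dictionary layout and function names are illustrative only.

```python
from typing import Dict, Tuple


def speedup(reference_core_hours: float, accelerated_core_hours: float) -> float:
    """Ratio of reference cost to accelerated cost (larger means faster)."""
    if accelerated_core_hours <= 0:
        raise ValueError("core-hours must be positive")
    return reference_core_hours / accelerated_core_hours


# Core-hours quoted in the text: DFT-based M1, the fastest and slowest of the
# M3/M5/M6 schemes, and the dual refinement M3 -> M1.
TIMINGS: Dict[str, Dict[str, float]] = {
    "imatinib": {"dft": 144.0, "fast_min": 1.5, "fast_max": 2.4, "dual": 54.8},
    "nirmatrelvir": {"dft": 3948.4, "fast_min": 6.7, "fast_max": 31.5, "dual": 626.8},
}


def summarize(t: Dict[str, float]) -> Dict[str, object]:
    """Speed-up range of the fast schemes and speed-up of dual refinement."""
    fast_range: Tuple[float, float] = (
        speedup(t["dft"], t["fast_max"]),  # slowest fast scheme: lower bound
        speedup(t["dft"], t["fast_min"]),  # fastest fast scheme: upper bound
    )
    return {"fast_speedup_range": fast_range,
            "dual_speedup": speedup(t["dft"], t["dual"])}


if __name__ == "__main__":
    for name, timing in TIMINGS.items():
        print(name, summarize(timing))
```

With these inputs the fast schemes come out roughly 60–96 times cheaper than M1 for imatinib and roughly 125–589 times cheaper for nirmatrelvir, while the dual refinement is about 2.6 and 6.3 times cheaper, respectively.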

Table 3 Efficiency of the quantum refinements

Two conformers of nirmatrelvir in SARS-CoV-2 M^Pro

The electron density map around the nirmatrelvir binding site in the crystal structure of wild-type SARS-CoV-2 M^Pro shows pronounced red contours around the C–S covalent bond linkage to the Cys145 (Fig. 7a). Unfortunately, our QRs performed by several schemes (M1–M10) only marginally alleviated this substantial discrepancy in the electron density and slightly reduced the RSZD score from 7.6 (X-ray structure) to 5.8 (Supplementary Table 38). The electron density discrepancy in the C–S bond linkage after these QRs remained severe (Supplementary Fig. 42). On the basis of the reversible C–S bond formation mechanism⁵², the possible existence of both bonded and nonbonded conformers (Fig. 7a, b) was further considered in our QRs.

**Fig. 7: Quantum refinements of SARS-CoV-2 M^Pro.**

These two conformers were first individually refined using the DFT-based ONIOM3(DFT:SE:MM) M7 scheme. The refined results were then combined using different occupancy ratios of the two conformers. Our refined results suggest that an occupancy ratio of approximately 7:3 (bonded:nonbonded) gives the greatest structural improvement, with the lowest RSZD score (0.5, Fig. 7b) as well as a marked improvement of the electron density (Fig. 7c). Similarly, ONIOM QRs using the MLP-based schemes (ONIOM3(ANI-2x:SE:MM), ONIOM3(ANI-1ccx:SE:MM) and ONIOM3(AIQM1:SE:MM), M8–M10, respectively) and the same occupancy ratio (7:3) also provided significant improvement in the electron density (Fig. 7c) and the RSZD scores (from 7.6 (X-ray crystal structure) to only 0.5–0.7). Moreover, the computed strain energy was substantially decreased by 30.2–36.8 kcal·mol⁻¹. Again, the refined results using the MLP-based schemes (M8–M10) were very similar to those using the DFT-based scheme (M7), with deviations in RSZD scores of only < 0.2 and almost identical electron density maps (Fig. 7c). Consequently, our QR results provide computational evidence of the coexistence of the bonded and nonbonded forms of nirmatrelvir in the crystal structure of SARS-CoV-2 M^Pro, which should afford important structural information for designing better SARS-CoV-2 drugs. Moreover, a large RSZD score and electron density discrepancy of the key C–S bond linkage were observed in another crystal structure of wild-type SARS-CoV-2 M^Pro (PDB ID: 7SI9, Supplementary Table 42 and Supplementary Fig. 52), which may also imply the coexistence of the bonded and nonbonded forms. Furthermore, the coexistence of the bonded and nonbonded forms was recently proposed in another crystal structure of wild-type SARS-CoV-2 M^Pro (PDB ID: 7VH8)⁵².
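The occupancy scan described above can be pictured as a one-dimensional grid search. The sketch below is a generic illustration, not the authors' workflow: `score` is a hypothetical user-supplied callable that, given the occupancy of the bonded conformer, would build the two-conformer model and return its RSZD score from an external validation tool, and ranking by the absolute value of the score is an assumption made here.

```python
from typing import Callable, Tuple


def scan_occupancy(score: Callable[[float], float],
                   step: float = 0.1) -> Tuple[float, float]:
    """Grid-search the occupancy of conformer A between 0 and 1.

    Conformer B implicitly receives the occupancy 1 - occ_a.
    Returns (best occupancy of A, lowest absolute score).
    """
    if not 0.0 < step <= 1.0:
        raise ValueError("step must lie in (0, 1]")
    n_steps = round(1.0 / step)
    best_occ, best_score = 0.0, float("inf")
    for i in range(n_steps + 1):
        occ_a = i / n_steps           # avoids accumulating rounding errors
        current = abs(score(occ_a))   # assumed ranking criterion
        if current < best_score:
            best_occ, best_score = occ_a, current
    return best_occ, best_score
```

In practice each call to `score` would involve writing a model with the two alternate conformers and re-evaluating the real-space fit, so a coarse grid such as the default step of 0.1 keeps the number of evaluations small.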

Discussion

In this study, machine learning potentials (MLPs) were proposed and utilized to replace the reliable but expensive QM method and to accelerate multiscale QR processes for the first time. To overcome the element restrictions in some MLPs, two different levels of MLPs were also combined by an extrapolative ONIOM approach and then applied as the unprecedented ONIOM3(MLP1:MLP2:MM) and ONIOM4(MLP1:MLP2:SE:MM) schemes for QR. Our QR results demonstrated that MLPs (especially the high-level AIQM1 method reaching coupled-cluster accuracy) could achieve highly accurate drug/inhibitor structures in 50 different protein–drug/inhibitor systems, comparable to the results of the reliable DFT method, with computational costs lower by roughly two orders of magnitude. Moreover, our QR results provided computational evidence for the coexistence of the bonded and nonbonded forms of the FDA-approved drug nirmatrelvir in one crystal structure of SARS-CoV-2 M^Pro, which should provide new structural insights for drug design for SARS-CoV-2 treatment. Our proof-of-concept study showed that powerful MLPs can accelerate the QR of protein–drug/inhibitor complexes with high accuracy, which should promote more QR applications and provide new atomistic insights into molecular recognition, catalysis, and drug development. Additionally, apart from X-ray crystallography, we believe that MLPs could be helpful for modern structural determination methods of biomacromolecules (e.g., Cryo-EM, MicroED)^53,54. Furthermore, computational benchmark datasets (PB20-QM, PB20-QM-8k and PB20-QM-3k) were set up to evaluate the structural reliability of 3–19 k drug/inhibitor molecules computed by the DFT, MLP and/or SE methods, which will hopefully help the future development of better DFT, MLP and/or SE methods for drug/inhibitor molecules. We are also optimistic that, with the advance of more high-level and general MLPs developed by different groups^{55,56,57,58,59}, the fast and reliable QR of diverse biosystems will be routinely performed on any normal desktop computer in the future.

Methods

Multiscale QR methods using MLPs

On the basis of our previous ONIOM-based QR method²², the total energy function of the entire system is given by Eq. 1:

$${E}_{{total}}={E}_{{ONIOM}}+\, {\omega }_{\alpha }*{E}_{{xray}}$$

(1)

$${E}_{{ONIOM}2\left({high}:{low}\right)}={E}_{{high},{model}}+{E}_{{low},{real}}-{E}_{{low},{model}}$$

(2)

$$\begin{aligned}{E}_{{ONIOM}3\left({high}:{medium}:{low}\right)}= {} & {E}_{{high},{model}}+{E}_{{medium},{intermediate}}-{E}_{{medium},{model}} \\ & +{E}_{{low},{real}}-{E}_{{low},{intermediate}}\end{aligned}$$

(3)

where ${E}_{{ONIOM}}$ represents the energy contribution from ONIOM-based calculations⁶⁰, ${E}_{{xray}}$ stands for the energy contribution derived from the crystallographic penalty and ${\omega }_{\alpha }$ is the weighting factor that balances the contributions of each term. Equations 2 and 3 represent the two- and three-layer ONIOM schemes, respectively.
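Equations 1–3 are plain sums and differences of single-point energies, which the following minimal sketch makes explicit. It assumes that all energies are already available as numbers in consistent units; the function and argument names are illustrative and do not correspond to any particular program's interface.

```python
def oniom2(e_high_model: float, e_low_real: float, e_low_model: float) -> float:
    """Two-layer extrapolative ONIOM energy (Eq. 2)."""
    return e_high_model + e_low_real - e_low_model


def oniom3(e_high_model: float,
           e_medium_intermediate: float, e_medium_model: float,
           e_low_real: float, e_low_intermediate: float) -> float:
    """Three-layer extrapolative ONIOM energy (Eq. 3)."""
    return (e_high_model
            + e_medium_intermediate - e_medium_model
            + e_low_real - e_low_intermediate)


def total_energy(e_oniom: float, e_xray: float, weight: float) -> float:
    """Refinement target (Eq. 1): ONIOM energy plus weighted X-ray penalty."""
    return e_oniom + weight * e_xray
```

A quick sanity check of the extrapolation: if the high and low levels return the same energy for the model system, `oniom2` reduces to the low-level energy of the real system, as expected when no correction is applied.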

In general, flexible ONIOM methods can significantly improve the structure of active sites by ONIOM2(QM:MM), ONIOM2(SE:MM), and/or ONIOM3(QM:SE:MM) schemes. To reduce the computational cost of the QM method and to accelerate the ONIOM-based QRs, a few MLPs (ANI-2x, ANI-1ccx, and AIQM1)^33,35,36 are employed to replace the QM method, i.e., the ONIOM2(MLP:MM) and ONIOM3(MLP:SE:MM) schemes. However, CC-level MLPs (MLP-CC, such as ANI-1ccx and AIQM1) were only trained on systems containing H, C, N, and O elements, while DFT-level MLPs (MLP-DFT, such as ANI-2x) were trained on systems containing H, C, N, O, F, S, and Cl elements. To extend the MLP-CC applications to more drug/inhibitor molecules containing F, S, or Cl elements, two different levels of MLPs were further combined through an extrapolative ONIOM approach (ONIOM2(MLP-CC:MLP-DFT), Eq. 4). This ONIOM2(MLP-CC:MLP-DFT) part was then used to replace the ${E}_{{high},{model}}$ part in ONIOM2(QM:MM) in Eq. 2 or that in ONIOM3(QM:SE:MM) in Eq. 3 to derive the unique ONIOM3(MLP-CC:MLP-DFT:MM) and ONIOM4(MLP-CC:MLP-DFT:SE:MM) schemes, respectively. To further apply MLPs to even broader drug/inhibitor molecules containing P or Br elements, AIQM1 (or ANI-2x) and SE methods were also combined to describe the drug/inhibitor molecules by the ONIOM approach (Eq. 5), which replaces the ${E}_{{high},{model}}$ part in ONIOM2(QM:MM) in Eq. 2 or that in ONIOM3(QM:SE:MM) in Eq. 3.

$${E}_{{ONIOM}2({MLP}-{CC}:{MLP}-{DFT})}={E}_{{MLP}-{CC},{model}}+{E}_{{MLP}-{DFT},{real}}-{E}_{{MLP}-{DFT},{model}}$$

(4)

$${E}_{{ONIOM}2({MLP}:{xTB})}={E}_{{MLP},{model}}+{E}_{{xTB},{real}}-{E}_{{xTB},{model}}$$

(5)
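
As a worked illustration of the substitution described above, inserting Eq. 4 for the ${E}_{{high},{model}}$ term of Eq. 2 (with MM as the low level) gives the three-layer expression below. The label "intermediate" is our relabelling, assumed here to denote the whole drug/inhibitor, i.e., the "real" system of Eq. 4 and the "model" system of Eq. 2; "model" then denotes the part treated by MLP-CC.

$$\begin{aligned}{E}_{{ONIOM}3({MLP}-{CC}:{MLP}-{DFT}:{MM})}= {} & {E}_{{ONIOM}2({MLP}-{CC}:{MLP}-{DFT})}+{E}_{{MM},{real}}-{E}_{{MM},{intermediate}} \\ = {} & {E}_{{MLP}-{CC},{model}}+{E}_{{MLP}-{DFT},{intermediate}}-{E}_{{MLP}-{DFT},{model}} \\ & +{E}_{{MM},{real}}-{E}_{{MM},{intermediate}}\end{aligned}$$

Under this relabelling the result has the same form as Eq. 3, with MLP-CC, MLP-DFT, and MM as the high, medium, and low levels.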

The combination of two different MLP levels can also be readily extended to an additive QM/MM framework (Eqs. 6–7)^60,61,62.

$$\begin{aligned}{E}_{{MLP}-{CC}/{MLP}-{DFT}/{MM}}= {} & {E}_{{MLP}-{CC},{model}}+{E}_{{MLP}-{DFT},{intermediate}} \\ & -{E}_{{MLP}-{DFT},{model}}+{E}_{{MM}}+{E}_{{MLP}-{DFT}-{MM}}\end{aligned}$$

(6)

$$\begin{aligned}{E}_{{MLP}-{CC}/{MLP}-{DFT}/{xTB}/{MM}}= {} & {E}_{{MLP}-{CC},{model}}+{E}_{{MLP}-{DFT},{intermediate}1} \\ & -{E}_{{MLP}-{DFT},{model}}+{E}_{{xTB},{intermediate}2} \\ & -{E}_{{xTB},{intermediate}1}+{E}_{{MM}}+{E}_{{xTB}-{MM}}\end{aligned}$$

(7)
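
The additive expressions in Eqs. 6 and 7 can be sketched in the same way. Here we read ${E}_{{MM}}$ as the energy of the MM region and the last term as the coupling between the outermost non-MM layer and the MM region, which is the usual additive QM/MM convention; the text does not spell this out, so this reading, the function names, and the numbers are assumptions.

```python
def additive_three_layer(
    e_cc_model: float,
    e_dft_intermediate: float,
    e_dft_model: float,
    e_mm: float,
    e_dft_mm_coupling: float,
) -> float:
    """Additive MLP-CC/MLP-DFT/MM energy (Eq. 6)."""
    return e_cc_model + e_dft_intermediate - e_dft_model + e_mm + e_dft_mm_coupling


def additive_four_layer(
    e_cc_model: float,
    e_dft_intermediate1: float,
    e_dft_model: float,
    e_xtb_intermediate2: float,
    e_xtb_intermediate1: float,
    e_mm: float,
    e_xtb_mm_coupling: float,
) -> float:
    """Additive MLP-CC/MLP-DFT/xTB/MM energy (Eq. 7)."""
    return (
        e_cc_model
        + e_dft_intermediate1
        - e_dft_model
        + e_xtb_intermediate2
        - e_xtb_intermediate1
        + e_mm
        + e_xtb_mm_coupling
    )


if __name__ == "__main__":
    # Made-up component energies (hartree), for illustration only.
    print(additive_three_layer(-10.0, -14.0, -9.5, -2.0, -0.5))
```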

The experimental part ${E}_{{xray}}$ (Eq. 1) was obtained from the Crystallography and NMR System (CNS) program⁶³.

Protein preparations

All experimental data and protein–drug/inhibitor crystal structures were obtained from the Protein Data Bank (PDB)^{64,65,66,67,68,69,70,71,72}. The CNS topology and parameter files of the drug/inhibitor molecules can be generated using the PRODRG and ATB servers. Prior to the ONIOM QR calculations, the proteins were prepared by determining the protonation states and rotamers of some amino-acid side chains (see details in the Supplementary Information) and by adding hydrogen atoms. The side-chain rotamers were determined using WHATCHECK together with visual inspection. Hydrogen atoms were then added and the hydrogen-bond network was optimized using the PDB2PQR 3.5.2 program. The protonation states of the titratable residues were assigned on the basis of pKa values estimated by the PROPKA 3.0 program at the pH of crystallization, while the protonation states of the drug/inhibitor molecules were generally assigned by cross-validation among MolGpka, Graph-pKa, and pkasolver. All added hydrogen atoms were first optimized using ONIOM2(DFT:MM) with the heavy atoms fixed.
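
To illustrate how an estimated pKa and the crystallization pH translate into a protonation state, the sketch below applies the Henderson–Hasselbalch relation and a simple threshold rule. This is a simplification assumed for illustration: the pKa values and site labels are made up, and the actual assignments in this work relied on PROPKA output, cross-validation of several predictors, and inspection.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a titratable site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def assign_protonation(pka_by_site: dict, ph: float) -> dict:
    """Mark a site as protonated when its estimated pKa exceeds the pH."""
    return {site: pka > ph for site, pka in pka_by_site.items()}


if __name__ == "__main__":
    # Hypothetical pKa estimates for three sites; not values from this study.
    estimated_pka = {"ASP_A25": 3.8, "HIS_A41": 6.9, "LYS_A61": 10.4}
    crystallization_ph = 6.5
    for site, state in assign_protonation(estimated_pka, crystallization_ph).items():
        fraction = protonated_fraction(estimated_pka[site], crystallization_ph)
        print(f"{site}: protonated={state}, fraction={fraction:.2f}")
```

Sites whose pKa lies close to the pH have comparable populations of both forms, which is one reason a simple threshold is not sufficient on its own.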

ONIOM QR calculations

Various ONIOM-based schemes (M1–M10, Table 1) were used to assess the performance of our QRs after the protein preparations. All QRs were conducted using our ONIOM_QR code²². The ONIOM2(QM:MM), ONIOM2(MLP:MM), ONIOM2(SE:MM), ONIOM3(QM:SE:MM), and ONIOM3(MLP:SE:MM) calculations were mainly performed using Gaussian 16. The popular ωB97X-D functional combined with the 6-31G(d) basis set was employed as the QM (DFT) method⁷³. The ANI-1ccx³³, ANI-1x³⁴, ANI-2x³⁵, and AIQM1³⁶ methods were used as MLPs. The GFN2-xTB method was employed as the SE method⁷⁴, and the Amber ff14SB force field served as the MM method⁷⁵. The ANI and AIQM1 calculations were performed using TorchANI⁷⁶ and MLatom⁷⁷, respectively, which were called through the Gaussian external interface to realize the two-layer ONIOM2(MLP-CC:MLP-DFT) (such as ONIOM2(ANI-1ccx:ANI-2x) and ONIOM2(AIQM1:ANI-2x)), three-layer ONIOM3(MLP-CC:MLP-DFT:MM), and four-layer ONIOM4(MLP-CC:MLP-DFT:SE:MM) schemes. Additionally, the drug/inhibitor molecules were optimized in the gas phase using ANI-2x, ANI-1x, ANI-1ccx, AIQM1, GFN2-xTB, and ωB97X-D/6-31G(d), as well as ONIOM2(AIQM1:GFN2-xTB) for some molecules containing P or Br, with Gaussian 16 as the geometry optimizer, to analyze the performance of the MLPs. Moreover, the effects of the dispersion correction, DFT functional, and diffuse functions in the QM method on our QRs were examined for 10 selected drug/inhibitor molecules (shown in Fig. 1). The structures of these 10 drug/inhibitor–protein systems refined by M1 with ωB97X/6-31G(d), ωB97X-D/6-31+G(d), or M06-2X/6-31G(d) as the QM method are very similar to those refined with ωB97X-D/6-31G(d) (MAD: 0.001 Å, 0.0–0.1°, and 0.2–0.3°, respectively; Supplementary Table 3).

For our two- and three-layer ONIOM QR calculations (see the Supplementary Information for setup details), the drug/inhibitor molecule was set as the high-layer model part and the optimized region, except for nirmatrelvir in wild-type SARS-CoV-2 M^pro, ceftazidime in beta-lactamase, and isatin in DJ-1. In the three-layer ONIOM methods, the medium layer included neighboring residues within a radius of 3.0 Å of the model part. In the SARS-CoV-2 M^pro system, nirmatrelvir and the CYS145 residue to which it is linked were defined as the high-layer model part and optimized region. Similarly, in the beta-lactamase and DJ-1 systems, the drug/inhibitor and its covalently linked residue were defined as the high-layer model part and optimized region. For the systems containing H, C, N, and O elements, the M4, M5, M9, and M10 QR schemes were used, whereas for the 20 drug/inhibitor systems containing H, C, N, O, F, Cl, and S elements, the M4a, M5a, M9a, and M10a QR schemes were used. The ONIOM(MLP:SE) method was used for the three molecules containing P or Br when AIQM1 or ANI-2x was used. Furthermore, the highest-level combination of methods used in this study, ONIOM3(DFT:SE:MM) (M7), was considered the most reliable computational method, and the results refined by the other methods were generally compared with those from M7.
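
The distance-based choice of the medium layer can be sketched as follows. We assume here that a residue is included when any of its atoms lies within the cutoff of any model-part atom; the exact criterion used in ONIOM_QR is not stated in this section, and the function name, residue labels, and coordinates are hypothetical.

```python
import math


def residues_within(model_coords, residue_atoms, cutoff=3.0):
    """Return IDs of residues having at least one atom within `cutoff` (in Å)
    of any atom of the model part."""
    selected = []
    for residue_id, atoms in residue_atoms.items():
        if any(math.dist(atom, ref) <= cutoff for atom in atoms for ref in model_coords):
            selected.append(residue_id)
    return selected


if __name__ == "__main__":
    # Toy Cartesian coordinates in Å; not taken from any PDB entry.
    model_part = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    residues = {
        "RES_A": [(4.0, 0.0, 0.0), (8.0, 0.0, 0.0)],  # 2.5 Å from the second model atom
        "RES_B": [(0.0, 6.0, 0.0)],                   # too far from both model atoms
    }
    print(residues_within(model_part, residues))
```

The brute-force double loop is adequate for a single ligand; a spatial grid or k-d tree would be the natural replacement for larger selections.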

The convergence criteria (in atomic units) for our QR calculations were kept consistent with the default settings in Gaussian 16: ΔE < 10⁻⁵, Step_max < 1.8 × 10⁻², Step_RMS < 1.2 × 10⁻², ∇E_max < 4.5 × 10⁻³, and ∇E_RMS < 3 × 10⁻³. The weighting factor ${\omega }_{\alpha }$ was derived from CNS for each system (see details in the Supplementary Information). Analyses of the refined results (including the electron density and real-space difference density Z (RSZD) scores) were carried out using the Refmac5 and Edstats modules implemented in CCP4i2 8.0. Electron density maps were drawn using PyMOL. The strain energy was determined as the QM energy difference, at the ωB97X-D/6-31G(d) level, between the fully optimized ligand in the gas phase and the refined ligand extracted from the protein.
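
A minimal sketch of a convergence test using the five thresholds quoted above is given below. The dictionary keys and the function name are our own labels, not identifiers from Gaussian 16 or ONIOM_QR, and the sample values are invented.

```python
# Thresholds in atomic units, as listed in the text.
THRESHOLDS = {
    "delta_e": 1.0e-5,   # energy change between steps
    "step_max": 1.8e-2,  # largest displacement component
    "step_rms": 1.2e-2,  # root-mean-square displacement
    "grad_max": 4.5e-3,  # largest gradient component
    "grad_rms": 3.0e-3,  # root-mean-square gradient
}


def is_converged(values: dict, thresholds: dict = THRESHOLDS) -> bool:
    """True only if the magnitude of every monitored quantity is below its threshold."""
    return all(abs(values[name]) < limit for name, limit in thresholds.items())


if __name__ == "__main__":
    # Invented values for a single optimization step.
    step = {
        "delta_e": -4.0e-6,
        "step_max": 9.0e-3,
        "step_rms": 5.0e-3,
        "grad_max": 2.0e-3,
        "grad_rms": 1.0e-3,
    }
    print(is_converged(step))
```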

Gas-phase geometry optimization

All drug/inhibitor molecules were also optimized by the above-mentioned DFT, MLP, and SE methods using Gaussian 16 as the geometry optimizer. In addition, the effects of the dispersion correction, DFT functional, and diffuse functions were examined using the 10 selected drug/inhibitor molecules (shown in Fig. 1). The geometries of these molecules optimized by the ωB97X/6-31G(d), ωB97X-D/6-31+G(d), or M06-2X/6-31G(d) method are very similar to those optimized by the ωB97X-D/6-31G(d) method (MAD: 0.001 Å, 0.1–0.2°, and 7–16°, respectively; Supplementary Fig. 6 and Supplementary Table 2). Therefore, ωB97X-D/6-31G(d) was used as the DFT method in this study.
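
The structural comparisons above are reported as MAD values. Assuming MAD denotes the mean absolute deviation of a set of geometric parameters from their reference values, it can be computed as sketched below; the `periodic` option, which wraps angular differences into the interval [-180°, 180°), and all numbers are our illustrative assumptions.

```python
def wrapped_angle_difference(a_deg: float, b_deg: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def mean_absolute_deviation(values, reference, periodic=False):
    """Mean absolute deviation between paired parameters (bond lengths, angles, ...)."""
    if len(values) != len(reference) or not values:
        raise ValueError("inputs must be non-empty and of equal length")
    if periodic:
        deviations = [abs(wrapped_angle_difference(v, r)) for v, r in zip(values, reference)]
    else:
        deviations = [abs(v - r) for v, r in zip(values, reference)]
    return sum(deviations) / len(deviations)


if __name__ == "__main__":
    # Invented bond lengths (Å) and dihedral angles (degrees).
    print(mean_absolute_deviation([1.520, 1.340, 1.095], [1.521, 1.339, 1.096]))
    print(mean_absolute_deviation([179.0, 10.0], [-179.0, 12.0], periodic=True))
```

Wrapping matters mainly for dihedral angles, where values such as 179° and -179° differ by only 2°.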

Datasets setup

Drug/inhibitor molecules downloaded from the PDBbind v2020 dataset were used to set up a benchmark dataset (denoted PB20-QM), which contains 12,963 drug/inhibitor molecules after removal of metal-containing systems. The protonation states of these molecules were kept as given in the PDBbind v2020 dataset, except for minor corrections in some molecules (to fix very poor or wrong positions of H atoms based on the corresponding PDB structures). The classification of charged group(s) in these molecules was based on the protonation state(s) given in the PDBbind v2020 dataset. Gas-phase geometry optimizations of all molecules in the PB20-QM dataset were conducted with the above-mentioned DFT and SE methods. In addition, two smaller sub-datasets were set up: PB20-QM-8k (8776 molecules containing C, H, O, N, F, S, and/or Cl elements) and PB20-QM-3k (3156 molecules mainly containing C, H, O, and/or N elements). Geometry optimizations of all molecules in the PB20-QM-8k dataset were conducted with the above-mentioned DFT, SE, and ANI-2x methods, whereas those in the PB20-QM-3k dataset were performed with the above-mentioned DFT, SE, AIQM1, and ANI-2x methods.
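
The element sets on which the MLPs were trained (H, C, N, O for the CC-level MLPs; additionally F, S, and Cl for ANI-2x) suggest a simple way of sorting molecules by which MLP level can describe them. The sketch below is only an illustration of that idea; it is not the procedure used to build the PB20-QM sub-datasets, and the function name and return labels are hypothetical.

```python
MLP_CC_ELEMENTS = frozenset({"H", "C", "N", "O"})
MLP_DFT_ELEMENTS = MLP_CC_ELEMENTS | {"F", "S", "Cl"}


def mlp_coverage(elements) -> str:
    """Return the highest MLP level whose training elements cover the molecule."""
    present = set(elements)
    if present <= MLP_CC_ELEMENTS:
        return "MLP-CC"
    if present <= MLP_DFT_ELEMENTS:
        return "MLP-DFT"
    return "outside MLP coverage"


if __name__ == "__main__":
    # Element lists for three hypothetical molecules.
    for elements in (["C", "H", "N", "O"], ["C", "H", "O", "F", "S"], ["C", "H", "O", "P"]):
        print(elements, "->", mlp_coverage(elements))
```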

Data availability

The crystal structures and experimental data used in this work are available in the RCSB Protein Data Bank (https://www.rcsb.org/) under the PDB IDs listed in the paper. All the supporting data are provided in the main text and Supplementary Information. Our PB20-QM benchmark datasets have been deposited in our GitHub repository (https://github.com/oscarchung-lab/PB20-QM). The optimized/refined geometries of the QR50 dataset in this study are provided in the Supplementary Data file. Source data are provided with this paper.

Code availability

All original code is publicly available at https://github.com/oscarchung-lab/ONIOM_QR/releases (release 2.0.0) as well as on Zenodo (https://doi.org/10.5281/zenodo.10828284)⁷⁸.

References

Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3, 935–949 (2004).
Noble, M. E. M., Endicott, J. A. & Johnson, L. N. Protein kinase inhibitors: Insights into drug design from structure. Science 303, 1800–1805 (2004).
Jin, Z. M. et al. Structure of M-pro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293 (2020).
Kleywegt, G. J. & Jones, T. A. Databases in protein crystallography. Acta Crystallogr. Sect. D Struct. Biol. 54, 1119–1131 (1998).
Jack, A. & Levitt, M. Refinement of Large Structures by Simultaneous Minimization of Energy and R-Factor. Acta Crystallogr. Sect. A Found. Crystallogr. 34, 931–935 (1978).
Lin, F.-Y. & MacKerell, A. D. Force fields for small molecules. In Biomolecular Simulations: Methods and Protocols (eds Bonomi, M. & Camilloni, C.) (Springer, New York, 2019).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
Aithani, L. et al. Advancing structural biology through breakthroughs in AI. Curr. Opin. Struct. Biol. 80, 102601 (2023).
Ryde, U., Olsen, L. & Nilsson, K. Quantum chemical geometry optimizations in proteins using crystallographic raw data. J. Comput. Chem. 23, 1058–1070 (2002).
Bergmann, J., Oksanen, E. & Ryde, U. Combining crystallography with quantum mechanics. Curr. Opin. Struct. Biol. 72, 18–26 (2022).
Fu, Z., Li, X., Miao, Y. & Merz, K. M. Jr. Conformational analysis and parallel QM/MM X-ray refinement of protein bound anti-Alzheimer drug Donepezil. J. Chem. Theory Comput. 9, 1686–1693 (2013).
Li, X., He, X., Wang, B. & Merz, K. Jr. Conformational variability of benzamidinium-based inhibitors. J. Am. Chem. Soc. 131, 7742–7754 (2009).
Cao, L. L., Caldararu, O., Rosenzweig, A. C. & Ryde, U. Quantum refinement does not support dinuclear copper sites in crystal structures of particulate methane monooxygenase. Angew. Chem. Int. Ed. 57, 162–166 (2018).
Hsiao, Y.-W., Sanchez-Garcia, E., Doerr, M. & Thiel, W. Quantum refinement of protein structures: implementation and application to the red fluorescent protein DsRed.M1. J. Phys. Chem. B 114, 15413–15423 (2010).
Cao, L. L. & Ryde, U. Quantum refinement with multiple conformations: application to the P-cluster in nitrogenase. Acta Crystallogr. Sect. D Struct. Biol. 76, 1145–1156 (2020).
Bergmann, J., Oksanen, E. & Ryde, U. Can the results of quantum refinement be improved with a continuum-solvation model? Acta Crystallogr. Sect. B Struct. Sci. Cryst. Eng. Mater. 77, 906–918 (2021).
Caldararu, O., Ekberg, V., Logan, D. T., Oksanen, E. & Ryde, U. Exploring ligand dynamics in protein crystal structures with ensemble refinement. Acta Crystallogr. Sect. D Struct. Biol. 77, 1099–1115 (2021).
Fadel, F. et al. New insights into the enzymatic mechanism of human chitotriosidase (CHIT1) catalytic domain by atomic resolution X-ray diffraction and hybrid QM/MM. Acta Crystallogr. Sect. D Struct. Biol. 71, 1455–1470 (2015).
Benediktsson, B. & Bjornsson, R. Quantum mechanics/molecular mechanics study of resting-state vanadium nitrogenase: molecular and electronic structure of the iron-vanadium cofactor. Inorg. Chem. 59, 11514–11527 (2020).
Yan, Z., Li, X. & Chung, L. W. Multiscale quantum refinement approaches for metalloproteins. J. Chem. Theory Comput. 17, 3783–3796 (2021).
Canfield, P., Dahlbom, M. G., Hush, N. S. & Reimers, J. R. Density-functional geometry optimization of the 150 000-atom photosystem-I trimer. J. Chem. Phys. 124, 024301 (2006).
Goerigk, L., Collyer, C. A. & Reimers, J. R. Recommending Hartree–Fock theory with London-dispersion and basis-set-superposition corrections for the optimization or quantum refinement of protein structures. J. Phys. Chem. B 118, 14612–14626 (2014).
Borbulevych, O. Y., Plumley, J. A., Martin, R. I., Merz, K. M. & Westerhoff, L. M. Accurate macromolecular crystallographic refinement: incorporation of the linear scaling, semiempirical quantum-mechanics program DivCon into the PHENIX refinement package. Acta Crystallogr. Sect. D Struct. Biol. 70, 1233–1247 (2014).
Zheng, M. et al. Solving the scalability issue in quantum-based refinement: Q|R#1. Acta Crystallogr. Sect. D Struct. Biol. 73, 1020–1028 (2017).
Zheng, M. et al. Including crystallographic symmetry in quantum-based refinement: Q|R#2. Acta Crystallogr. Sect. D Struct. Biol. 76, 41–50 (2020).
Wang, L. et al. Real-space quantum-based refinement for cryo-EM: Q|R#3. Acta Crystallogr. Sect. D Struct. Biol. 76, 1184–1191 (2020).
Behler, J. Perspective: machine learning potentials for atomistic simulations. J. Chem. Phys. 145, 219901 (2016).
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Noé, F., Tkatchenko, A., Müller, K.-R. & Clementi, C. Machine learning for molecular simulation. Annu. Rev. Phys. Chem. 71, 361–390 (2020).
Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 56, 12828–12840 (2017).
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).
Zheng, P., Zubatyuk, R., Wu, W., Isayev, O. & Dral, P. O. Artificial intelligence-enhanced quantum chemical method with broad applicability. Nat. Commun. 12, 7022 (2021).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the Δ-Machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).
Dral, P. O., Wu, X. & Thiel, W. Semiempirical quantum-chemical methods with orthogonalization and dispersion corrections. J. Chem. Theory Comput. 15, 1743–1760 (2019).
Caldeweyher, E., Bannwarth, C. & Grimme, S. Extension of the D3 dispersion coefficient model. J. Chem. Phys. 147, 034112 (2017).
Hu, H. & Yang, W. T. Free energies of chemical reactions in solution and in enzymes with ab initio quantum mechanics/molecular mechanics methods. Annu. Rev. Phys. Chem. 59, 573–601 (2008).
Senn, H. M. & Thiel, W. QM/MM Methods for biomolecular systems. Angew. Chem. Int. Ed. 48, 1198–1229 (2009).
Warshel, A. & Levitt, M. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227–249 (1976).
Shen, L. & Yang, W. Molecular dynamics simulations with quantum mechanics/molecular mechanics and adaptive neural networks. J. Chem. Theory Comput. 14, 1442–1455 (2018).
Wu, J., Shen, L. & Yang, W. Internal force corrections with machine learning for quantum mechanics/molecular mechanics simulations. J. Chem. Phys. 147, 161732 (2017).
Shen, L., Wu, J. & Yang, W. Multiscale quantum mechanics/molecular mechanics simulations with neural networks. J. Chem. Theory Comput. 12, 4934–4946 (2016).
Zhang, Y. J., Khorshidi, A., Kastlunger, G. & Peterson, A. A. The potential for machine learning in hybrid QM/MM calculations. J. Chem. Phys. 148, 241740 (2018).
Zeng, J., Giese, T. J., Ekesan, Ş. & York, D. M. Development of range-corrected deep learning potentials for fast, accurate quantum mechanical/molecular mechanical simulations of chemical reactions in solution. J. Chem. Theory Comput. 17, 6993–7009 (2021).
Pan, X. et al. Machine-learning-assisted free energy simulation of solution-phase and enzyme reactions. J. Chem. Theory Comput. 17, 5745–5758 (2021).
Pinheiro, M., Ge, F., Ferré, N., Dral, P. O. & Barbatti, M. Choosing the right molecular machine learning potential. Chem. Sci. 12, 14396–14413 (2021).
Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).
Ertl, P., Altmann, E. & McKenna, J. M. The most common functional groups in bioactive molecules and how their popularity has evolved over time. J. Med. Chem. 63, 8408–8418 (2020).
Zhao, Y. et al. Crystal structure of SARS-CoV-2 main protease in complex with protease inhibitor PF-07321332. Protein Cell 13, 689–693 (2022).
Bai, X. C., McMullan, G. & Scheres, S. H. W. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40, 49–57 (2015).
Nannenga, B. L. & Gonen, T. The cryo-EM method microcrystal electron diffraction (MicroED). Nat. Methods 16, 369–379 (2019).
Wang, H. & Yang, W. Force field for water based on neural network. J. Phys. Chem. Lett. 9, 3232–3240 (2018).
Liu, Z. et al. Transferable multilevel attention neural network for accurate prediction of quantum chemistry properties via multitask learning. J. Chem. Inf. Model. 61, 1066–1082 (2021).
Liao, K., Dong, S., Cheng, Z., Li, W. & Li, S. Combined fragment-based machine learning force field with classical force field and its application in the NMR calculations of macromolecules in solutions. Phys. Chem. Chem. Phys. 24, 18559–18567 (2022).
Cheng, Z. et al. Building quantum mechanics quality force fields of proteins with the generalized energy-based fragmentation approach and machine learning. Phys. Chem. Chem. Phys. 24, 1326–1337 (2022).
Zeng, J., Tao, Y., Giese, T. J. & York, D. M. QDπ: a quantum deep potential interaction model for drug discovery. J. Chem. Theory Comput. 19, 1261–1275 (2023).
Chung, L. W. et al. The ONIOM method and its applications. Chem. Rev. 115, 5678–5796 (2015).
Izsák, R. et al. Quantum computing in pharma: a multilayer embedding approach for near future applications. J. Comput. Chem. 44, 406–421 (2023).
Ojha, A. A., Votapka, L. W. & Amaro, R. E. QMrebind: incorporating quantum mechanical force field reparameterization at the ligand binding site for improved drug-target kinetics through milestoning simulations. Chem. Sci. 14, 13159–13175 (2023).
Brunger, A. T. et al. Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. Sect. D Biol. Crystallogr. 54, 905–921 (1998).
Atwell, S. et al. A novel mode of Gleevec binding is revealed by the structure of spleen tyrosine kinase. J. Biol. Chem. 279, 55827–55832 (2004).
Bender, A. T. et al. Ability of Bruton’s tyrosine kinase inhibitors to sequester Y551 and prevent phosphorylation determines potency for inhibition of Fc receptor but not B-cell receptor signaling. Mol. Pharmacol. 91, 208–219 (2017).
Li, Q. et al. Functional and structural analysis of influenza virus neuraminidase N3 offers further insight into the mechanisms of oseltamivir resistance. J. Virol. 87, 10016–10024 (2013).
Yan, X. E. et al. Structural basis of AZD9291 selectivity for EGFR T790M. J. Med. Chem. 63, 8502–8511 (2020).
Owen, D. R. et al. An oral SARS-CoV-2 M-pro inhibitor clinical candidate for the treatment of COVID-19. Science 374, 1586–1593 (2021).
Roehrig, S. et al. Discovery of the novel antithrombotic agent 5-chloro-N-({(5S)-2-oxo-3-[4-(3-oxomorpholin-4-yl)phenyl]-1,3-oxazolidin-5-yl}methyl)thiophene-2-carboxamide (BAY 59-7939): an oral, direct factor Xa inhibitor. J. Med. Chem. 48, 5900–5908 (2005).
Liu, Z. L. et al. Effects of hinge-region natural polymorphisms on human immunodeficiency virus-type 1 protease structure, dynamics, and drug pressure evolution. J. Biol. Chem. 291, 22741–22756 (2016).
Albrecht, B. K. et al. Identification of a benzoisoxazoloazepine inhibitor (CPI-0610) of the bromodomain and Extra-Terminal (BET) family as a candidate for human clinical trials. J. Med. Chem. 59, 1330–1339 (2016).
He, Y. Z. et al. Structures and mechanism for the design of highly potent glucocorticoids. Cell Res. 24, 713–726 (2014).
Chai, J. D. & Head-Gordon, M. Systematic optimization of long-range corrected hybrid density functionals. J. Chem. Phys. 128, 084106 (2008).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB-An accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
Gao, X., Ramezanghorbani, F., Isayev, O., Smith, J. S. & Roitberg, A. E. TorchANI: a free and open source PyTorch-Based deep learning implementation of the ANI neural network potentials. J. Chem. Inf. Model. 60, 3408–3415 (2020).
Dral, P. O. et al. MLatom 2: an integrative platform for atomistic machine learning. Top. Curr. Chem. 379, 27 (2021).
Yan, Z. & Chung, L. W. Accelerating reliable multiscale quantum refinement of protein–drug systems enabled by machine learning. oscarchung-lab/ONIOM_QR: v2.0.0, https://doi.org/10.5281/zenodo.10828284 (2023).

Acknowledgements

We gratefully acknowledge the financial support from the National Natural Science Foundation of China (21873043, 21933003, 22193020, and 22193023), Southern University of Science and Technology, the Shenzhen Nobel Prize Scientists Laboratory Project (C17783101), Guangdong Provincial Key Laboratory of Catalysis (2020B121201002), and Natural Science Foundation of Shenzhen Innovation Committee (JCYJ20220530115408019). We thank the Center for Computational Science and Engineering of Southern University of Science and Technology CHEM HPC at SUSTech for partly supporting this work. Mr. Hui Xia is acknowledged for his discussion during the early stage of this study.

Author information

Authors and Affiliations

Shenzhen Grubbs Institute, Department of Chemistry and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology, Shenzhen, 518055, China
Zeyin Yan, Dacong Wei, Xin Li & Lung Wa Chung

Contributions

L.W.C. conceived and designed the project. Z.Y. carried out the QR calculations. D.W. performed the gas-phase optimizations. L.W.C. and X.L. supervised the project. Z.Y. and L.W.C. prepared the manuscript. All authors analyzed and discussed the results and assisted with manuscript preparation.

Corresponding author

Correspondence to Lung Wa Chung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Wei Li, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Dataset 1

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yan, Z., Wei, D., Li, X. et al. Accelerating reliable multiscale quantum refinement of protein–drug systems enabled by machine learning. Nat Commun 15, 4181 (2024). https://doi.org/10.1038/s41467-024-48453-4

Received: 31 August 2023
Accepted: 24 April 2024
Published: 16 May 2024
DOI: https://doi.org/10.1038/s41467-024-48453-4
