Structure determination of an amorphous drug through large-scale NMR predictions

Knowledge of the structure of amorphous solids can direct, for example, the optimization of pharmaceutical formulations, but atomic-level structure determination in amorphous molecular solids has so far not been possible. Solid-state nuclear magnetic resonance (NMR) is among the most popular methods to characterize amorphous materials, and molecular dynamics (MD) simulations can help describe the structure of disordered materials. However, directly relating MD to NMR experiments in molecular solids has been out of reach until now because of the large size of these simulations. Here, using a machine learning model of chemical shifts, we determine the atomic-level structure of the hydrated amorphous drug AZD5718 by combining dynamic nuclear polarization-enhanced solid-state NMR experiments with predicted chemical shifts for MD simulations of large systems. From these amorphous structures we then identify H-bonding motifs and relate them to local intermolecular complex formation energies.


Sample preparation for DNP NMR
In DNP MAS experiments, the high thermal polarization is transferred from unpaired electrons to nuclei (typically 1 H) which results in enhanced NMR signals. For organic powders, this is achieved by impregnating the powdered solid with an otherwise inert polarizing solution. 1,2 1 dissolves both in water and in most organic solvents, so most typical polarizing solutions, such as 16 mM TEKPOL in 1,1,2,2tetrachloroethane (TCE), were found to be incompatible. Ortho-terphenyl was found to be a suitable non-solvent for AZD5718, and 16 mM TEKPOL in ortho-terphenyl 99.5%-d14 (OTP-d14) was used as a polarization source. The sample was prepared according to the procedure described in references 3 and 4 by mixing a solid solution of 16 mM TEKPOL in OTP-d14 with powdered 1, then transferring it to a sapphire rotor sealed with a PTFE insert and capped with a zirconia drive cap. The rotor was then heated at ca. 65°C in a hot water bath in order to melt the OTP and allow the liquid to impregnate the API. It was then quickly inserted into the pre-cooled LT-MAS DNP probe to rapidly freeze the sample in order for the OTP to form a glass. 4 DNP enhancements of about 5 as measured on crystalline AZD5718 signals through ( 1 H) 13 C DNP CPMAS were obtained, which was sufficient to allow the natural abundance INADEQUATE spectra to be recorded.

NMR spectroscopy
Experiments were performed on Bruker Ascend 400 and Ascend 500 wide-bore Avance III, and on Bruker 800 Ultrashield plus narrow-bore, and 900 US 2 wide-bore Avance Neo NMR spectrometers. The spectrometers operate at 1 H Larmor frequencies of 400.13, 500.43, 800.13, and 900. 13 MHz respectively, and are equipped with H/X/Y 3.2 mm, H/C/N/D 1.3 mm and H/C/N 0.7 mm CPMAS probes. When the 3.2 mm probe was used, the samples were restricted to the central third of a rotor with an inner diameter of 2.2 mm, in order to maximize rf homogeneity.
DNP solid-state NMR spectroscopy experiments were performed on a 400 MHz Avance III HD Bruker spectrometer. The spectrometer is equipped with a low temperature magic angle spinning (LTMAS) 3.2 mm probe and connected through a corrugated waveguide to a 263 GHz gyrotron capable of outputting ca. 5-10 W of continuous wave microwaves. 5 The sweep coil of the main magnetic field was optimized so that the microwave irradiation gave the maximum positive proton DNP enhancement with binitroxide cross effect-based polarizing agents (e.g. AMUPOL, TEKPOL). DNP enhancements were determined based on the ratio of the area of the spectra acquired with and without microwave irradiation.
1D 1 H MAS NMR spectra were recorded at a temperature of 298 K using rotor spinning rates (νr) up to 111 kHz. 1D 13 C cross-polarization 6 (CP) MAS NMR spectra were acquired at 298 K with νr of 22 kHz. The CP contact time was 2 ms and during the signal acquisition SPINAL-64 decoupling 7 was applied with a 1 H rf field amplitude of 100 kHz. 1D 15 N CP NMR spectra were acquired at 100 K under DNP MAS conditions with νr = 12.5 kHz for crystalline AZD5718, and similar measurements were made on amorphous AZD5718 using LT-MAS conditions (without DNP) in the same instrument. Variable amplitude cross-polarization 8 was used to transfer polarization from 1 H (60% to 100% ramp) to 15 N (constant amplitude). For the 15 N CPMAS spectra of crystalline AZD5718, 360 scans were acquired with DNP spaced by a recycling delay of 20 s leading to a total acquisition time of 2 h. For amorphous AZD5718, 14,720 scans were acquired without DNP, spaced by a recycling delay of 5 s leading to a total acquisition time of 21 h. 2D 1 H-13 C HETCOR experiments were carried out at 298 K using νr = 22 kHz. 96 points were acquired in the indirect dimension with the States acquisition method, 9 and with indirect sampling intervals (Δt1) of 96 µs. For the crystalline sample the recycle delay was 32 s (T1 ~ 22 s) and 64 scans were collected for each t1 point. For the amorphous sample the recycle delay was 4 s (T1 ~ 3 s) and 769 scans were collected for each t1 point. During t1 100 kHz eDUMBO-122 was applied to decouple the 1 H-1 H dipolar coupling, 10 and during t2 100 kHz SPINAL-64 decoupling was applied.
The 2D 13 C-13 C refocused INADEQUATE 11,12 spectrum of crystalline 1 was acquired using DNP MAS NMR. 13 For the 13 C-13 C refocused INADEQUATE experiment, the probe was configured into 1 H/ 13 C double resonance mode. Variable amplitude cross-polarization 8 was used to transfer polarization from 1 H to 13 C. SPINAL-64 7 heteronuclear 1 H decoupling with RF fields of 100 kHz was applied in all cases.
The DNP enhancement allowed to record a 13 C-13 C refocused 13 C-13 C INADEQUATE spectrum at natural abundance for 1 in about 2 days of signal averaging. Moreover, using a 1 H spin-lock of 30 ms between the 1 H excitation pulse and the CP, the otherwise dominant OTP solvent signal was efficiently removed, 14 allowing to record a 2D spectrum 13 C-13 C refocused DNP INADEQUATE clean from the solvent signal. The spectrum was acquired in about 45 h with 128 points recorded in the indirect dimension with 256 scans each separated by recycling time of 5 s. The increment in the indirect dimension was 40 µs, allowing a total indirect acquisition time of 5.12 ms using the States-TPPI method. 15 The tau period for J evolution was optimized and set to 4 ms. SPINAL-64 was used for heteronuclear decoupling.
All chemical shifts were referenced via alanine. The full set of acquisition parameters is given in Supplementary Tables 1

CSP protocol
To generate a predicted polymorph landscape for 1, the molecular conformation determined via singlecrystal XRD was optimized at the B3LYP-D3/6-31G(d,p) [16][17][18][19] level of theory in an implicit water environment using the Gaussian 09 Rev. D.01 program. 20 The media surrounding the molecule was described using the Self Consistent Reaction Field (SCRF) PCM method 21 with a dielectric constant ε set to 78.35530, as implemented in the Gaussian software. Atomic charges were obtained using the charges from electrostatic potentials using a grid-based method (CHELPG). 22 This is a slightly modified procedure compared to the previously published in-house CSP method using an internally developed force-field (AZ-FF). 23 The optimized geometry was then used in a single-point energy computation using the MacroModel program, 24 where a unique force-field for 1 was constructed. Conformational analysis was then performed within the GRACE program 25,26 in order to determine what parameters were allowed to be flexible in the molecule. For 1, all single bonds were allowed to be rotated, and the two saturated rings were allowed to adopt different ring conformations. Candidate crystal structures were generated in the seven most stable chiral space groups (P21, P212121, P1, C2, P21212, P43, C2221) employing the GRACE machinery for a flexible conformation under Z'=1 condition. The crystal structure space was searched using a Monte-Carlo (MC) parallel tempering method 27 followed by lattice energy minimization for each polymorph using the AZ-FF force field. 23 The search was continued until the convergence criterion for statistically finding all polymorphs in the search, set to 0.7, was met. 23 Typically, 3,000 structures are kept at this stage. A structure duplicate check allowed to reduce this number to 1,000 unique structures. From these, the top 190 candidates, named #1 through #190 by increasing force field energy, were selected for full DFT-D optimization using the PBE functional 28 and Neumann-Perrin dispersion correction 25 in the VASP software. [29][30][31][32] The default PAW pseudopotentials and a 520 eV plane-wave energy cutoff were used. The ten most stable polymorphs (within 6 kJ/mol) were then selected for NMR computation. An extended set of the following 81 most stable structures (within 23.3 kJ/mol) was also selected for NMR computation, but did not lead to a better match of the experimental chemical shifts than structure #1. These 81 structures were thus not included in the set of structures used for the Bayesian analysis displayed in Fig. 2c in the main text.

Chemical shift computation of candidate crystal structures
The proton positions of the candidates selected for NMR computations were optimized using the planewave DFT software Quantum ESPRESSO version 6.5. [33][34][35] The constrained optimizations were performed at the PBE level of theory 28 using Grimme D2 dispersion correction 36 and projector augmented wave scalar relativistic pseudopotentials with GIPAW reconstruction, H.pbe-tm-newgipaw-dc.UPF and C.pbe-tm-new-gipaw-dc.UPF, 37 and N.pbe-n-kjpaw_psl.1.0.0.UPF and O.pbe-n-kjpaw_psl.1.0.0.UPF. 38 The wavefunction and charge density energy cutoffs were set to 60 and 240 Ry, respectively, and the relaxations were carried out without k-point.
Chemical shifts were computed for the candidate crystal structures obtained through the CSP procedure at the PBE0 level of theory 39 using the cluster-and fragment-based approach introduced by Hartman et al. [40][41][42] (computational details are provided in Supplementary Table 5). Direct linear regression between the chemical shieldings computed for each candidate and the experimental chemical shifts were performed in order to obtain computed chemical shifts. The computations were run using the hybrid-many-body-interaction (HMBI) code 43,44 with Gaussian 16 Revision A.03 as the DFT engine. 45 The computed chemical shieldings !"#! were converted to isotropic chemical shifts !"#! through the relationship (1) For each candidate crystal structure, the value of $%& and were determined by linear regression between computed and experimental shifts, permuting the ambiguously assigned shifts to obtain the lowest root-mean-square error (RMSE).

Positional uncertainty of the crystal structure
Perturbed crystal structures were obtained by performing molecular dynamics simulations of the crystal structure at 1, 5, 10, 15, 20 and 25 K. 300 ps simulations were carried out with a time step of 0.5 fs and using the canonical (NVT) ensemble, and 21 snapshots were extracted from the last 150 ps of each simulation. The force-field and parameters used are the same as the ones used to model the amorphous structure (see Section 1.8), except for the electrostatic and Van der Waals interaction cutoffs, which were set to 2.8 Å to avoid self-interaction. No constraint on the bond lengths to hydrogen was set. The correlation between chemical shift RMSD ⟨ ⟩ and the average positional RMSD of atom i along the l th principal axis of its ensemble of positional deviations , ',) . was obtained by maximizing the loglikelihood between the computed correlation points and the Gaussian distribution described by Equation (2) as a function of the Gaussian parameters ',) and Σ ',) .

Generation of amorphous structures
To model the amorphous structure of AZD5718, we carried out MD simulations on periodic amorphous cells with a variable number of water molecules. The atomic positions of a single molecule of 1 extracted from the crystal structure determined via single-crystal XRD were first optimized at the B3LYP-D3/6-31G(d,p) [16][17][18][19] level of theory in gas phase using the Gaussian 09 revision D.01 program. 20 Optimized coordinates and CHELPG charges were extracted from the optimization and used as input to generate amorphous cells. Materials Studio 46 together with the COMPASS-II 47 force field were used to create cubic amorphous cells of 128 molecules of 1. Five cells of each water content; 0, 0.5, 1.0 and 2.0% (w/w, 0, 16, 32 and 65 water molecules in each cell, respectively), and two cells of 4% water (w/w, 132 water molecules in each cell) were generated. Geometries were optimized during the construction. The mean initial cell volumes were 73,004, 73,372, 73,740, 74,500, and 76,042 Å 3 for the 0, 0.5, 1, 2 and 4% water simulations, respectively.
The optimized coordinates and CHELPG charges of 1 were used as input to generate OPLS_2005 48,49 force field parameters using the Schrödinger ffld_server 50 . The "ffconv.py" tool was used to convert the topology into GROMACS format. 51 Water was treated using the TIP3P model in the MD simulations. 52

Molecular dynamics simulation of amorphous structures
The GROMACS program (version 2016.4) 53,54 was used for all MD simulations throughout the study. The systems were initially equilibrated for 1 ns using the canonical (NVT) ensemble at 298 K. The temperature was held constant using a modified Berendsen thermostat with velocity-rescaling with a coupling constant of 0.1 ps. 55 A second equilibration was carried out for 10 ns using the isothermalisobaric ensemble (NPT) at 298 K and 1 bar where the temperature and pressure were held constant using the velocity-rescaling thermostat with a coupling constant of 0.1 ps and a Berendsen barostat with a coupling constant of 1 ps. 55,56 Production simulations were carried out for 600 ns using the NPT ensemble at 298 K and 1 bar where the temperature and pressure were held constant using the velocityrescaling thermostat 55 with a coupling constant of 0.1 ps and the Parrinello-Rahman barostat with a coupling constant of 4 ps. 57,58 A particle mesh Ewald scheme 59,60 was used to compute the electrostatic interactions with a 10 Å cutoff for the real space. The same cutoff was used for van der Waals interactions, with long-range dispersion correction applied to both energy and pressure. Bond lengths to hydrogens were constrained using the LINCS algorithm. 61 System trajectories were collected every 10 ps. All simulations were performed using a time step of 2 fs. Models of the amorphous structure were obtained by extracting 1001 evenly spaced snapshots from the last 100 ns of each MD simulation, corresponding to 100 ps time steps between the extracted snapshots.

Chemical shift predictions and hydrogen bonding motifs in amorphous structures
The predicted shieldings +$%, obtained using ShiftML were converted to chemical shifts +$%, through the relationship: (1) where $%& and b were determined by minimizing the spectral contrast angle 62 between the simulated spectra, obtained by summing Lorentzian functions with a 0.3 ppm linewidth centred on the predicted shifts, and the experimental spectra. For the crystalline compound, the regression parameters were found to be = −0.91 and $%& = 27.9 ppm. For the amorphous form, the regression was only performed on the 4% water simulations, and applied to all other water contents. The obtained parameters are = −0.99 and $%& = 30.9 ppm.

Raw data
The NMR raw data are available from https://doi.org/10.24435/materialscloud:gg-mx in JCAMP-DX version 6.0 standard format and original TopSpin format. Data are made available under the license CC-BY-4.0 (Creative Commons Attribution-ShareAlike 4.0 International).
The Python scripts used to analyse NMR crystallography and MD simulation data are available from the same link and made available under the license CC-BY-4.0 (Creative Commons Attribution-ShareAlike 4.0 International).

Chemical shift assignment
The 1 H, 13 C and 15 N resonances of AZD5718 (Fig. 1e) were assigned using one-dimensional proton, carbon and nitrogen MAS NMR experiments (Fig. 1a-c), as well as two-dimensional refocused 13 C-13 C INADEQUATE and 1 H-13 C HETCOR experiments (Fig. 1d-e). The INADEQUATE spectrum (recorded only for the crystalline form) provides the covalent connectivities between carbon atoms, indicated by red lines in Fig. 1d. The HETCOR spectrum (Fig. 1e)

Comparison of the structures determined via X-ray diffraction and NMR crystallography
The crystal structure determined using single-crystal X-ray diffraction ( Supplementary Fig. 1) was compared to the structure obtained through NMR crystallography. The superposition of the two structures is shown in Supplementary Fig. 2. The two structures were found to be highly similar except for the conformation of the bicyclo ring (on the left of Supplementary Fig. 2).
Supplementary Figure 1. Positional uncertainty of the X-ray determined structure of AZD5718. ORTEP plot of the heavy atom ADP tensors for the crystal structure of 1 determined using single crystal X-ray diffraction, drawn at the 90% probability level.
Supplementary Figure 2. Similarity between the XRD and NMR crystallography structures.
Comparison between the structure of 1 determined using X-ray diffraction (red) and NMR crystallography (blue).

Simulated spectra of AZD5718 amorphous MD simulations with different water contents
The simulated spectrum for each water content was computed by summing Lorentzian functions centred at the predicted shifts, and with a width of 0.3 ppm. The parameters for the conversion from shielding to shift were extracted by comparing the 4.0% water simulated spectrum as described in section 1.8, and were applied to all simulations of different water contents. The spectra were normalised such that their maximum is one. Although the experimental peak observed at 11.8 ppm was not observed in the simulated spectra, a larger population of the shifts above 11 ppm was observed with increasing water content ( Supplementary Fig. 3).
Supplementary Figure 3. Effect of the water content on simulated 1 H NMR spectrum. a Simulated 1 H NMR spectra of crystalline and amorphous AZD5718. b Close-up view of the spectra in the region between 10 and 14 ppm. 15