## Introduction

Nuclear magnetic resonance (NMR) spectroscopy has revolutionized organic and biological chemistry, owing to its ability to provide precise structural detail through the investigation of 1H and 13C spectra. Assignments of these spectra rely on more than 50 years of comprehensive and detailed data, much of which has been cataloged in guides from Sadtler1 and Aldrich2 and subsequently in databases, such as the AIST Spectral Database for Organic Compounds SDBS3.

For inorganic species, far fewer resources exist. Through the local spectroscopy data infrastructure (LSDI), we seek to develop a database of both known and predicted NMR spectra for less commonly studied nuclei, beginning with 29Si. The data infrastructure serves as a platform to compute 29Si NMR tensors and generate model spectra for crystalline compounds in The Materials Project (MP) database. 29Si is attractive because it is a spin-1/2 nucleus (I = 1/2) with moderate natural abundance (4.68%)4, and it is studied as a constituent of minerals, zeolites, and amorphous glasses.

X-ray diffraction (XRD) has been the primary tool for determining the structure of crystalline materials for nearly a century. Determining lattice parameters, symmetry, and the coordinates of moderate- to high-Z species in the lattice is relatively straightforward, making XRD a powerful and versatile analytical tool. As the demand for accurate atomic coordinates increases, however, structures proposed on the basis of XRD alone have been shown to be inaccurate for lighter elements, such as H5,6,7,8. In such cases, other experimental techniques, such as neutron diffraction and, more recently, NMR, have been employed to improve accuracy. This NMR-based refinement of structures is termed “NMR crystallography”7,9,10,11,12. Solid-state NMR is also a powerful tool for characterizing the local environments of unique sites within a crystalline material, because changes in the local environment shift NMR resonances: even small distortions of bond lengths and angles perturb the spectra, and in the solid state these perturbations carry detailed structural information.

The exponential increase in computational power over the past two decades enables theoretical methods to scale across structures and chemistries more easily than experimental methods. In the field of solid-state NMR, however, most research using computational methods focuses on a handful of structures at a time13,14,15. The potential of rapidly characterizing NMR properties from a large computational database with consistent standards is still underappreciated. Thus, within the approximations necessary for tractable simulations, a dataset of simulated NMR tensors, together with interactive tools to visualize and explore NMR spectra, has the potential to greatly increase the accuracy and efficiency of studies of solid-state materials. The LSDI is constructed with plane-wave density functional theory (DFT) calculations using two popular codes: the Vienna Ab initio Simulation Package (VASP) and the Cambridge Serial Total Energy Package (CASTEP). In this study, we seek to demonstrate that both packages are effective at calculating NMR shielding tensors (σ) for 29Si. The isotropic chemical shift (δiso) is the NMR parameter most familiar to experimental researchers; however, the less-explored individual tensor elements obtained from the solid-state lineshape add critical information about the local environment. Predicting the full diagonalized tensor is useful for planning experiments, under either static or magic-angle-spinning (MAS) NMR conditions, that enable accurate extraction of these values. As we have shown in a separate 13C study6, determining the chemical shift tensor values enabled refinement of the H positions in a polycrystalline sample. Catalogs of tensor values will ultimately accelerate “NMR crystallography”, that is, the refinement of the local environment around the nuclei probed in NMR experiments.

Furthermore, this study illustrates an important aspect of cataloging experimental data and comparing them with computations. As experimental measurements improve over time, better tools often become available for interpreting the data more accurately. Here, examining a large set of tensors made it possible to identify assignment errors in tensor elements arising from the use of Haeberlen notation, described below. In addition, we find systematic differences between CASTEP and VASP that are not evident when considering only the isotropic values, σiso and δiso, but that are critical when reporting the full shielding (or shift) tensor.

### NMR definitions

There is a tendency to use the terms “chemical shielding” and “chemical shift” interchangeably, even though these are different but related quantities. We set forth definitions for shielding (σ) first, followed by nomenclature and expressions using chemical shift (δ).

The NMR chemical shielding tensor describes the response of the electrons around a nucleus to an externally applied magnetic field, B0: the electrons give rise to an induced magnetic field at the nucleus. Nuclear spins sense both the external field and this induced field, which is expressed by16

$$\mathbf{B}_{\mathrm{ind}}(\mathbf{r}) = -\vec{\sigma}(\mathbf{r})\,\mathbf{B}_0 \tag{1}$$

Here, $\vec{\sigma}(\mathbf{r})$ is the chemical shielding tensor, a second-rank tensor consisting of symmetric and antisymmetric contributions. The symmetric contribution can be diagonalized to yield the three principal components of the shielding tensor, referred to in Haeberlen notation as σXX, σYY, and σZZ:

$$\begin{pmatrix} \sigma_{XX} & 0 & 0 \\ 0 & \sigma_{YY} & 0 \\ 0 & 0 & \sigma_{ZZ} \end{pmatrix} \tag{2}$$

Isotropic shielding σiso is defined as the numerical average of the principal components.

$$\sigma_{\mathrm{iso}} = \frac{\sigma_{XX} + \sigma_{YY} + \sigma_{ZZ}}{3} \tag{3}$$
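To make Eqs. (2) and (3) concrete, the minimal NumPy sketch below takes a full 3 × 3 shielding tensor, keeps only its symmetric part, diagonalizes it, and averages the eigenvalues. The function name `principal_shielding` and the example tensor are hypothetical illustrations, not part of the LSDI or of any DFT package's interface.

```python
import numpy as np


def principal_shielding(sigma):
    """Principal components (ascending) and isotropic value of a 3x3 shielding tensor (ppm)."""
    sigma = np.asarray(sigma, dtype=float)
    # Keep only the symmetric part; the antisymmetric part does not enter Eq. (2).
    sym = 0.5 * (sigma + sigma.T)
    # eigvalsh is NumPy's eigenvalue routine for real symmetric matrices (ascending order).
    principal = np.linalg.eigvalsh(sym)
    # Eq. (3): sigma_iso is the mean of the principal components (equal to trace / 3).
    return principal, float(principal.mean())


# Hypothetical, slightly asymmetric tensor (ppm) expressed in an arbitrary crystal frame.
tensor = [[421.0, 6.0, 0.0],
          [4.0, 423.0, 0.0],
          [0.0, 0.0, 450.0]]
print(principal_shielding(tensor))
```

Because the trace is invariant under rotation, σiso can be read off the raw tensor without diagonalizing it; the principal components, however, require the eigen-decomposition.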

Individual tensor elements are particularly useful for communicating details of the full chemical shielding anisotropy (CSA) lineshape. At issue is how best to report these tensors, since there are multiple conventions, including the “Mehring” convention17, the Herzfeld–Berger convention18, the Haeberlen convention19, and the “Maryland” convention20. The Haeberlen convention is used by many researchers and, importantly, by programs that simulate spectra, including the popular Dmfit21 and SIMPSON22 programs.

Here, we report many of the comparisons between experiment and computation using the popular Haeberlen17 convention, in part because most of the literature uses this system for reporting the full chemical shielding (or shift) tensor. In the Haeberlen convention, σXX, σYY, and σZZ are ordered by their distance from the isotropic shielding, σiso:

$$\left|\sigma_{ZZ} - \sigma_{\mathrm{iso}}\right| \ge \left|\sigma_{XX} - \sigma_{\mathrm{iso}}\right| \ge \left|\sigma_{YY} - \sigma_{\mathrm{iso}}\right| \tag{4}$$

Two additional parameters that reflect the solid-state CSA lineshape are often reported in the Haeberlen system: the reduced shielding anisotropy, ζσ (often simply called the “reduced anisotropy”), and the asymmetry parameter, ηCSA. They are defined as follows:

$$\zeta_\sigma = \sigma_{ZZ} - \sigma_{\mathrm{iso}} \tag{5}$$

(Please note: in the Haeberlen convention, there are two methods for reporting the anisotropy of the CSA: shielding anisotropy Δσ and reduced shielding anisotropy ζσ.)
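Taking the usual definition of the shielding anisotropy, Δσ = σZZ − (σXX + σYY)/2, and eliminating σXX + σYY with Eq. (3) shows that the two anisotropy measures differ only by a constant factor:

$$\Delta\sigma = \sigma_{ZZ} - \frac{\sigma_{XX} + \sigma_{YY}}{2} = \sigma_{ZZ} - \frac{3\sigma_{\mathrm{iso}} - \sigma_{ZZ}}{2} = \frac{3}{2}\left(\sigma_{ZZ} - \sigma_{\mathrm{iso}}\right) = \frac{3}{2}\,\zeta_\sigma$$

A reported anisotropy value is therefore only meaningful if it states which of the two definitions is used.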

While Eq. (5) gives the algebraic definition of this quantity, the reduced anisotropy can be visualized in terms of where the most intense point of the static lineshape lies, to the right or to the left of the isotropic shielding, σiso.

The overall shape of the line is expressed by the asymmetry parameter, where ζσ appears in the denominator.

$$\eta_{\mathrm{CSA}} = \frac{\sigma_{YY} - \sigma_{XX}}{\sigma_{ZZ} - \sigma_{\mathrm{iso}}} \tag{6}$$

The value ranges from 0 to 1 irrespective of the sign of ζσ, because any change from positive to negative reduced anisotropy is canceled by a corresponding sign change in the numerator.
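The sketch below applies Eqs. (3) to (6) to three principal components: it sorts them by distance from the isotropic value (Eq. (4)), then computes ζσ and ηCSA. The function name `haeberlen_parameters` and the input values are hypothetical. When two components are equidistant from σiso (ηCSA = 1), the XX and ZZ labels are genuinely ambiguous, and this sketch simply keeps the input order.

```python
import numpy as np


def haeberlen_parameters(principal):
    """Apply Eqs. (3)-(6) to three principal components (ppm).

    Returns (xx, yy, zz, iso, zeta, eta). The same code works for shieldings or
    shifts, because only distances from the mean are used for the ordering.
    """
    p = np.asarray(principal, dtype=float)
    iso = p.mean()                                   # Eq. (3)
    # Eq. (4): closest to iso is YY, next is XX, farthest is ZZ.
    order = np.argsort(np.abs(p - iso), kind="stable")
    yy, xx, zz = (float(v) for v in p[order])
    zeta = zz - float(iso)                           # Eq. (5)
    # Eq. (6); an isotropic tensor (zeta == 0) has no defined asymmetry, so report 0.
    eta = (yy - xx) / zeta if zeta != 0 else 0.0
    return xx, yy, zz, float(iso), zeta, eta


# Hypothetical principal shieldings (ppm), for illustration only.
print(haeberlen_parameters([400.0, 420.0, 450.0]))
```

For this input, ζσ is about +26.7 ppm and ηCSA is about 0.75; flipping which edge lies farther from σiso would flip the sign of ζσ while leaving ηCSA in the range 0 to 1.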

Note that in the “Mehring” convention17, the diagonalized shielding tensor elements are expressed as σ11, σ22, and σ33. Unlike the Haeberlen labels, these are assigned based on their absolute order in frequency (rather than relative to the isotropic shielding, as in Eq. (4) above):

$$\sigma_{33} \ge \sigma_{22} \ge \sigma_{11} \tag{7}$$

This latter point will have important consequences for data accuracy and correlations between experiment and computation, as we show below.
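One way to see the link between the two conventions: for a tensor already ordered per Eq. (4), σYY always lies between the other two components, so only the sign of ζσ decides whether σZZ becomes σ33 or σ11. The helper below (a hypothetical name, written for shieldings) makes that mapping explicit.

```python
def haeberlen_to_mehring(xx, yy, zz):
    """Relabel Haeberlen-ordered shieldings with the Mehring labels of Eq. (7).

    Inputs must already satisfy Eq. (4). Returns a dict with keys "11", "22", "33"
    such that sigma_11 <= sigma_22 <= sigma_33.
    """
    iso = (xx + yy + zz) / 3.0
    # YY is always the middle value; only the sign of zeta decides where ZZ sits.
    if zz >= iso:   # zeta_sigma >= 0: ZZ is the most shielded component
        return {"11": xx, "22": yy, "33": zz}
    return {"11": zz, "22": yy, "33": xx}  # zeta_sigma < 0: ZZ is the least shielded


# Hypothetical Haeberlen-ordered shieldings (ppm) with opposite signs of zeta.
print(haeberlen_to_mehring(400.0, 420.0, 450.0))
print(haeberlen_to_mehring(440.0, 430.0, 380.0))
```

A consequence is that placing σZZ on the wrong side of σiso swaps the two outer Mehring labels, which is the kind of error discussed in the Results.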

## Results and discussion

The solid-state NMR parameter known with the highest precision is the experimentally measured isotropic chemical shift, δiso. Because δiso is the average of the three principal components of the diagonalized tensor, (δXX + δYY + δZZ)/3, small inaccuracies in the individual components partially cancel. As the most frequently reported experimental parameter, δiso provides a comparison between experiment and computation that has particular significance for researchers.

The computations yield chemical shielding tensors. The calculated parameters are compared with 42 sets of experimentally reported (chemical shift) tensors that serve as a benchmarking set, and the reference isotropic chemical shielding is obtained by extrapolating a linear regression model23, described in detail in Supplementary Information, Section III20.

Figure 1a shows the linear relationship between the CASTEP-computed 29Si isotropic chemical shielding, σiso, and the experimentally measured 29Si isotropic shift, δiso. Figure 1b is the corresponding plot of VASP-computed σiso versus experimental δiso. Each data point represents a unique Si site in a crystalline material. For CASTEP, the resulting reference isotropic chemical shielding is σreference = 316.26 ppm, and the slope of the correlation plot is −1.12. For VASP, σreference = 528.18 ppm, and the slope is +1.15. Both correlations are very strong, with R2 = 0.99 and RMSE = 1.39 ppm for CASTEP, and R2 = 0.98 and RMSE = 1.45 ppm for VASP. This strong linear correlation demonstrates that DFT can compute chemical shielding with sufficient precision to match experimentally determined chemical shifts for inorganic materials, and the high degree of correlation in this benchmarking set gives us confidence that the 29Si chemical shielding/shift will also be predicted accurately for additional crystalline materials. Figure 1c compares the VASP- and CASTEP-computed σiso values directly; the very good agreement shows that both platforms perform well in modeling the 29Si isotropic chemical shielding. These data are all collected in the Supplementary Information, Section I, Supplementary Tables 1–3.
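The authors' exact regression procedure is given in the Supplementary Information; the sketch below only illustrates the idea. It assumes a fit of the form σcalc = m·δexp + σreference (matching the axes described for Fig. 1), inverts that line to convert shieldings into shifts, and reports R2 and an RMSE in the shift domain. The function names are hypothetical, and the data points are synthetic and noise-free, generated from the quoted CASTEP slope and intercept only so that the recovered values can be checked.

```python
import numpy as np


def fit_reference(delta_exp, sigma_calc):
    """Least-squares fit sigma_calc = slope * delta_exp + sigma_ref.

    Returns (slope, sigma_ref, r_squared); sigma_ref is the shielding
    extrapolated to delta = 0 ppm.
    """
    d = np.asarray(delta_exp, dtype=float)
    s = np.asarray(sigma_calc, dtype=float)
    slope, sigma_ref = np.polyfit(d, s, 1)           # degree-1 polynomial fit
    resid = s - (slope * d + sigma_ref)
    r_squared = 1.0 - np.sum(resid**2) / np.sum((s - s.mean()) ** 2)
    return float(slope), float(sigma_ref), float(r_squared)


def shielding_to_shift(sigma, slope, sigma_ref):
    """Invert the fitted line to convert computed shieldings into shifts (ppm)."""
    return (np.asarray(sigma, dtype=float) - sigma_ref) / slope


def rmse(a, b):
    """Root-mean-square difference between two arrays (ppm)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic, noise-free points (NOT the benchmark data) built from the quoted CASTEP fit.
delta = np.array([-110.0, -100.0, -90.0, -80.0])
sigma = 316.26 - 1.12 * delta
slope, ref, r2 = fit_reference(delta, sigma)
print(slope, ref, r2, rmse(shielding_to_shift(sigma, slope, ref), delta))
```

With real benchmark data, the residuals would be non-zero and the RMSE would quantify the scatter visible in Fig. 1.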

Beyond the isotropic shift, the two additional Haeberlen parameters (ζδ and ηCSA) can be linked directly to the individual tensor elements that determine the experimental lineshape, whether a static powder pattern or a manifold of spinning sidebands under MAS. Figure 2 is a schematic illustrating the relationship between the principal components of the chemical shift tensor, δiso, and ζδ for a lineshape with a representative ηCSA value of 0.4.

### Challenges for cataloging full tensors

Since most of the benchmark compounds have reported the Haeberlen quantities of “asymmetry parameter”, ηCSA, and reduced anisotropy of CSA (ζδ), we examine the relationship between experimentally measured values (largely from past literature) and computations below.

We have reconciled past experimental reports of the 29Si reduced anisotropy of the chemical shift (ζδ) and depict our findings in the following set of figures. The comparison between experimentally reported reduced anisotropy and the computed values from CASTEP (or VASP) reveals issues faced when cataloging data. Figure 3 depicts a comparison of 42 experimentally reported reduced anisotropies from the literature with the corresponding values predicted by CASTEP. While a high degree of correlation is found for most of the data, a number of significant “outliers” are identified. Supplementary Fig. 3 shows excellent agreement between the computed values for reduced anisotropy for CASTEP versus VASP, giving us confidence that both programs are able to predict similar values of these tensor parameters for crystalline structures. In general, the outliers are points for which the assignment of experimentally obtained δZZ (and hence, δXX as well) may be incorrect, as we illustrate.

In the Haeberlen system, the reduced anisotropy of the CSA (ζδ = δZZ − δiso) defines the lineshape in terms of one “extreme edge” of the static powder pattern, explicitly yielding that one element of the tensor. This is the shoulder furthest from the isotropic chemical shift, which poses an observational challenge for some experimental spectra, as illustrated in Fig. 4. For lineshapes with ηCSA below about 0.7 (such as the one shown in the inset), δZZ is unambiguous, as marked. However, for lineshapes with large values of ηCSA (approaching 1.0), and for MAS NMR spectra with few spinning sidebands, the researcher must assign that shoulder to one side or the other based on sparse data (as illustrated schematically in the Supplementary Information, Section V). When data are sparse, when the signal-to-noise ratio of the experimental spectrum is poor, or when one shoulder is truncated by radio-frequency pulse imperfections24, the wrong value may be assigned to δZZ, and importantly, it may be placed on the incorrect “side” of the lineshape.

Such inaccurate assignments lead to incorrect values of both ζδ and ηCSA. We have also found that ηCSA tends to be poorly determined by observational analysis of lineshapes. The asymmetry parameter, ηCSA, contains the reduced anisotropy, ζδ, in its denominator, and δXX and δYY in its numerator. Consequently, mis-assigning two of these tensor elements can make this parameter unstable, so that it fluctuates strongly with small deviations in the individual tensor elements, resulting in a significant lack of correlation between computation and experiment, as depicted in Fig. 5. (As for ζδ, the VASP and CASTEP computed values of the asymmetry parameter correlate very well, as shown in Supplementary Fig. 4.)
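The instability near ηCSA = 1 can be reproduced with two hypothetical shift tensors whose outer edges differ by only 0.5 ppm. In the sketch below (function names are illustrative), the Haeberlen δZZ jumps from about −120.5 ppm to −79.5 ppm and ζδ changes sign (about −20.3 to +20.3 ppm), while the Mehring triplet δ11 ≥ δ22 ≥ δ33 moves by only 0.5 ppm.

```python
import numpy as np


def haeberlen_zz_zeta_eta(principal):
    """Return (delta_ZZ, zeta_delta, eta_CSA) using the Haeberlen ordering of Eq. (4).

    Assumes an anisotropic tensor (zeta != 0).
    """
    p = np.asarray(principal, dtype=float)
    iso = p.mean()
    yy, xx, zz = (float(v) for v in p[np.argsort(np.abs(p - iso), kind="stable")])
    zeta = zz - float(iso)
    return zz, zeta, (yy - xx) / zeta


def mehring_shift(principal):
    """Mehring labels for shifts: delta_11 >= delta_22 >= delta_33."""
    return tuple(sorted((float(v) for v in principal), reverse=True))


# Two hypothetical tensors (ppm) whose outer edges differ by only 0.5 ppm.
for trial in ([-80.0, -100.0, -120.5], [-79.5, -100.0, -120.0]):
    zz, zeta, eta = haeberlen_zz_zeta_eta(trial)
    print(f"dZZ={zz:7.1f}  zeta={zeta:+6.2f}  eta={eta:.3f}  Mehring={mehring_shift(trial)}")
```

A measurement uncertainty of this size is easily exceeded when a shoulder is poorly defined, which is why the sign of ζδ, and therefore the assignment of δZZ, can flip between reports.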

### The effect of convention on individual tensor elements

In light of the errors revealed in these expressions, a strong argument can be made for reporting the individual tensor elements and departing from the Haeberlen convention. One of the important opportunities afforded by the LSDI database is the ability to discover such systematic errors by comparing a large number of datasets. Solving the three Haeberlen equations for the three unknowns (σXX, σYY, and σZZ) yields the correlation plot between computed and experimentally reported δ values shown in Fig. 6a, with tensor elements clustered by symbol (and color); the outliers are identified by name. When the same tensors are instead expressed in the “Mehring” convention17, in which the three counterparts σ11, σ22, and σ33 are ordered from high frequency to low frequency for any lineshape, the algebraic solutions for the experimentally reported values are reconciled, as shown in Fig. 6b. (The distribution of residuals for each principal component after linear rescaling is shown in Section IX of the Supplementary Information.)
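Solving Eqs. (3), (5), and (6), written for shifts, for the three unknowns gives closed forms: δZZ = δiso + ζδ, δXX = δiso − ζδ(1 + ηCSA)/2, and δYY = δiso − ζδ(1 − ηCSA)/2. The hypothetical helpers below implement this inversion and then relabel the result in the Mehring convention, which depends only on sorting and so can be compared element by element with computed values.

```python
def principal_from_haeberlen(iso, zeta, eta):
    """Solve Eqs. (3), (5) and (6) (written for shifts) for delta_XX, delta_YY, delta_ZZ."""
    zz = iso + zeta                            # from Eq. (5)
    # Eq. (3) gives xx + yy = 2*iso - zeta; Eq. (6) gives yy - xx = eta * zeta.
    xx = iso - 0.5 * zeta * (1.0 + eta)
    yy = iso - 0.5 * zeta * (1.0 - eta)
    return xx, yy, zz


def mehring_from_haeberlen(iso, zeta, eta):
    """Re-express the same tensor with Mehring labels: delta_11 >= delta_22 >= delta_33."""
    return tuple(sorted(principal_from_haeberlen(iso, zeta, eta), reverse=True))


# Hypothetical reported values (ppm): delta_iso = -100, zeta = +20, eta = 0.4.
print(principal_from_haeberlen(-100.0, 20.0, 0.4))
print(mehring_from_haeberlen(-100.0, 20.0, 0.4))
```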

We can see that the individual tensor elements, defined in terms of their frequency using the Mehring convention, are better correlated between experiment and computation, and reporting these reduces the inaccuracies inherent in the algebraic expressions used to describe the lineshape.

Note that the shielding (σ) and the chemical shift (δ) should be negatively correlated with one another (see Eq. (S3)). One finding that emerged while creating this catalog is an inverse correlation of tensor elements between CASTEP and VASP, which is critical to any understanding derived from comparing experiment and computation. Both CASTEP and VASP compute chemical shielding, and the individual tensor elements are cataloged in the LSDI according to a convention (ref. 17, IUPAC 2008) (Eq. (7)) in which the tensor elements are ordered numerically from largest to smallest. A case study in the Supplementary Information, Section VI illustrates this systematic difference in the shielding tensor elements, which is corrected when producing the individual chemical shift tensor elements. The LSDI catalog will ultimately contain both the computed chemical shielding tensors and the corrected chemical shift full tensors. Supplementary Figs. 7 and 8 depict this reconciliation between the programs, and Supplementary Fig. 6 is a matrix of plots of the CASTEP versus VASP principal shielding components, showing the mis-correlated components. As a check, we used the TensorView program25 to render the shielding-surface ovaloid superimposed on a Q3 silicon site in sodium disilicate. In this rendering, the VASP and CASTEP ovaloids are oriented perpendicular to one another. The CASTEP σ33 ovaloid is oriented as expected, with the σ33 component along the single C3 rotation axis of the Q3 silicate site, whereas the VASP rendering places σ33 at 90° from the bond along which it should lie.
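The sketch below illustrates why working in the chemical shift basis removes such a discrepancy. It assumes, purely for illustration, that a second code reports the same tensor with the opposite overall sign, so sorting its raw shieldings would swap σ11 and σ33. Converting each element with that code's own fitted calibration first (assumed form σ = m·δ + σreference) and only then applying the Mehring ordering to the shifts gives identical results. The function name and numbers are hypothetical; the actual origin of the VASP/CASTEP difference is discussed in the Supplementary Information.

```python
def shielding_to_mehring_shift(principal_sigma, slope, sigma_ref):
    """Convert principal shieldings to shifts, then apply Mehring labels.

    Assumes a calibration sigma = slope * delta + sigma_ref (hypothetical form).
    Sorting AFTER conversion makes the labels independent of the sign of the slope.
    """
    deltas = [(s - sigma_ref) / slope for s in principal_sigma]
    return tuple(sorted(deltas, reverse=True))   # delta_11 >= delta_22 >= delta_33


# Conventional sign: larger shielding gives a smaller shift (negative slope).
a = shielding_to_mehring_shift([400.0, 420.0, 450.0], slope=-1.0, sigma_ref=316.0)
# The same tensor reported with the opposite overall sign (positive slope).
b = shielding_to_mehring_shift([-400.0, -420.0, -450.0], slope=1.0, sigma_ref=-316.0)
print(a, b)
```

Both calls return the same triplet, whereas sorting the raw values of the second input would have labeled −450 ppm as "σ11" even though it corresponds to δ33.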

### Opportunities for applications of the “LSDI” catalog

CASTEP and VASP each have particular strengths in assigning tensor elements, and these computed tensors will form the basis of the LSDI catalog. The LSDI has already computed over 10,000 unique Si sites for compounds in the MP using VASP. This continually growing dataset, an easily accessible application programming interface (API), and a collection of software tools are established as a community resource to enable easier in silico experimentation with solid-state NMR. A catalog of shift tensors allows prediction of both static and MAS lineshapes for solid-state NMR, aiding accurate simulation of the full lineshape and all three tensor elements. Furthermore, as depicted schematically in Fig. 7, such data make it possible to plan experiments (e.g., to select an ideal MAS spinning frequency, such as that in Fig. 7c rather than that in Fig. 7b) that accurately map out the tensor values, especially δ22.
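As a rough planning aid (a heuristic, not the procedure behind Fig. 7), one can count how many spinning sidebands fall inside the span of the predicted static pattern at a given MAS rate: sidebands sit at δiso + k·νr/ν0, and with νr in Hz and ν0 in MHz that spacing is already in ppm. The tensor values and the 79.5 MHz 29Si Larmor frequency (roughly a 9.4 T magnet) below are assumptions, and sideband intensities are not computed.

```python
import math


def sideband_positions(d11, d33, d_iso, nu_r_hz, larmor_mhz):
    """Positions (ppm) of spinning sidebands that fall between delta_33 and delta_11.

    Sidebands sit at d_iso + k * nu_r / nu_0; with nu_r in Hz and nu_0 in MHz the
    ratio is in ppm. Intensities are NOT computed; this only shows how many
    sidebands sample the span of the static pattern.
    """
    step = nu_r_hz / larmor_mhz                  # sideband spacing in ppm
    k_max = math.floor((d11 - d_iso) / step)     # highest-frequency order inside the span
    k_min = -math.floor((d_iso - d33) / step)    # lowest-frequency order inside the span
    return [d_iso + k * step for k in range(k_min, k_max + 1) if k != 0]


# Hypothetical tensor (ppm) and an assumed 29Si Larmor frequency of 79.5 MHz.
for nu_r in (795.0, 1590.0):
    print(nu_r, sideband_positions(-80.0, -114.0, -100.0, nu_r, 79.5))
```

Slower spinning places more sidebands across the pattern (three here, versus one at the faster rate), giving a fit more points from which to recover δ22; the trade-off is a more crowded spectrum when several sites are present.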

The utility of this catalog can be demonstrated by considering the characterization of silicates by 29Si solid-state NMR spectra, specifically assigning resonances to Qn sites, a notation that reflects the local silicon environment and symmetry. Q4 has four equivalent Si–O–X bonds, and X is an element that can include Si, often Si–O–Si in a network, or a species such as H (forming Si(OH)4). Q3 has three equivalent Si–O–X linkages and one unique Si–O–Y substituent (where in this case, Y could be a different substituent, or it could simply reflect a longer Si–O bond), and so on. Each of the Qn sites is associated with a typical 29Si chemical shift range. However, what if you have a sample with an atypical substituent? The LSDI catalog permits a comparison of isotropic chemical shielding values for >5000 silicate structures.

In Fig. 8, a “box plot” of the VASP-computed σiso parameters from the benchmarking set shows the range of isotropic chemical shielding values predicted for different Qn sites in silicates, with a variety of substituents. The trend as n increases is seen, as well as the range of computed values, spanning 40–45 ppm. A number of outliers are also found. It is possible for practitioners of 29Si NMR to compare their spectra to these values in order to develop chemical insights into trends for particular bonding environments or changes of local site symmetry. What is especially helpful from such a plot is the ability to assign the chemical shifts of “less common” sites, not based on the isotropic value alone (since these ranges overlap strongly), but through comparison to a range of compounds in the database with related chemical structures.
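The box-plot comparison can be mimicked with a few lines of NumPy: summarize σiso per Qn class as a five-number summary and list every class whose range contains a given value. The catalog values below are made up and are not LSDI data; the function names are hypothetical. Because the ranges overlap, the result is often a short list of candidates rather than a single answer, which is why comparison with structurally related compounds is still needed.

```python
import numpy as np


def qn_summary(sigma_by_qn):
    """Five-number summary (min, Q1, median, Q3, max) of sigma_iso per Q^n class."""
    return {
        n: tuple(float(x) for x in np.percentile(np.asarray(v, dtype=float), [0, 25, 50, 75, 100]))
        for n, v in sigma_by_qn.items()
    }


def candidate_qn(sigma_value, summary):
    """Q^n classes whose range (min..max) contains the given sigma_iso."""
    return sorted(n for n, (lo, _, _, _, hi) in summary.items() if lo <= sigma_value <= hi)


# Made-up values (ppm) standing in for catalog entries; NOT LSDI data.
catalog = {2: [400.0, 405.0, 410.0], 3: [408.0, 415.0, 420.0], 4: [425.0, 430.0, 440.0]}
summary = qn_summary(catalog)
print(summary)
print(candidate_qn(409.0, summary))  # overlapping ranges give more than one candidate
```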

We have used 42 silicon sites as a benchmarking set to compare CASTEP, VASP, and experimentally reported expressions for solid-state 29Si NMR lineshapes. Through this examination, we have established a robust and systematic method for assigning the diagonalized chemical shift/shielding tensor values. With confidence established by this benchmarking set, over 10,000 29Si NMR shielding tensors will be made publicly available via the LSDI portion of The MP. These tensors will guide researchers searching for 29Si NMR assignments and will serve as a platform to help choose experimental conditions, since the appearance of spectra can be anticipated prior to measurement.

Benchmarking also revealed an unexpected systematic difference between VASP and CASTEP, in which the σ11 and σ33 shielding elements were interchanged owing to a sign difference between the computed tensors. This sign error is corrected when linear regression is used to obtain chemical shift tensor values (δ), and the final chemical shift anisotropy lineshapes generated from both programs are consistent with experimental measurements. Our data tables reflect these revised values. Thus, systematic comparisons of NMR properties across methodologies, including different computational methods or codes, should be conducted in a chemical shift basis to eliminate convention-dependent deviations that could lead to systematic error.

Understandable “assignment errors” of the δXX and δZZ tensor elements have been found in the literature, owing to difficulties with the Haeberlen notation and to uncertainties as the asymmetry parameter, ηCSA, approaches 1. The benchmarking set permitted discovery of such errors, and the values are corrected in the LSDI database (and in the tables in the Supplementary Information). Consequently, the database will report all tensors in the IUPAC-recommended fashion, using the Mehring convention of δ11, δ22, and δ33.

Such a large dataset permits comparisons of the computed parameters across a large number of structures. NMR practitioners using the LSDI dataset will be able to compare their experimental measurements with a variety of related structures, which will ultimately facilitate assignment of their spectra. This type of dataset can open a new era in solid-state NMR spectroscopy, encompassing an informatics approach to experimental design.

## Methods

### Dataset

We identified 29Si NMR data for crystalline compounds to use as a benchmarking set, nearly all of which have been analyzed by solid-state magic-angle spinning (MAS) NMR or static single-crystal NMR (2). This set comprises 31 structures26,27,28,29,30 with 42 unique silicon sites, primarily in minerals such as forsterite, wollastonite, zeolites, and quartz.

### Details of DFT computations

CASTEP has been shown to be very effective for calculating isotropic chemical shifts of nuclei such as 1H, 13C, 89Y, and 119Sn9,31,32,33,34, as well as diagonalized tensor values for 19F and 77Se35,36,37 in select systems. CASTEP calculations were performed with the Perdew–Burke–Ernzerhof (PBE) generalized gradient approximation (GGA) exchange-correlation functional for both geometry optimization and NMR calculations. “On the fly” generated ultrasoft pseudopotentials were used to approximate the interaction between the core electrons and the nuclei. Convergence of the stress and the chemical shift anisotropy with respect to cut-off energy and k-point sampling was tested on α-quartz; the results are given in Supplementary Information, Section II. The plane-wave cut-off energy was 800 eV, and the k-point spacing in reciprocal space was 0.025 Å⁻¹.
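To show what a k-point spacing of 0.025 Å⁻¹ implies for a given cell, the sketch below computes Monkhorst–Pack divisions as the smallest integers that keep the spacing along each reciprocal-lattice vector at or below the target. It assumes reciprocal vectors without a 2π factor; whether a given code's spacing keyword includes 2π is an assumption to check in that code's documentation, and the function name and cells are hypothetical.

```python
import numpy as np


def mp_grid_from_spacing(lattice, spacing):
    """Monkhorst-Pack divisions so that the k-point spacing does not exceed `spacing`.

    lattice: 3x3 array whose rows are the real-space lattice vectors (angstrom).
    spacing: maximum k-point separation in 1/angstrom, measured WITHOUT a 2*pi
             factor (an assumption about the convention).
    """
    a = np.asarray(lattice, dtype=float)
    recip = np.linalg.inv(a).T                  # rows b_j satisfy a_i . b_j = delta_ij
    lengths = np.linalg.norm(recip, axis=1)
    # Small tolerance so that exact multiples are not rounded up by floating-point noise.
    return [max(1, int(np.ceil(length / spacing - 1e-8))) for length in lengths]


# Hypothetical cells: a 7 Å cube and a 4 x 4 x 10 Å tetragonal cell.
print(mp_grid_from_spacing(np.diag([7.0, 7.0, 7.0]), 0.025))
print(mp_grid_from_spacing(np.diag([4.0, 4.0, 10.0]), 0.025))
```

Shorter real-space axes need more divisions, which is why a fixed spacing (rather than a fixed grid) keeps sampling quality comparable across the cells in a benchmarking set.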

DFT calculations were also performed using the projector-augmented wave (PAW) method38,39 as implemented in VASP40,41,42, within the PBE GGA formulation of the exchange-correlation functional43. A plane-wave cut-off of 520 eV and a uniform k-point density of ~1000 per atom were used. The computational and convergence parameters were chosen to comply with the settings used in the MP44, enabling direct comparison with the large set of available MP data.

Both CASTEP and VASP use the gauge-including projector augmented wave (GIPAW) method45 to reconstruct the all-electron wavefunction in the core region and perform NMR calculations.

In this benchmarking set, we focus on species whose full CSA tensor has been reported. When possible, the crystal structure coordinates reported alongside the tensor values were used as the starting point for DFT optimization and tensor calculation. Otherwise, structures from the ICSD database were the starting point for geometry optimizations.

All the computationally obtained parameters were subsequently used in simulations of spectra using the lineshape-generating program, Dmfit21. Two models are used in the simulation: “CSA static” for static NMR lineshapes (CSA powder patterns), and “CSA MAS” for the NMR spectrum of the manifold of spinning sidebands found for a given MAS rotation frequency, νr. Since this rotation frequency is an easily adjustable parameter, it is straightforward to simulate multiple “spinning-sideband manifolds” that essentially map onto the static CSA-broadened lineshape.
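The spectra in this work were simulated with Dmfit. As an independent, illustrative cross-check (not Dmfit's algorithm), an unbroadened static CSA powder pattern can be approximated by Monte Carlo: for a field direction (θ, φ) in the principal-axis frame, the Haeberlen shift is δ = δiso + (ζδ/2)(3cos²θ − 1 − ηCSA sin²θ cos 2φ), and orientations are drawn uniformly over the sphere. The function name and parameters below are hypothetical.

```python
import numpy as np


def static_csa_powder(delta_iso, zeta, eta, n=200_000, bins=200, seed=0):
    """Monte Carlo static CSA powder pattern (histogram, no line broadening).

    delta(theta, phi) = delta_iso + (zeta / 2) * (3 cos^2(theta) - 1 - eta sin^2(theta) cos(2 phi)).
    Orientations are uniform over the sphere: cos(theta) uniform in [-1, 1], phi uniform in [0, 2 pi).
    """
    rng = np.random.default_rng(seed)
    cos_t = rng.uniform(-1.0, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    sin2_t = 1.0 - cos_t**2
    shifts = delta_iso + 0.5 * zeta * (3.0 * cos_t**2 - 1.0 - eta * sin2_t * np.cos(2.0 * phi))
    counts, edges = np.histogram(shifts, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])     # bin centers in ppm
    return centers, counts


# Hypothetical parameters (ppm): edges at -114 and -80 ppm, horn near delta_YY = -106 ppm.
centers, counts = static_csa_powder(-100.0, 20.0, 0.4)
print(centers[np.argmax(counts)])
```

The pattern is bounded by δXX and δZZ and peaks at δYY (the middle, Mehring δ22 component), which is the feature an MAS sideband manifold samples at discrete positions.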