The Shape of Native Plant Cellulose Microfibrils

Determining the shape of plant cellulose microfibrils is critical for understanding plant cell wall molecular architecture and conversion of cellulose into biofuels. Only recently has it been determined that these cellulose microfibrils are composed of 18 cellulose chains rather than 36 polymers arranged in a diamond-shaped pattern. This study uses density functional theory calculations to model three possible habits for the 18-chain microfibril and compares the calculated energies, structures, 13C NMR chemical shifts and WAXS diffractograms of each to evaluate which shape is most probable. Each model is capable of reproducing experimentally-observed data to some extent, but based on relative theoretical energies and reasonable reproduction of all variables considered, a microfibril based on 5 layers in a 34443 arrangement is predicted to be the most probable. A habit based on a 234432 arrangement is slightly less favored, and a 6 × 3 arrangement is considered improbable.

Determining the shape of plant cellulose microfibrils is critical for understanding plant cell wall molecular architecture and conversion of cellulose into biofuels. Only recently has it been determined that these cellulose microfibrils are composed of 18 cellulose chains rather than 36 polymers arranged in a diamond-shaped pattern. This study uses density functional theory calculations to model three possible habits for the 18-chain microfibril and compares the calculated energies, structures, 13 C NMR chemical shifts and WAXS diffractograms of each to evaluate which shape is most probable. Each model is capable of reproducing experimentally-observed data to some extent, but based on relative theoretical energies and reasonable reproduction of all variables considered, a microfibril based on 5 layers in a 34443 arrangement is predicted to be the most probable. A habit based on a 234432 arrangement is slightly less favored, and a 6 × 3 arrangement is considered improbable.
Cellulose is one of the most important materials on Earth. As the foundation of plant cell walls, it has a critical role in the biosphere. Because it has evolved to provide strength and resist biodegradation, cellulose is commonly used in paper, clothing, and myriad materials. Efforts to convert cellulose into fermentable compounds have become commercialized recently in order to produce ethanol from recyclable feedstocks that minimize impact on the food supply.
Landmark papers by Nishiyama and coworkers 1-3 utilized X-ray and neutron diffraction to determine the crystalline structure of cellulose. This structure was used as the basis for density functional theory (DFT) calculations that subsequently reproduced the infrared, Raman, NMR and sum-frequency generation spectra of cellulose 4,5 . However, much of the research on cellulose has been performed on bacterial and/or plant-extracted cellulose neither of which represents the state of plant cellulose in muro 6,7 . Consequently, recent studies on non-extracted cellulose have been critical in helping to define the native state of cellulose within plant cell walls [8][9][10] .
Although the atomic structure of cellulose has been characterized, an outstanding question is the shape of cellulose microfibrils in plant cell walls. For decades, the cellulose microfibril (CMF) was thought to be comprised of 36 chains of cellulose polymers (i.e., 1,4 β-linked glucan monomers) based on the observation of "rosettes" formed by the cellulose synthase complex (CSC) 11 . Furthermore, a diamond-shaped habit based upon cellulose polymer layers arranged in a 1 × 2 × 3 × 4 × 5 × 6 × 5 × 4 × 3 × 2 × 1 pattern was assumed to be the habit of this 36-polymer CMF (e.g., Matthews et al., 2012). This view of the CMF was challenged by papers based on wide-angle X-ray scattering (WAXS) 12,13 . Based on diameter arguments, the 36-polymer CMF was judged to be too large, so the CMF would have to be smaller. This conclusion has been justified recently by the work of Nixon et al. 14 , Hill et al. 15 and Venu et al. 16 who have demonstrated that the CSC rosette is most consistent with a hexamer of trimers. Assuming only one cellulose polymer can be extruded from each of the CesA enzymes in the CSC [17][18][19] , the CMF should be made of 18 cellulose polymer chains rather than 36.
The difference between 18 and 36 polymers in the fundamental plant CMF will be reflected in the habit of the CMF as well as the size. The shape of the CMF is far from purely academic because the surfaces displayed by the CMF will depend upon this arrangement of cellulose chains. In turn, the interactions of other plant cell wall components (i.e., water, hemicellulose, pectin, lignin) will depend upon the nature of the exposed CMF surfaces. Biodegradation mechanisms of the cellulose will also depend heavily upon the nature of the CMF surfaces present. Hence, this study sought to determine the most probably arrangement of cellulose chains within the CMF in order to use this crystal habit as a basis for future studies of plant cell wall architecture, mechanics and biochemistry.
Based upon success in modeling cellulose atomic structure with density functional theory (DFT) calculations 4,20,21 , we chose to use a similar methodology 4 to predict the structure, energetics and spectroscopic properties of finite CMFs with various arrangements of 18 cellulose chains. The periodic DFT calculations in VASP were carried out in order to obtain atomic structures that could be used for comparison against observed X-ray and neutron diffraction studies 1,2 . Furthermore, a key prediction is the relative energies of each CMF habit. Although one cannot expect CMFs formed in plant cell walls to necessarily be at thermodynamic equilibrium, Iβ cellulose is known to be thermodynamically stable and only slightly higher in Gibbs free energy than the extracted cellulose II and III forms 22 . Our assumption is that it does not make biochemical sense for plants to produce a less thermodynamically stable form both in terms of assembly kinetics and in regards to ultimate strength and resistance to biodegradation.
To supplement the atomic structure and thermodynamic criteria, this study also calculates the WAXS diffractograms and the 13 C NMR chemical shifts (δ 13 C) of the model CMFs. The WAXS diffractograms were simulated using the CMF models taken from the structures as energy-minimized in VASP using the program WAXSiS 23,24 . δ 13 C values were calculated with Gaussian 09 25 using the same model structures terminated after 4 glucan monomers along the c-axis with methyl groups to avoid unsatisfied valence at the terminal O atoms 8 and following methods used in previous studies 26 .

Results and Discussion
Three arrangements of the 18 cellulose polymers were created based upon the criteria that the CMF should attempt to minimize surface area, be reasonably symmetric and maximize the van der Waals forces between layers and H-bonding within layers. These three crystal habits are shown in Fig. 1. The model CMF habits have the following arrangements of cellulose polymers 6 × 3, 234432, and 34443. (A 3 × 6 structure is possible but the diameter of this shape is ≈5 nm -significantly wider than the 3 to 3.8 nm measured 12,13 . The first two have been suggested previously as shapes of the CMF (see ref. 13 ), and the third is based upon a structure created for coarse-grained simulations of CMFs by Vin Crespi (The Pennsylvania State University).
Comparison of the model atomic structures with the parameters reported for Iβ cellulose in Nishiyama et al. 1,2 reveals that all three CMFs accurately reproduce observed interatomic distances and torsion angles (Tables 1 and 2). Structural differences between the center and origin chains of Iβ cellulose are larger than the differences among the three CMF habits. Significant discrepancies exist between the model and experimentally-reported center chain H-bond distances and angles (Table 1), but we note that the model structures produce vibrational frequencies in agreement with observation 4,5 and better agreement than the original Nishiyama et al. 1 structure with observed δ 13 C values 4 . Because the same model and method reproduces the reported structure of the origin chain, we conclude that the discrepancy is not the result of inaccuracies in the model but instead originate from the difficulties arising from predicting H atom positions in cellulose 1 ) and χ′ (C4-C5-C6-O6) torsion angles calculated are in reasonable agreement with the experimental values (Table 2; refs 1,2 ). There is no model that performs significantly better with respect to reproducing structural parameters, so we conclude that crystal habit has a minimal effect on these structural parameters.
Our second criterion for evaluating the probability of forming each CMF habit is the potential energy calculated for each. The periodic DFT calculations should better represent actual CMFs in that they are continuous rather than truncated after four glucan monomers. However, DFT has notable issues with estimating van der Waals forces 27 , so although the D3 dispersion correction of Grimme et al. 28 was employed, this empirically-derived estimate is subject to a significant level of uncertainty. In order to test how robust the DFT-predicted energies might be, single-point energy calculations were also performed on the tetramer CMFs in Gaussian 09 25 using the Perdew-Wang 29 , M05 30 , and B3LYP 31,32 exchange-correlation functions in both vacuum and aqueous solution conditions (IEF-PCM) 33 . Table 3 lists the energies for each model and method. Both the VASP and Gaussian 09 results argue against the 6 × 3 model because it is higher in energy than the other two options. The +276 to +371 kJ discrepancy is large, but when normalized on a per glucan basis, the difference is 4 to 5 kJ/mol/glucan. However, this small energy difference is still not negligible, because an energy difference of 4 kJ/mol between two states would result in a Boltzmann distribution ratio of approximately 5 between the two states. The VASP energies indicate that the 234432 model is most stable whereas the Gaussian 09 calculations predict that the 34443 model is lower in energy in vacuum and aqueous solution. In the former case, the difference is less than 1 kJ/mol/glucan which is well within computational accuracy. Even the +4 kJ/mol/glucan difference favoring 34443 from Gaussian 09 is relatively small, so we cannot definitively distinguish between 234432 and 34443 based on energetics.
High-resolution 13 C NMR spectra of cellulose in vivo have been reported recently 8 that can be used as a basis for evaluating the proposed CMF habits. Table 4 compares observed and calculated values for all types of C atoms within the center and origin chains. For "interior" C atoms, all three model CMFs result in good agreement with observation with the 34443 model having a slightly better fit based on the root-mean-squared (RMSE) and maximum errors (MaxE). 13 C NMR spectra of cellulose have observed C4, C5 and C6 peak doublets that have been ascribed to the changes in chemical environment and/or structure that occur on the surface of cellulose compared to the interior (see Harris et al. 9 , and references therein; Wang et al. 8 ). This interior to exterior or ordered to disordered shift is especially pronounced for C4 which changes from 88.5 to 91.5 ppm on the interior (iC4) to 84 to 86 ppm on the surface (sC4). A similar shift may occur for C6 from iC6 of 66 ppm to sC6 of 62 ppm, but the sC6 peak is harder to resolve due to overlap with other plant cell wall polymers (Harris et al. 9 ). Our results (Supplementary Table 1) indicate that surficial δ 13 C4 values (85.4 ± 0.9-87.5 ± 2.4 ppm) can match both the "iC4" and "sC4" peaks in Harris et al. 9 , and the surficial δ 13 C4 values are slightly higher than the interior δ 13 C4 values   Rel. E/glucan 0 3.2 3.1 Table 3. Calculated energies of 6 × 3, 234432 and 34443 models with various methods (kJ/mol). Rel. E: relative potential energy, Rel. E/glucan: relative potential energy normalized on a per glucan unit base. The lowest energy obtained from each method was set to 0 kJ/mol. Wang et al. 36 have found that some interior glucan chains in CMFs are more than one chain away from the nearest surface chains. Hence, the CMF of primary cell wall must have two types of interior cellulose chains: a core fraction not directly contacting with the surface and a bound fraction directly contacting with the surface. As shown in Supplementary Fig. 1, only the 34443 model has the core fraction, because chain A in 34443 is an interior chain, based on the calculated NMR chemical shifts as shown in Supplementary Table 2. 234432 and 333333 models do not have the core fraction. In addition, since chain A in 34443 is an interior chain, the ratio of surface chain and interior chain (s:i) for model 34443 is ~1.57, which is in good agreement with the s:i ratios of CMF in primary cell walls that Wang et al. found using solid-state NMR techniques. Therefore, based on the core domain and s:i ratio, the 34443 model is preferred.
WAXS data were instrumental in questioning the original 36-chain CMF model, so the models in this study were used to generate synthetic diffractograms for direct comparison with experimental observations. The theoretical diffraction patterns of the three CMF habits had peaks at ~10 to 11 nm −1 and 16 nm −1 similar to the experimental data for Arabidopsis and Poplar (Fig. 2). The 34443 model is distinct however in not producing a WAXS peak near 4 nm −1 (Fig. 2A). In addition, the 4 nm −1 peak is independent of the length of the cellulose microfibril models applied in WAXSiS calculations (as shown in Supplementary Fig. 2). It is also independent of the surrounding environment (water or vacuum) of CMF, as shown in Supplementary Fig. 4. Interestingly, when comparing to experimental WAXS results, we found that the WAXS diffractogram of Poplar cell walls demonstrated a weak peak around 4 nm −1 , whereas that of Arabidopsis did not show any peaks around that area. This is probably due to the difference in average electron density in real sample and inter-microfibril correlation. However, due to the complexity of plant cell walls, it is challenging to compare the simulation results with the experiments in planta especially in the q-range approaching the size of microfibrils. The peak around 4 nm −1 pointed us a possible way to differentiate CMF structures. However, more experiment and simulation work would be expected to further explore the origin of the peak.

Conclusion
In conclusion, DFT calculations on the structures, energies and spectroscopic parameters of three proposed CMF habits all predict accurate structures and δ 13 C NMR values. Based comparison of sC4 and sC6 calculated versus observed δ 13 C, the 34443 model is slightly favored. Energetic considerations discount the 6 × 3 model as least favorable with periodic and molecular cluster calculations providing indistinguishable results on the 234432 and 34443 models; however, the calculated energy results favor 34443. WAXS diffractograms of the 34443 model are more consistent with observed patterns. We conclude the preponderance of evidence suggests that the 34443 habit is most probable because it cannot be eliminated from consideration based on comparisons to structural and 13 C NMR data nor energy estimations via DFT. The most distinguishing feature is the lack of a WAXS peak near 4 nm −1 which is predicted to exist for the other two models considered.

Methods
Density functional theory calculations in this study were performed with the Vienna Ab-initio Simulation Package (VASP [37][38][39][40] and Gaussian 09 25 codes. Initial atomic structures were based upon the experimental model of Nishiyama et al. 1 as supplied by Yoshi Nishiyama. The unit cell of cellulose was doubled to include 4 glucan units along the c-dimension, and 18 polymer CMFs were constructed in Materials Studio 2016. The simulation cell was expanded in the x-and y-dimensions to allow for space between the images of the CMFs in order to minimize any self-interactions. The model structures were converted into VASP input files (POSCAR) via a perl script written by Andrei V. Bandura (Saint Petersburg State University, Russia). All atomic positions were relaxed during energy minimizations using the VASP code. The simulation cell lattice parameters were also relaxed until the c-dimension converged to a stable state in order to the cellulose chains to stretch or contract. After convergence of the chain lengths, the lattice parameters were then held constant in order to prevent artificial aggregation of the CMFs with their periodic images (i.e., contraction of the a and b cell dimensions until the CMF would sense images of itself.) NMR calculations used the multi-standard method of 41 with tetramethylsilane (TMS) and methanol as the primary and secondary standards 4,26,34,42,43 in order to increase accuracy in reproducing experimental values. The modified Perdue-Wang exchange-correlation functional mPW1PW91 44 along with the 6-31 G(d) basis set 45 were used in the calculation of isotropic chemical shieldings for all nuclei. Only the added terminal methyl groups were relaxed in Gaussian 09 while all other atoms were fixed in the positions determined in the periodic energy minimizations performed with VASP. The 18-chain CMF structures (each chain contains 4 glucose units) were solvated in a box of TIP3P water 46 followed the protocol detailed out in previous work 35 using the Impact module of Maestro 47 , followed by 1000 steps of conjugate gradient minimization performed before 10 ps molecular dynamics simulations at 298 K with a 1 fs time step in the NVT ensemble using the OPLS_2005 force field 48 keeping the atomic positions of cellulose molecules fixed. Only H 2 O molecules within 3 Å of the cellulose clusters were included for NMR calculations as it was found that H 2 O molecules further than 3 Å from the clusters had negligible effects on the calculated δ 13 C. An empirical correction of 49.5 ppm 43 was used for the difference between the δ 13 C in methanol and TMS commonly used as an experimental 13 C NMR standard 41 . This gives an isotropic chemical shielding of 193.0 ppm. To compute the δ 13 C for any C nucleus i in cellulose, we used: δ 13 Ci = 193.0 ppm -δ 13 Ci.
Small-and Wide-Angle X-ray Scattering curves of CMFs in solution were calculated based on explicit-solvent all-atom molecular dynamics (MD) simulations using WAXSiS 23,24 .
The 18-chain CMF structures were obtained from the periodic energy minimizations performed with VASP. Each chain contained 19 glucose residues Each CMF structure was solvated in a box of TIP3P water so that the CMF was surrounded in all directions by at least 20 Å. Short (50 ps) MD simulations were performed using AMBER 49 and the Glycam06 carbohydrate force field 50 at 300 K, at 1 bar, in the NPT ensemble, with 2 fs time step, 10 Å non-bonded interaction cutoff, and SHAKE-constrained hydrogen bonds. The positions of CMF were constrained using a force constant of 250 kcal/mol/Å 2 . Simulation frames are written every 0.5 ps so that the solvent configurations are uncorrelated 23 . From the MD trajectories, the theoretical scattering curves were then calculated using WAXSiS webserver. Two buffer subtraction methods have been used to calculate the scattering intensity: (i) total buffer intensity is subtracted, I(q) = I sample (q) − I buffer (q); (ii) the buffer intensity is reduced by the volume fraction v of the solute, I(q) = I sample (q) − (1 − v) I buffer (q) 23 . The two subtraction methods only slightly differ at small angles. As shown in Supplementary Fig. 3, total buffer subtraction method predicted 200 peaks at 15.8-16 nm −1 , and buffer scattering reduced by solute volume resulted in the 200 peaks at 16.5 nm −1 , very close to the widely accepted value for 200 of cellulose I (around 16 nm −1 ) 13,51-53 . But they highly differ at wide angles around the water scattering peak (20 nm −1 ) 23 . As shown in Supplementary Fig. 3, total buffer subtraction method resulted in negative intensities around 20 nm −1 . However, both predicted the peak around 4 nm −1 . The CRYSOL 54 program was also used for calculation of theoretical diffraction patterns of the three CMF habits in water (with solvent density of 0.334 e/Å 3 ) and in vacuum based on the atomic coordinate of the CMF structures obtained from the periodic energy minimizations performed with VASP.
The Poplar tension wood sample was obtained as described in Sawada et al. 55 . The bottom 1 cm section of a seven week old stem from Arabidopsis thaliana colD was dried at 80 °C overnight before measurement. The XRD profiles were recorded as described in Sawada et al. 55 .