Robust Training of Machine Learning Interatomic Potentials with Dimensionality Reduction and Stratified Sampling

Machine learning interatomic potentials (MLIPs) enable the accurate simulation of materials at larger sizes and time scales, and play increasingly important roles in the computational understanding and design of materials. However, MLIPs are only as accurate and robust as the data they are trained on. In this work, we present DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling as an approach to select a robust training set of structures from a large and complex configuration space. By applying DIRECT sampling on the Materials Project relaxation trajectories dataset with over one million structures and 89 elements, we develop an improved materials 3-body graph network (M3GNet) universal potential that extrapolates more reliably to unseen structures. We further show that molecular dynamics (MD) simulations with universal potentials such as M3GNet can be used in place of expensive ab initio MD to rapidly create a large configuration space for target materials systems. Combined with DIRECT sampling, we develop a highly reliable moment tensor potential for the Ti-H system without the need for iterative optimization. This work paves the way towards robust high-throughput development of MLIPs across any compositional complexity.


Introduction
Machine learning interatomic potentials (MLIPs) have become an indispensable staple in the computational materials toolkit. 3][4][5][6][7][8][9][10] While MLIPs generally exhibit much better accuracies in energies and forces compared to traditional interatomic potentials (IPs), 11,12 their key advantage is that they can be systematically fitted and improved in a semiautomated fashion for diverse structural and chemical spaces. By enabling accurate and efficient simulations over length and time scales much larger than those accessible by ab initio methods, MLIPs have provided new insights into a wide range of physicochemical processes. These include lithium diffusion in lithium superionic conductors and their interfaces, [13][14][15][16][17] dislocation behavior and ordering in multiple principal element alloys, 18,19 liquid-amorphous and amorphous-amorphous transitions in silicon, 20 and reaction mechanisms of molecule-molecule and molecule-surface scattering, 21,22 to name a few. 12][8][9][10] Graph deep learning models encode the elemental character of each atom using features with a fixed dimensionality, avoiding the combinatorial explosion in model complexity with the number of elements that is associated with local environment descriptors. Of particular relevance to this work is the Materials 3-body Graph Network (M3GNet) architecture, which combines many-body features of traditional IPs with those of flexible material graph representations. By training on the massive database of structural relaxations in the Materials Project, Chen and Ong 6 have developed an M3GNet universal potential (M3GNet-UP) for 89 elements of the periodic table and demonstrated its application in predicting structural and dynamical properties for diverse materials.
The critical challenge in developing a robust MLIP is generating a training dataset that provides good coverage of the structural/chemical space of the materials of interest (henceforth referred to as the "configuration space"). Typically, the configuration space is generated through domain expertise, comprising ground-state structures, relaxation trajectory snapshots, strained structures, ab initio MD (AIMD) structures, defect structures, etc. Ab initio calculations such as those based on density functional theory (DFT) are then performed on structures sampled from the configuration space to obtain accurate energies and forces as training data for MLIPs.
4][25][26][27][28] In this way, an MLIP is used to simulate the materials of interest, and generated structures that require extrapolation are added to refit the MLIP in an iterative fashion. The key advancement in AL is the efficient uncertainty evaluation of MLIP predictions on new structures without referring to the DFT potential energy surface (PES), which greatly expands the search space and minimizes the cost of training structure augmentation. While AL has been undeniably effective in the construction of robust MLIPs, it can be inefficient for highly complex configuration spaces. For instance, a recent work by the authors to fit a moment tensor potential for the 7-element (Li7/18Sr17/36)(Ta1/3Nb1/3Zr2/9Sn1/9)O3 complex concentrated perovskite required over 100 AL iterations. 29 An ideal strategy should enable efficient generation and sampling of the configuration space prior to any DFT computations. One proposed approach is to bias MD simulations to sample ordered and disordered structures as an entropy maximization (EM) strategy 30,31 to sample a diverse feature space. For example, Montes De Oca Zapiain et al. 31 showed that an MLIP for tungsten trained with an EM set has much more consistent accuracies in energies for structures present in both the EM set and the domain expertise (DE) training set, while the MLIP trained with the DE set performs significantly worse for the EM set than for the DE set. Another recently proposed high-throughput scheme generated four training sets for Mg, Si, W and Al by applying normally distributed random atom displacements together with isotropic and anisotropic lattice scaling to the respective non-diagonal supercells. 32,33 The as-fitted MLIPs can accurately reproduce the force constant matrix of those crystalline systems.
In this work, we present a DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling strategy to generate robust training data for MLIPs for any chemical system. We will first demonstrate the effectiveness of DIRECT sampling on the 1.3 million structures in the Materials Project structural relaxation dataset 6,34,35 to fit an improved M3GNet universal potential (UP). Next, we will demonstrate how the M3GNet UP can be used to effectively generate configuration spaces for DIRECT sampling using the Ti-H model system, which is known to be highly challenging for reliable MD simulations. This work paves the way towards robust high-throughput development of MLIPs across any compositional complexity. Figure 1 provides a workflow of the proposed DIRECT sampling approach, which comprises five main steps:

DIRECT Workflow
1. Configuration space generation. A comprehensive configuration space of N structures for the system of interest is generated. This can be performed using commonly

Training a more reliable M3GNet universal potential

Developing an accurate MLIP for titanium hydrides
In this section, we demonstrate the capability of the DIRECT sampling approach combined with the M3GNet UP to construct reliable MLIPs. Here we have chosen the moment tensor potential (MTP) to study titanium hydrides (TiH_n), which are promising materials for hydrogen storage. 39 Hydrogen is well known to be highly diffusive in these systems, even at ambient temperatures, and a relatively short time step, e.g., ∼0.5 fs, is required for stable MD simulations. Therefore, this system provides a robust test for our proposed workflow.
Moreover, we note that these descriptor-based MLIPs are more computationally efficient for studying a particular chemistry due to their low model complexity relative to UPs.
Configuration space for Ti-H  A DIRECT sampling is applied to select

Discussion
In summary, we have demonstrated a robust DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling approach to generate training structures for MLIP development. We also demonstrated that MD simulations using the M3GNet universal potential can be used to generate an initial large configuration space for DIRECT sampling. In many cases, a satisfactory, stable MLIP can be obtained with DIRECT sampling without AL.
Even when AL is necessary to further fine-tune the MLIP, DIRECT sampling significantly reduces the number of AL cycles and the total number of DFT static calculations required, the latter being the most computationally expensive step in MLIP development.
In this work, we have used the final graph convolutional layer (GCL) output vector from a pre-trained M3GNet formation energy model as the structure encoder. We believe this to be a reasonable choice given that the M3GNet formation energy model has been trained on a diverse range of structures and chemistries. The final GCL output, therefore, encodes all relevant chemical information for energy prediction. To our knowledge, there are few other structure encoders that currently satisfy this requirement.
The training cost of the MLIP is controlled by two parameters: the number of clusters n and the number of samples per cluster k. For a given computational budget of M DFT static calculations, there can be several choices of n and k for a total configuration space of N structures. As a rule of thumb, one should bias towards having a large number of clusters n, i.e., n ≈ M, to ensure coverage of the extrema of the configuration space. However, k > 1 can be used to reduce the CPU and memory requirements for clustering when n = M is not feasible for large N and M. This sampling approach also enables an "interlacing" approach to building MLIPs. For instance, one can build an initial MLIP using k = 1, and then increase k if a higher-resolution coverage of the configuration space is deemed necessary for an accurate MLIP.
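The trade-off between n and k can be made concrete with a minimal sketch of the stratified sampling step. It assumes that every structure already has a cluster label and a distance to its cluster centroid in the reduced feature space. The function names and the within-cluster selection rule (closest member for k = 1, evenly spaced positions in the distance ranking for k > 1) are illustrative assumptions, not a transcription of the implementation used in this work.

```python
from collections import defaultdict


def stratified_sample(labels, dist_to_centroid, k=1):
    """Pick at most k member indices from every cluster (hypothetical helper)."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    selected = []
    for label in sorted(members):
        # Rank cluster members from closest to farthest from the centroid.
        ranked = sorted(members[label], key=lambda i: dist_to_centroid[i])
        if len(ranked) <= k:
            # Small clusters are taken whole, hence "at most k" per cluster.
            selected.extend(ranked)
            continue
        # Assumed rule: evenly spaced positions in the ranking;
        # k = 1 keeps only the member closest to the centroid.
        step = len(ranked) / k
        selected.extend(ranked[int(j * step)] for j in range(k))
    return selected


def budget_options(M, ks=(1, 2, 5, 10, 20)):
    """List (n, k) pairs with n * k <= M for a budget of M DFT calculations."""
    return [(M // k, k) for k in ks if M // k >= 1]


labels = [0, 0, 0, 1, 1, 2]
dists = [0.3, 0.1, 0.2, 0.5, 0.4, 0.0]
print(stratified_sample(labels, dists, k=1))
print(budget_options(947))
```

Starting with k = 1 and re-running with a larger k on the same clustering corresponds to the "interlacing" strategy described above, since the clustering itself does not need to be repeated.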
Finally, we note that DIRECT sampling is agnostic to the chosen MLIP architecture.
Here, we have demonstrated its application via the training of an M3GNet universal potential with improved extrapolability and a reliable moment tensor potential for Ti-H. DIRECT sampling can also be used to create datasets with improved structure and chemical diversity to benchmark different MLIP architectures. This work paves the way towards robust development and assessment of MLIPs across any compositional complexity.

However, while the MPF.2021.2.8 dataset samples the first and middle ionic steps of the first relaxation and the last step of the second relaxation for calculations in the Materials Project, our initial, unsampled dataset includes all ionic steps from both the first and second relaxation calculations in the Materials Project. 35 In addition to the existing filters applied in MPF.2021.2.8, i.e., excluding any snapshots with a final energy per atom greater than 50 meV atom^-1 or an atom distance less than 0.5 Å, we have further fine-tuned the dataset by excluding ionic steps where: (1) electronic relaxation has not been reached and (2) at least one atom has no neighbors within the cutoff radius (5 Å). Last but not least, data of all structures with forces over 10 eV Å^-1 were removed or substituted with better converged PES information (see the DFT calculations section below for details). This cleaned-up dataset contains a total of 1,315,097 structures, and henceforth will be known as the MPF.2021.2.8.All dataset.

Ti-H dataset
To generate a comprehensive configuration space for the Ti-H chemistry, 43,44 NpT MD simulations using the refitted M3GNet-DIRECT UP were carried out on 91 supercells of crystalline and grain boundary TiH_n (0 ≤ n ≤ 2) structures at 300, 1000 and 3000 K, and

Structure encoders
The MatErials Graph Network (MEGNet) and Materials 3-body Graph Network (M3GNet) formation energy models trained on the 2019.4.1 Materials Project crystals data set were used as structure encoders. Both the MEGNet and M3GNet models have been described extensively in previous works, 6,36,37,47

Moment tensor potential for Ti-H
Two moment tensor potentials (MTPs) 4,25 were fitted with two training sets, i.e., the AL set and the DIRECT set, for the Ti-H system. The MTP cutoff radius r_c and maximum level lev_max were fixed at 5 Å and 20, respectively. In line with previous works, the weights of energies, forces and stresses were set at 1, 0.01 and 0, respectively.

DFT calculations
DFT calculations were performed using the Vienna ab initio simulation package (VASP 50,51 ).
The Perdew-Burke-Ernzerhof (PBE) 52 generalized gradient approximation (GGA) functional was used for MPF.2021.2.8.All and Ti-H systems. The electronic convergence criterion (EDIFF) was set at 10^-5 eV, and the smallest allowed spacing between k points (KSPACING) was set at 0.35 Å^-1. All other settings were consistent with those used for static calculations in the Materials Project. The maximum number of electronic steps was set at 100. Over 83% were successfully converged.

Hypothetical O- and S-containing compounds
This dataset contains 506 O-containing and 291 S-containing hypothetical materials, which were randomly selected from the ∼30 million hypothetical materials generated by Chen and Ong 6.
One thousand hypothetical materials were initially selected for each group. DFT geometry optimizations were performed on those 2,000 structures using the settings for structure relaxations in the Materials Project. Only converged results were collected as test sets for these two groups of hypothetical compounds. Subsequently, geometry optimizations were performed using M3GNet UPs with the same force convergence criterion of 0.1 eV Å^-1, and the energy above hull (E_hull) was calculated relative to the DFT-calculated structures in the Materials Project.

Ti-H system
Spin-polarized DFT calculations for TiH n were performed with an energy cutoff of 500 eV.
Three AIMD NVT simulations were performed for three TiH n supercells, including HCP

Figure 1: Workflow of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling. The standard steps in MLIP development are in black boxes, while the key conceptual improvements proposed in this work are highlighted in purple boxes. The methods in the brackets are those used in the present work, though they can be substituted with other similar approaches.

Figure 2: Comparison of DIRECT versus manual sampling (MS). (a) Explained variance of the first 30 principal components (PCs) of the encoded features using the M3GNet and MEGNet formation energy models. Visualization of the coverage of the first two PCs of the M3GNet-encoded structure features by (b) MS and (c) DIRECT sets. (d) Feature coverage scores for the first 14 PCs of the M3GNet-encoded structure features by the MS set and DIRECT set.

Figure 2 compares the coverage of feature space by manual sampling (MS) and DIRECT sampling approaches on the MPF.2021.2.8.All dataset. The MS set, which contains 185,877 structures, is constructed following the approach outlined by Chen and Ong 6, which selects the first and middle ionic steps of the first relaxation and the last step of the second relaxation. Using DIRECT sampling with n = 20,044 and k = 20, i.e., sampling at most 20

Figure 2a compares the explained variance vs the PCs of the encoded features using the M3GNet and MEGNet formation energy models. It can be seen that the M3GNet-encoded features are significantly more efficient, with a cumulative explained variance of 49% and 93% for the first 2 and 14 PCs, respectively. In contrast, the cumulative explained variance for the first 2 and 14 PCs of the MEGNet-encoded features is 25% and 57%, respectively. This indicates that the incorporation of the 3-body interactions in M3GNet leads to a more robust encoding of the diverse structures and chemistries in the MPF.2021.2.8.All dataset. From the plots of the first two PCs of the M3GNet-encoded features of the MS set (Figure 2b) and DIRECT set (Figure 2c), it can clearly be observed that the MS set undersamples structures located at the boundaries of the feature space, while the DIRECT set provides more comprehensive coverage. The coverage score for each of the first 14 PCs was calculated as $\sum_{i=1}^{n_b} c_i / n_b$, where the entire range of values for each PC is divided into $n_b$ bins and $c_i$ equals 1 if data in the $i$-th bin is successfully sampled, and 0 otherwise. The coverage score of the entire MPF.2021.2.8.All set is 1 by definition. Using $n_b$ = 50,000, we find that the coverage scores of the DIRECT set across the first 14 PCs are all close to 1, with an average of 0.996, while the coverage scores of the MS set are all below 0.8 with an average of 0.642. Similar trends are observed for the 128-element M3GNet feature space (see Figure S1).
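The coverage score defined above can be sketched for a single PC as follows. This is a minimal illustration under two assumptions that are not stated in the text: the bins have equal width over the range spanned by the full data set, and only bins that the full data set actually occupies are counted, so that the full set scores exactly 1. The function name `coverage_score` is hypothetical.

```python
def coverage_score(all_values, sampled_values, n_bins):
    """Fraction of occupied bins along one PC that the sampled subset hits."""
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / n_bins

    def bin_index(value):
        if width == 0:
            return 0
        # The maximum value is placed in the last bin rather than a new one.
        return min(int((value - lo) / width), n_bins - 1)

    # Assumption: bins that are empty in the full data set are ignored,
    # which makes the score of the full set equal to 1 by construction.
    occupied = {bin_index(v) for v in all_values}
    hit = {bin_index(v) for v in sampled_values} & occupied
    return len(hit) / len(occupied)


pc_values = [0.0, 1.0, 2.0, 3.0, 4.0]
print(coverage_score(pc_values, [0.0, 4.0], n_bins=4))
print(coverage_score(pc_values, pc_values, n_bins=4))
```

When every bin is occupied by the full data set, this reduces to the sum of c_i divided by n_b given in the text.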

Figure 3: Distribution of (a) energies, (b) forces and (c) stresses in the DIRECT set, the MS set and MPF.2021.2.8.All (referred to as "All"), shown in yellow, green and purple, respectively. The mean absolute deviation (MAD) of each data set is annotated.

Figures 3a to 3c compare the distributions of the energies, forces and stresses in the DIRECT and MS sets relative to the entire MPF.2021.2.8.All ("All") dataset. Despite having a comparable total number of structures, the DIRECT set provides a better coverage of the entire configuration space, with a much larger MAD in energies, forces and stresses compared to the MS set. This can be attributed to the better sampling of uncommon local environments in feature space by DIRECT sampling compared to manual sampling.
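As a small illustration of the spread metric used in this comparison, the sketch below computes a mean absolute deviation, assuming it is taken about the arithmetic mean of each data set (the text does not specify the reference point). The two lists are made-up numbers chosen only to show that a broader distribution gives a larger MAD.

```python
def mean_absolute_deviation(values):
    """Mean absolute deviation about the arithmetic mean (assumed definition)."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Hypothetical energies: a narrow set and a broader set with the same mean.
narrow = [-5.1, -5.0, -5.0, -4.9]
broad = [-7.0, -5.0, -5.0, -3.0]
print(mean_absolute_deviation(narrow) < mean_absolute_deviation(broad))
```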

Figure 4: Performance of M3GNet universal potentials (UPs) trained using the DIRECT and MS training sets. Parity plots for (a) energies, (b) forces and (c) stresses for the M3GNet UP trained on the DIRECT set. The equivalent plots for the M3GNet UP trained on the MS set are shown in plots (d)-(f). The cumulative errors of (g) energies, (h) forces and (i) stresses in the two test sets by the two UPs are also plotted.

Figure 4 compares the performance of the M3GNet UPs trained on the DIRECT and MS sets (referred to as M3GNet-DIRECT and M3GNet-MS, respectively) relative to the ground truth DFT.The training protocols are largely similar to the ones used in the original M3GNet UP, with minor modifications as outlined in the Methods section.Two test sets were constructed

Figure 5: Cumulative absolute errors for energy above hull E hull prediction for (a) O-and (b) S-containing hypothetical materials by M3GNet-DIRECT and M3GNet-MS UPs.
As illustrated in Figure 1, the two most computationally intensive steps in the development of MLIPs are the generation of the configuration space and DFT calculations of energies and forces. Often-used strategies to sample configuration space include highly expensive AIMD simulations and iterative efforts in active learning (AL) workflows. The advent of UPs such as M3GNet can provide the means to bypass ab initio methods and minimize or even eliminate AL iterations for the generation of a diverse configuration space.

Figure 6: Plot of the first two principal components of the feature space of TiH_n (0 ≤ n ≤ 2) sampled by structures from three different sources, i.e., 75,000 AIMD snapshots for HCP Ti36H2, BCC Ti36H36 and FCC Ti36H72 at 1000 K, 2,063 configurations from MTP AL and 273,000 MD NpT snapshots from M3GNet-DIRECT. H2 structures are excluded in this analysis to ensure better resolution for Ti-containing structures.

Figure 7: Energy and force errors of 400 AIMD test structures by MTPs fitted using M3GNet-DIRECT MD structures (MTP-DIRECT) and MTP AL structures (MTP-AL).
and interested readers are referred to those publications for details. After performing graph convolutions, the output graph features are concatenated (96 elements of atomic, bond and state vectors for MEGNet and 128 elements of atomic and state vectors for M3GNet) and passed through multi-layer perceptrons to generate the final output property. The final concatenated vectors from these models therefore encode the relevant structure/chemistry for the prediction of the formation energy. In this work, the concatenated 96-D vector of MEGNet and the concatenated 128-D vector of M3GNet were utilized as structure features.

MLIP fitting

M3GNet universal potential

To refit the M3GNet UP, we have adopted the same settings as those used in the training of the original M3GNet UP, 6 including a 90:5:5 train:validation:test random split, a 1:1:0.1 weight ratio for energy (eV atom^-1), force (eV Å^-1) and stress (GPa) in a Huber loss function with δ = 0.01, and an Adam optimizer with an initial learning rate of 10^-3 and a cosine decay to 10^-5 in 100 epochs. One significant modification from the original M3GNet UP is that the model complexity is expanded by doubling the dimension of both atom embeddings and multi-layer perceptrons from 64 to 128. The performance of the M3GNet UPs trained with the original model complexity is provided in Figure S2 for comparison. Further, the isolated atoms of all 89 elements in MPF.2021.2.8.All were added into the M3GNet training set to improve the extrapolability of the final potential. All other structures with isolated atoms were removed from the training set. Finally, for faster convergence, training was stopped if the validation metric did not improve for 40 epochs, instead of 200 epochs.
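The training settings quoted above (Huber loss with δ = 0.01, cosine decay of the learning rate from 10^-3 to 10^-5 over 100 epochs, and early stopping after 40 epochs without improvement) can be sketched in plain Python. The functional forms below are the standard textbook definitions and are assumed here; they are not copied from the M3GNet code base, and the helper names are hypothetical.

```python
import math


def huber(residual, delta=0.01):
    """Standard Huber loss: quadratic near zero, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def cosine_decay(epoch, lr_start=1e-3, lr_end=1e-5, decay_epochs=100):
    """Assumed schedule: half-cosine from lr_start to lr_end, then constant."""
    t = min(epoch, decay_epochs) / decay_epochs
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * t))


def should_stop(val_history, patience=40):
    """Stop when the best validation value is at least `patience` epochs old."""
    best_epoch = min(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience


def weighted_loss(e_res, f_res, s_res, weights=(1.0, 1.0, 0.1)):
    """Combine mean Huber losses with the 1:1:0.1 energy:force:stress weights."""
    parts = (e_res, f_res, s_res)
    return sum(w * sum(huber(r) for r in res) / len(res)
               for w, res in zip(weights, parts))


print(cosine_decay(0), cosine_decay(50), cosine_decay(100))
```

With δ as small as 0.01, most residuals fall in the linear regime, so the loss behaves much like a scaled absolute error and is less sensitive to outliers than a squared error.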

Figure S1: Distribution of all the 128 elements in the M3GNet structural features of structures in the DIRECT set, the MS set and the entire MPF.2021.2.8 dataset.

Figure S2: Performance of M3GNet UPs with the same model complexity as that of the pre-trained M3GNet-v0.1 by Chen and Ong 1 (see details in Methods). Parity plots for (a) energies, (b) forces and (c) stresses for the M3GNet UP trained on the DIRECT set. The equivalent plots for the M3GNet UP trained on the MS set are shown in plots (d)-(f). The respective plots for the pre-trained M3GNet-v0.1 are also provided for comparison.
Materials Project MPF.2021.2.8.All Dataset

The Materials Project dataset used in this work is similar to the MPF.2021.2.8 dataset used by Chen and Ong 6 in the fitting of the M3GNet UP. 6,34 The MPF.2021.2.8 dataset comprises 187,687 ionic steps of 62,783 compounds in the MP database as of Feb 8, 2021. 6,34

25,13,18 AL was conducted using the protocol developed by Gubaev et al. 25 under exactly the same 274 MD scenarios explored by the M3GNet-DIRECT UP. The cutoff extrapolation grade for breaking the simulation and for selecting structures was set at 3, i.e., γ_break = γ_select = 3. All 274 MD runs can reliably run for > 10 ps after 15 AL iterations. All training, evaluations and simulations with MTP were performed using MLIP, 4,25 LAMMPS 48 and the open-source Materials Machine Learning (maml) Python package. 49

52) generalized gradient approximation (GGA) functional was used for MPF.2021.2.8.All and Ti-H systems.

DIRECT sampling is applied to select one structure each from 954 clusters of the 274,000 M3GNet MD snapshots. DFT static calculations for 947 of these structures converged successfully, and these were used as the training set for MTP-DIRECT (see details in Methods). To compare the accuracy of energy and force predictions by MTP-DIRECT and MTP-AL, 400 AIMD snapshots were selected as test structures from four 10-ps AIMD trajectories (HCP Ti36H2 at 1000 K, BCC Ti36H36 at 1000 K and FCC Ti36H72 at 1000 and 3000 K) at 0.1 ps intervals. DFT static calculations were then performed to obtain energies and forces as test data. As shown

Ti-H energies and forces. However, it should be noted that the size of the MTP-DIRECT training set (947) is less than half of that generated by the MTP AL process (2,077).

Figure 8: The evolution of MTP MD stability by AL starting from two different initial training sets, i.e., 947 training structures for MTP-DIRECT (labeled as "case 1") and 92 starting structures for the 274 AL scenarios (labeled as "case 2"). The evolution at each of the three AL temperatures is plotted separately.

To further evaluate the MD reliability of MTP-DIRECT, the same AL process used to train MTP-AL is applied to MTP-DIRECT. An MD run is considered reliable if no snapshot throughout the 10 ps has an extrapolation grade (γ) over 3, which is a fairly strict threshold. 25 As shown in Figure 8, all MD runs by MTP-DIRECT at 300 and 1000 K for the 91 Ti-containing Ti-H structures successfully completed at the 0th AL iteration without the emergence of any structures with γ over 3, indicating that the MTP-DIRECT is

to construct the initial MTP-DIRECT reduces the number of AL iterations to reach MD reliability by 75% and the number of static DFT calculations by 50% in comparison to the simple AL scheme.

Figure 9: MD simulations of H diffusion in HCP Ti36H2, BCC Ti36H36 and FCC Ti36H72 by AIMD, MTP-AL and MTP-DIRECT. (a) Mean squared displacement (MSD) of H atoms throughout 10-ps AIMD NVT and MTP MD/NVT simulations at 1000 K. (b) Arrhenius plot based on 1-ns MD/NpT simulations by the two MTPs from 300 to 1000 K with 100 K intervals. Diffusivities are plotted only above temperatures where sufficient diffusion events are observed for a rigorous analysis. Activation energies (E_a) are also indicated in the legend. For MTP MDs, much larger 3 × 3 × 2 supercells of the AIMD cells were used. Legend of (b): MTP-AL FCC Ti648H1296, E_a = 0.80 eV; MTP-AL BCC Ti648H648, E_a = 0.16 eV; MTP-AL HCP Ti648H36, E_a = 0.49 eV; MTP-DIRECT FCC Ti648H1296, E_a = 0.91 eV; MTP-DIRECT BCC Ti648H648, E_a = 0.14 eV; MTP-DIRECT HCP Ti648H36, E_a = 0.42 eV.

An accurate prediction of hydrogen diffusion in titanium hydrides is challenging because of the nature of H atoms and the complex phase diagram of the systems, comprising HCP, BCC and FCC phases at different hydrogen atomic percentages. To further compare the two MTPs, MD simulations were carried out to investigate hydrogen diffusion in HCP Ti36H2, BCC Ti36H36 and FCC Ti36H72. As reported in Figure 9a, both MTPs reproduce the trends of hydrogen MSD at 1000 K predicted by AIMD NVT simulations, i.e., the MSD of H is highest in the BCC phase and lowest in the FCC phase. The larger fluctuation of hydrogen MSD in the HCP Ti36H2 phase in AIMD simulations can be attributed to the limited number of hydrogen atoms in the AIMD cell. This is largely ameliorated with the use of 3 × 3 × 2 supercells in MTP simulations, thanks to the computational efficiency of the MLIPs.

We then carried out 1-ns MD/NpT simulations with MTP-AL and MTP-DIRECT to study hydrogen diffusivity over a temperature range of 300-1000 K at 100 K intervals. As shown in the Arrhenius plot in Figure 9b, both MTPs exhibit good agreement on the predicted activation energy (E_a) and diffusivity. The simulated E_a of hydrogen diffusion by MTP-DIRECT and MTP-AL are 0.42 and 0.49 eV in the HCP Ti36H2 phase, 0.14 and 0.16 eV in the BCC Ti36H36 phase, and 0.91 and 0.80 eV in the FCC Ti36H72 phase, respectively. These results are in excellent agreement with experimentally measured E_a of 0.45 eV in HCP Ti from 873 to 1298 K, 40 0.15 eV in BCC Ti from 555 to 625 K, 41 and 0.92 eV in FCC Ti from 670 to 880 K. 42 Meanwhile, the experimentally measured hydrogen diffusivities are 3 × 10^-5, 8 × 10^-5 and 6 × 10^-8 cm^2/s for HCP, BCC and FCC at 1000 K, 800 K and 800 K, respectively, which are in line with the predictions from both MTPs.
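The activation energies quoted for hydrogen diffusion come from Arrhenius analysis of diffusivities at several temperatures. A minimal sketch of such an analysis is given below, assuming the three-dimensional Einstein relation D = MSD / (6t) and the Arrhenius form D = D0 exp(-E_a / (k_B T)). The function names and the synthetic input data are illustrative assumptions and do not reproduce the analysis code or the results of this work.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def diffusivity_from_msd(msd_A2, time_ps):
    """Einstein relation in 3D; converts Å^2/ps to cm^2/s (factor 1e-4)."""
    return msd_A2 / (6.0 * time_ps) * 1e-4


def arrhenius_fit(temperatures, diffusivities):
    """Least-squares line through ln(D) vs 1/T; returns (E_a in eV, D0)."""
    x = [1.0 / t for t in temperatures]
    y = [math.log(d) for d in diffusivities]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    intercept = y_mean - slope * x_mean
    # ln D = ln D0 - (E_a / k_B) * (1 / T), so E_a = -slope * k_B.
    return -slope * K_B, math.exp(intercept)


# Synthetic diffusivities generated from an assumed E_a and D0.
temps = [500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
synthetic = [1e-3 * math.exp(-0.42 / (K_B * t)) for t in temps]
e_a, d0 = arrhenius_fit(temps, synthetic)
print(round(e_a, 3), d0)
```

Restricting the fit to temperatures with enough diffusion events, as done for Figure 9b, matters because the logarithm amplifies the statistical noise of very small diffusivities.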
Three AIMD NVT simulations were performed for three TiH_n supercells, including HCP Ti36H2, BCC Ti36H36 and FCC Ti36H72 at 1000 K, and one AIMD NpT simulation was conducted at 3000 K for FCC Ti36H72. All AIMD simulations were conducted for 25,000 steps with a time step of 0.5 fs, in accordance with previous AIMD works for hydrogen diffusion. 45,46 A single Γ k point was used to sample the Brillouin zone.

Self-consistent calculations were performed with an electronic relaxation convergence threshold of 10^-4 eV, while the density of the k grid in the reciprocal space was at least 100 / Å^-3. The maximum number of electronic steps was set at 100.

ence, Office of Basic Energy Sciences, Materials Sciences and Engineering Division. Part of this work was carried out under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory (LLNL) under contract No. DE-AC52-07NA27344. B.C.W.