Abstract
The controlled introduction of elastic strains is an appealing strategy for modulating the physical properties of semiconductor materials. With the recent discovery of large elastic deformation in nanoscale specimens as diverse as silicon and diamond, employing this strategy to improve device performance necessitates firstprinciples computations of the fundamental electronic band structure and target figuresofmerit, through the design of an optimal straining pathway. Such simulations, however, call for approaches that combine deep learning algorithms and physics of deformation with band structure calculations to customdesign electronic and optical properties. Motivated by this challenge, we present here details of a machine learning framework involving convolutional neural networks to represent the topology and curvature of band structures in kspace. These calculations enable us to identify ways in which the physical properties can be altered through “deep” elastic strain engineering up to a large fraction of the ideal strain. Algorithms capable of active learning and informed by the underlying physics were presented here for predicting the bandgap and the band structure. By training a surrogate model with ab initio computational data, our method can identify the most efficient strain energy pathway to realize physical property changes. The power of this method is further demonstrated with results from the prediction of strain states that influence the effective electron mass. We illustrate the applications of the method with specific results for diamonds, although the general deep learning technique presented here is potentially useful for optimizing the physical properties of a wide variety of semiconductor materials.
Introduction
Elastic strain engineering (ESE) has emerged as a promising tool to enhance the performance of functional materials, whereby characteristics of semiconductor materials, such as carrier mobility, can be modulated solely through the introduction of strain^{1}. With Moore’s law approaching its widely anticipated limit and with an everaccelerating search for improved device performance, tuning physical properties through controlled mechanical strains could offer a powerful pathway to advance the performance of semiconductors. Here, the magnitudes of elastic strains considered are significantly greater (deeper) than prior approaches adopted over the past several decades by the semiconductor industry, involving strained silicon with strain levels on the order of 1%^{2,3,4}.
To achieve “deep ESE” with significant performance enhancement in devices and to realize the optimal figureofmerit (FoM), the required elastic strain state would have to significantly exceed what strained silicon technology has thus far achieved through epitaxy. Indeed, recent experiments^{5,6,7} in freestanding geometry have revealed that several materials, at nanoscale dimensions typically used in semiconductor devices, are capable of withstanding large elastic strains at room temperature without inelastic shear relaxation, phase transformation, or fracture. For example, it has been demonstrated that even in the hardest material found in nature, diamond, the local tensile elastic strain can reach up to 10% in appropriately grown and oriented singlecrystal nanoneedles^{5,6} and microfabricated nanobridges^{7}. In nanowires of silicon, a reversible tensile strain of 15% has been realized in uniaxial tension experiments^{8}. These findings of ultralarge elastic deformation of semiconductor materials bookended by the ultrawide bandgap (5.6 eV) diamond and the more manufacturingfriendly and ubiquitous silicon (with a bandgap of 1.1 eV) have opened up potential opportunities to design their performance characteristics for applications such as power electronics, nanophotonics, and quantum information processing.
The complexity of modulating the fundamental physical properties of materials, such as the electronic band structure and bandgap through ESE, calls for identifying preferred and actionable strain states within the general sixdimensional (6D) strain space, represented by the elastic strain tensor ε ≡ (ε_{11}, ε_{22}, ε_{33}, ε_{23}, ε_{13}, ε_{12}). In order to achieve this through rigorous physicsinformed computational predictions, firstprinciples calculations based on density functional theory (DFT) are necessary to screen for relevant properties and characteristics to realize targeted FoM. Appropriate mechanical boundary conditions would then have to be designed to impose optimal strain states at the targeted spatial regions at the device level. To picture the complexity of this task, consider for example a crystal’s electronic band structure E_{n}(k; ε), which is a function of the 3dimensional wave vector \({\mathbf{k}}\) and the sixdimensional homogeneous strain tensor ε. Representing these band structures with a tabulation approach with 9 dependent variables would obviously require billions of firstprinciples calculations. Plain DFT calculations can also introduce systematic errors in the estimation of bandgap, and these errors have to be overcome with the extra computational cost incurred by employing more expensive calculation techniques such as manybody perturbation theory (socalled GW corrections^{9}). Additional costs with representing and storing of ESE effects would arise when the elastic strain gradient ▽ε is incorporated as a dependent variable (since ▽ε can be on the order of 10^{7}/m in nanoscale devices), which could influence properties such as nonlinear optical response induced by the flexoelectric effect^{10,11}.
The modulation of bandgap and carrier mobility (which characterizes electrical conductivity based on the speed with which an electron or a hole moves through a semiconductor under the influence of an electric field) has long been examined using linear response perturbation theory^{12}. While this approach is sufficiently powerful to guide the engineering of strained silicon under small strains, it loses validity for large deformation cases involving “deep” or nonlinear elastic strains so as to significantly enhance their electronic or optoelectronic performance characteristics^{13}. To address the challenge of calculating, analyzing, archiving, and visualizing material characteristics from highdimensional data, such as that related to E_{n}(k; ε), recent work has adopted a data fusion and transfer learning technique to integrate multiple data sources^{13}. Specifically, we developed a machine learning (ML) method^{13} that combined a dataset extracted from DFT calculations invoking GW correction with another dataset prepared with PerdewBurkeErnzerhof^{14} (PBE) exchange functional correlations. This method was adopted to demonstrate how large elastic strains can be used to make a semiconductor such as silicon and an electrical insulator such as diamond become metallike conductors with zero bandgap^{15}.
While our earlier deep ESE calculations are adequate for rapid data collection in a highly specialized model^{13,15}, they do not offer sufficient flexibility and accuracy for optimizing a broader consideration of physical characteristics such as the effective mass of electrons and holes, which is a secondderivative of E_{n}(k; ε) with respect to \({\mathbf{k}}\) and a strong sensitivity to noise. Therefore, it is appropriate at this stage of development of ML to incorporate a priori physicsinformed neural network (NN) architectures into the calculations in such a way that various performance characteristics and FoM estimates could be much better optimized through a judicious combination of DFT and deep learning. Further details of motivation for this development are articulated in the succeeding sections. These recent advances enable multiproperty optimization and Paretofront type tradeoff analysis.
To accomplish these goals, we present here a physicsinformed, convolutional neural network (CNN) technique that is more versatile, accurate, and efficient in its capability to facilitate autonomous deep learning of the electronic band structure of crystalline solids than the neural network architecture hitherto employed to address this class of problems. We propose more advanced algorithms and data representation schemes to provide markedly improved ML outcomes. The techniques described here enable detailed analysis of band structures in the general sixdimensional strain space to optimize select FoM of interest for specific performance targets. Moreover, our method achieves sufficient accuracy not only for the deep analysis of the bandgap and of the shape of the band structure, but also for capturing the curvature of the band and the effective mass.
Results
Band structure and physicsinformed ML
Our method seeks to develop accurate predictions of the band structure, by recourse to which many FoMs of technological interest could be directly estimated. Inspired by the wide adoption of deep learning in the field of computer vision^{16}, we draw an analogy between the color spectrum in a digital image and the band structure, regardless of whether it applies to electronic, phononic, or photonic band structure.
Using this analogy, we envision energy dispersions as stacked 3D “images”, with the reciprocal coordinates \({\mathbf{k}}\) ≡ (k_{1}, k_{2}, k_{3}) representing the “voxels” (i.e., 3D “pixels” of a digital image) and with \(E_n\) denoting the spectrum and intensity of colors (similar to the RGB or grayscale of an image) at each voxel for a particular 3D image, where \(n\) is the particular band among a total of N bands. Energy bands are piecewisesmooth functions in the reciprocal space, and the information within the energy dispersion of a specific band includes intraband correlations with respect to \({\mathbf{k}}\). An illustration of this pictorial view of the band structure can be found in Fig. 1a. Note that previous ML schemes based on simple feedforward NN treated an energy band as a flattened array of independent values^{13}—thereby neglecting to account for intraband correlation.
In prior work^{13}, different bands were analyzed separately by NN (Fig. 1a, b). Although this approach was sufficient to predict energy eigenvalues for a specific band or bandgap variations arising from strain, it could not capture interband physics accurately for the entire band structure because of limited data. The energy bands analyzed in the present method, however, are not “independent” of one another, as shown in Fig. 1, and they collectively describe the physical characteristics of the crystal. For example, consider a single electron in a periodic potential resulting from the interaction of the electron with the ions and other electrons. Solving the Schrödinger equation provides the solution for a series of Bloch waves, each of which has a predicted dispersive form. Through the firstprinciples method, all the quantized energy levels are determined. Specifically, the nth band is not calculated in isolation but is determined from the collective influence of its neighboring bands, including the adjacent (n − 1)th and (n + 1)th bands, as well as other nonadjacent bands. In other words, information from interband correlation influencing the nth band is included in the band structure of the crystal.
To reveal the internal structure of the band data in our model, we incorporate CNN into our ML scheme. CNN is known for its capability to extract hierarchical patterns in digital images and to assemble complex patterns by integrating information from smaller datasets^{17}. Utilizing the digital image analogy for the band structure, CNN is thus expected to serve as a useful tool for extracting useful patterns, or intraband/interband correlations.
Model description
The general setup of the proposed model is illustrated in Fig. 2a. It consists of a fully connected part followed by a CNN part. At the outset, the strain tensor ε is taken as the input and transformed into a feature vector through a series of fully connected layers, as depicted in Fig. 2a. This feature vector has a length of Nm^{3}, where m^{3} is the number of \({\mathbf{k}}\)points sampled in the Brillouin zone, and N is the number of bands we want to represent. Depending on the \({\mathbf{k}}\)mesh density, the feature vector can be adopted as a rich representation of the intraband information for a band structure. Currently, this part has four hidden layers with a structure of (6 → 128 → 256 → 512 → 512)_{n}, where 512 = m^{3}, for n = 1, 2,…, N separately, totaling ~1.1 million parameters. N is most often taken to be 4 in this work, sufficient for describing nearCBM/nearVBM properties of diamond for a particular strain state. Here, the band energy dispersion for the top valence band (VB, \(n\) = \(n_{{\mathrm{VB}}}\)), the lowest conduction band (CB, \(n\) = \(n_{{\mathrm{CB}}}\)), and their adjacent two bands (\(n\) = \(n_{{\mathrm{VB}}}\) − 1 and \(n\) = \(n_{{\mathrm{CB}}}\) + 1) could all be represented via 4 vectors each of which has a length of m^{3}_{.} Stacking them together, we build an m × m × m × 4 tensor representation of the band structure for any individual strain data, as illustrated in Fig. 2b. This process is similar to the decoding part of an autoencoder^{18} whereby a representation as close as possible to the band structure is generated. The resulting tensor is then fed into the next block of convolution.
The convolutional part consists of several blocks that update this tensor representation until the final output is determined. Note that the output tensor retains the same dimension of the band structure, i.e., m × m × m × N. This extraction process proceeds through many layers to deliver a band structure tensor with features that capture deep intraband and interband information. This output comprising the complete ML inference represents the band structure obtained by DFT calculations (Fig. 2a–b). In each convolutional block in the CNN part, the convolution is a twostep sequence. In the first step, a 3 × 3 × 3 × 1 kernel accounting for the intraband correlation (with periodical boundary conditions and symmetry) is used. In the second step, a 1 × 1 × 1 × 3 kernel accounting for the interband correlation is adopted. The convolution blocks can be stacked up at one’s discretion. The model yielding the lowest error in our study has three CNN blocks, totaling ~276,000 parameters. One can also use a onestep convolution (3 × 3 × 3 × 3) kernel instead of the aforementioned twostep convolution (3 × 3 × 3 × 1)→convolution (1 × 1 × 1 × 3) kernel, with more weights per block but better accuracy. Also, since 8 × 8 × 8 is still a relatively coarse \({\mathbf{k}}\)mesh when performing \({\mathrm{min}}_{\mathbf{k}}\), \({\mathrm{max}}_{\mathbf{k}}\) or \({\mathbf{k}}\)derivative operations, we use polynomial interpolation on top of the floatingpoint 8 × 8 × 8 representation, before carrying out such operations.
The power of this approach lies in the architecture of the proposed CNN model, which is tailored to the known physical structure and exploratory data analysis results (Fig. 2b and Supplementary Figs 1–2) in order to simplify training and to speed up inference. In particular, it takes advantage of:

i.
The timereversal symmetry, i.e., \(E_n(  {\mathbf{k}})\) = \(E_n({\mathbf{k}})\) which holds for the diamond crystal. Corresponding tensor representation preserves this property.

ii.
The correlation between the same \({\mathbf{k}}\)point of different bands (interband correlation). An interband convolution between the bands is applied at each \({\mathbf{k}}\)point so that bands influence one another.

iii.
The correlation between the energy eigenvalues associated with adjacent \({\mathbf{k}}\)points of the same band (intraband correlation), which ascertains that the band energy is a piecewisesmooth function of the \({\mathbf{k}}\)space coordinates. The intraband convolutions are carried out over several cycles so that the underlying physics of how energy eigenvalues from adjacent \({\mathbf{k}}\)points affecting one another are learned accurately.

iv.
Band structure calculations benefit from the periodic nature and symmetry of a crystal lattice. The band structure plot resulting from restricting \({\mathbf{k}}\) to the first Brillouin zone, also known as the reduced zone scheme, is typically used. This reciprocal lattice periodicity is represented in our model using a special technique for the periodic boundary condition that follows the reduced zone scheme.
Model training
The training of our model is achieved in three parts: preliminary training, data fusion, and active learning. In the first part, preliminary training was performed on the large dataset (~35,000 strain samples) of the computationally cheap DFTPBE calculations. After a prescribed level of accuracy (less than 0.5% relative error) was achieved, in the second part, we performed training on a much smaller set (~6000 strainsamples) of the accurate manybody GW calculation, starting from the NN parameters learned in the previous stage. This approach is known as knowledge transfer, as some of the knowledge gathered by NN from the lowfidelity PBE data is exploited to ease the training on the relatively more costly and reliable GW data. See Fig. 3a for a schematic of this process and the “Methods” section for computational details.
Active learning
Another integral part of our training is active learning, which entails a class of ML algorithms for the automatic assembly of the training set. Here the goal is to reduce the uncertainty compared to that generated in a random sampling of strains. It is often convenient to begin with subsets of the data that offer uncertain levels of reliability and accuracy. Various uncertainty estimates have been proposed^{19}. The particular choice of an uncertainty quantification procedure greatly influences performance in the active learning part.
There are three main routes to uncertainty estimation in NN: ensembling^{20}, variational inference, and dropoutbased inference. Straightforward ensembling requires a few separate models to be trained, but it imposes additional computational costs to both training and inference procedures. On the other hand, variational inference requires the usage of Bayesian neural networks (which have probability distributions instead of realvalued weights), and they also lead to costly training and inference steps. Dropout can be seen as an intermediate solution: it can be applied in a simple way to the existing NNs with fully connected layers and also has a theoretical justification in the Bayesian framework^{21}.
Here, we use the dropout uncertainty estimation enhanced with the Gaussian processes for stability^{22} to sample the most “uncertain” strain cases for further improvement of the model. Specifically, after the first round of the training on the GW data, we performed a calculation over a large set of random strains in 6D and chose a small amount of ~200 strain cases with the largest expected error as evaluated by this intermediate model (uncertainty measurement). These strain cases were added to the training set for the next round of training, as illustrated in Fig. 3a. Our study indicates that 5–10 cycles of the above active learning enable the trained CNN to reach the same level of accuracy with two to three times fewer data, thus considerably reducing the total amount of ab initio calculations without compromising the robustness of our ML model, see Fig. 3b. More details are provided in Supplementary Note 1 and Supplementary Fig. 3.
Model accuracy and performance
The ML framework outlined in Fig. 3a achieves high accuracy in a variety of tasks compared to existing ML methods. The CNN model outperforms previous simple feedforward NN architecture, as well as an ensemble of kernel ridge regression (KRR) based models for band structure prediction, achieving a relative error no greater than 0.5%, as shown in Fig. 3c. The predictions of properties related to the band structure, such as the bandgap \(E_{\mathrm{g}}\) (defined as the energy difference between the conduction band minimum (CBM) and valence band maximum (VBM) values), were treated in previous study^{13} as an isolated ML regression problem with a direct fit to the scalar \(E_{\mathrm{g}}\) and the estimation of CBM and VBM as two separate tasks with many repetitive ML runs. The present CNN model does not have this constraint. It is capable of simultaneously predicting intraband and interband property/values, including \(E_{\mathrm{g}}\), CBM and VBM, and interband electron excitation and photon emission energy at every \({\mathbf{k}}\)point (any vertical transition between any two bands), with a level of accuracy on par with or better than other models (Fig. 3c, Supplementary Fig. 4 and Supplementary Tables 12).
The current ML framework also achieves high reliability in locating the band edge \({\mathbf{k}}\)points. Here, the present machinery surpasses all the other models by a significant margin, as shown in Fig. 3d for the specific case of finding the CBM position for diamond. Locating CBM is a demanding classification problem due to a large number of classes: there are seven possibilities for diamond CBM location under 6D elastic strain. Predicting the entire band becomes inevitable for widebandgap materials such as diamond to achieve a high classification accuracy. Thus, the present CNN model captures the subtle difference between two CB \({\mathbf{k}}\)points.
The proposed framework is also shown to be sufficiently fast in terms of inference time to perform swift exploration and optimization in the 6D space of admissible strains. Though architecturally much more complex, the present model outcompetes KRRbased models by more than two orders of magnitude in computational speed, as shown in Fig. 3e. The CNN model has a time complexity comparable to simple NNs. In the next section, we discuss examples of ESE of a diamond crystal.
Discussion
We now consider the optimization of band structure shape and curvature and effective electron mass of diamond at certain \({\mathbf{k}}\)points. For this purpose, we explore the entire 6D strain space to identify energyefficient pathways to metalize diamond by turning it into an electrical conductor with a 0eV bandgap while preserving phonon stability. These results extend our deep learning analytical capabilities beyond those used previously to identify the conditions for the metallization of diamond using ESE^{13,15}.
Here we consider bandgap, arguably the most important band structure feature, as an example of the material property set as a target for deep ESE. The first objective is to identify the bandgap limits that can be reached by strained diamond within the phononstable region for ESE. We find the bandgap of the diamond can be increased to realize better performance in power electronics and optical applications. It can also be transformed to resemble the properties of any smallbandgap semiconductors and to exhibit a complete semiconductortometal transition to become a metallike electrical conductor at different strain states. The next objective is to determine the transitions between direct and indirect bandgap. Our study shows that the \({\mathrm{{\Gamma}}}\) point or the center of the Brillouin zone is associated with a direct bandgap, and it is achieved only when the proper shear strain components are imposed. Among all possible strains in the 6D strain space, the present model has identified a number of strain states that result in a direct bandgap in the diamond. These are illustrated in Fig. 4a. The present calculations explore the entire 6D strain state to identify optimal pathways for deep ESE within the full spectrum of theoretical possibilities. The power of ESE is demonstrated not only in tuning the bandgap value but also in facilitating the indirecttodirect bandgap transition that benefits photon emission and absorption.
In ESE, there would be many possible choices of ε to reach a certain value of direct or indirect bandgap. Applications of these strain states, ε’s, require different amounts of strain energy. The elastic strain energy density is defined as \(h({\boldsymbol{\varepsilon}}) \equiv \frac{{E({\boldsymbol{\varepsilon}})E^{0}}}{V^{0}},\) where \(V^0\) is the undeformed supercell volume, \(E^0\) and E(ε) are the total energy of the undeformed and deformed supercell, respectively. The resultant distribution of available bandgap values \(E_{\mathrm{g}}\) plotted against \(h\) represents the “density of states of bandgap”^{13} as shown in Fig. 4a. There exist many strain states with an elastic strain energy density between 35 to 95 meV/\({\AA}^{3}\) that can reach a direct 3 eV bandgap in the diamond. These strain states lie in the region bounded by the red dashed line. If one aims for the most energyefficient strain case to achieve the goal, one should choose the leftmost strains at a certain bandgap level. An upperbound and lowerbound function can also be defined to describe the limits of reachable bandgap in strained diamond, as indicated by the black dotted lines in Fig. 4a. A complete ranking of the common crystal directions with respect to their role in reducing the bandgap can be found in Supplementary Note 2 and Supplementary Fig. 5. Similarly, an increase in the bandgap can be explored by following the upperbound function (upper black dotted line in Fig. 4a). This line represents pure triaxial compression, i.e., \(\varepsilon _{11} = \varepsilon _{22} = \varepsilon _{33} < 0,\varepsilon _{23} = \varepsilon _{13} = \varepsilon _{12} = 0\).
Strain cases resulting in the same value of bandgap form an isosurface^{12} in the 6D space. For visualization purposes, we show only a 3D subspace by fixing three of the six strain components. Figure 4b–d illustrates the situation where only compressive and tensile normal strains are present (\(\varepsilon _{23} = \varepsilon _{13} = \varepsilon _{12} = 0\)). Key features of this bandgap isosurface in 3D include surfaces that are piecewise smooth (“carapaces”), ridgelines where two carapaces meet, and corners where three ridgelines meet. The multifaceted nature of the bandgap isosurface is attributed to the switch of the CBM \({\mathbf{k}}\)space position. As a consequence of strain tensor and crystal symmetries, this isosurface has the following features:

Three carapaces (the hard upper shell exoskeletons of turtles, tortoises, and crustaceans) labeled in red as \({\mathrm{{\Delta}}}_1,\;{\mathrm{{\Delta}}}_2,\) and \({\mathrm{{\Delta}}}_3\) correspond to strain cases with the same value of indirect bandgap but different CBM positions: (0, 0.375, 0.375), (0.375, 0, 0.375), and (0.375, 0.375, 0), respectively.

Three ridgelines labeled in green as \({\mathrm{r}}_1,\,{\mathrm{r}}_2,\) and \({\mathrm{r}}_3\) correspond to strain cases with relations \(\varepsilon _{22} = \varepsilon _{33}\), \(\varepsilon _{33} = \varepsilon _{11}\), and \(\varepsilon _{11} = \varepsilon _{22}\), respectively.

The corner \({\upmu}\) labeled in purple is the intersection of \({\mathrm{r}}_1,\,{\mathrm{r}}_2,\) and \({\mathrm{r}}_3\) and is the most “tensile” hydrostatic strain point on the bandgap isosurface, i.e., \(\varepsilon _{11} = \varepsilon _{22} = \varepsilon _{33}\).
The bandgap isosurface of strain cases where only shear strain components are present (\(\varepsilon _{11} = \varepsilon _{22} = \varepsilon _{33} = 0\)) is plotted in Fig. 4e. Besides three different indirect bandgap CBM positions at \({\mathrm{X}}_1:(0,0.5,0.5)\), \({\mathrm{X}}_2:(0.5,0,0.5)\), and \({\mathrm{X}}_3:(0.5,0.5,0)\), threeshearstrains can also give rise to direct bandgap in diamond where CBM is at the \({\Gamma}\) point. The change from the carapace labeled \({\mathrm{X}}_1\) to that labeled \({\Gamma}\) thus indicates an indirecttodirect bandgap transition in diamond (yellow arrow in Fig. 4e). The corresponding band structures for the indirect and direct bandgap are shown in Fig. 4f and g, respectively.
The effective mass of an electron is a key parameter that influences carrier mobility and electrical conductivity in semiconductor materials. If we denote the conduction band energy dispersion as \(E_{n_{{\mathrm{CB}}}}\)(\({\mathbf{k}}\)), then the corresponding effective mass tensor can be defined in terms of the Hessian matrix H(\(E_{n_{{\mathrm{CB}}}}\)(\({\mathbf{k}}\))) consisting of second partial derivatives with respect to \({\mathbf{k}}\). Based on the values drawn from our ML model, the electron effective mass \({\boldsymbol{m}}^ \ast\) tensor for an undeformed diamond at CBM is extracted by fitting the band structure:
where \(m_{\mathrm{e}}\) is the freeelectron mass. Given that m* is a secondorder derivative, it reveals not only the shape of an energy band but also its curvature, thereby providing more detailed information on band dispersion. The anisotropy at CBM is characterized by a longitudinal mass (\(m_{\mathrm{L}}^ \ast\) = 1.55\(m_{\mathrm{e}}\)) along with the corresponding equivalent (100) reciprocal space direction and two transverse masses (\(m_{\mathrm{T}}^ \ast\) = 0.31\(m_{\mathrm{e}}\)) in the plane perpendicular to the longitudinal direction. Our results for \(m_{\mathrm{L}}^ \ast\) and \(m_{\mathrm{T}}^ \ast\) are close to both the GW and experimental values (see Table 1), offering more evidence for the reliability of our electronic band structure representation. A plot that demonstrates the agreement between our model and GW calculations for effective mass components can be found in Supplementary Note 3 and Supplementary Fig. 6.
We also studied the 6D strain space to obtain the conductionrelated properties and the elastic strain energy density as functions of \(\varepsilon\). Here, we adopted our ML model to acquire the relation between the “conductivity effective mass” for the conduction electron \(m_{{\mathrm{cond}}}^\ast (\boldsymbol{\varepsilon})\) and \(h(\boldsymbol{\varepsilon})\), as shown in Fig. 5a. The values of scalar \(m_{{\mathrm{cond}}}^ \ast\) are obtained by averaging individual longitudinal and transverse effective masses, as in ref. ^{23}. The blue shading in Fig. 5a reveals the distribution of the available \(m_{{\mathrm{cond}}}^ \ast\), with darker shading implying more strains are able to reach a specific value of \(m_{{\mathrm{cond}}}^ \ast\) at a given \(h\).
The cumulative “density of states” of conductivity effective mass can be defined as
where \(\delta ( \cdot )\) and \({\mathrm{{\Theta}}}( \cdot )\) are the Dirac delta and unit step functions, respectively, \(d^{6}{\boldsymbol{\varepsilon}} \equiv d {\varepsilon}_{11}d {\varepsilon}_{22}d {\varepsilon}_{33}d {\varepsilon} _{23}d {\varepsilon}_{13}d {\varepsilon}_{12}\) in the 6D strain space. The density of states of conductivity effective mass (\(g\)) at \(h\prime\) can then be defined by the derivative of \(c(m_{{\mathrm{cond}}}^ \ast \prime ;\;h\prime )\) with respect to \(h\prime\):
The meaning of \(g\) is explained by considering in the \((h  \frac{{dh}}{2},h + \frac{{dh}}{2})\) interval all possible elastically strained states and the resultant distribution of \(m_{{\mathrm{cond}}}^ \ast\) arising from these states. Other plots of the density of states of individual effective mass tensor components are also available (see Supplementary Fig. 7). Moreover, the developed framework enables highquality predictions of the \({\boldsymbol{m}}^ \ast\) tensor components (as well as their averages) for every \({\mathbf{k}}\)point at various deformation cases.
Direct bandgap together with a small effective mass within a semiconductor material is a preferable combination in the design of radiation detectors and photovoltaic cells that enables the combination of high conductivity and light yield. Moreover, lower elastic strain energy density means less effort for reaching the same property design in ESE. The three objectives, however, generally cannot be minimized simultaneously, to give the handsdown best solution; instead, there exists a set of Paretoefficient solutions, which do not allow for any member of a triplet (\(E_{\mathrm{g}}\), \(m_{cond}^ \ast\), and \(h\)) to improve (i.e., decrease) without negatively affecting the other two members. The 3D Pareto front of minimized \(E_{\mathrm{g}}\), \(m_{{\mathrm{cond}}}^ \ast\), and \(h\), shown in Fig. 5b, indicates a compromise in simultaneously having a small bandgap and conductivity effective mass, where \(h\) could increase to more than 120 meV/\({\AA}^{3}\). It is not possible to achieve, for example, a nearzero bandgap and \(m_{{\mathrm{cond}}}^ \ast < 0.25m_{\mathrm{e}}\) without paying a considerable penalty in \(h\) by straining diamond, as indicated by the “infeasible region” in Fig. 5b. Also, it is likely to find higher \(h\) values that correspond to the same combination of (\(E_{\mathrm{g}}\), \(m_{{\mathrm{cond}}}^ \ast\)). In Fig. 5b, such elastic strain energy density values are associated with strain cases in the “feasible region”. In addition, Fig. 5c could serve as a blueprint to access all possible (\(E_{\mathrm{g}}\), \(m_{{\mathrm{cond}}}^ \ast\)) combinations achieved by straining diamond in order to find the smallest elastic strain energy density (\(h_{{\mathrm{min}}}\)) for each combination. Note that it includes more (\(E_{\mathrm{g}}\), \(m_{{\mathrm{cond}}}^ \ast\)) combinations and is not a projection of Fig. 5b onto the \(E_{\mathrm{g}}  m_{{\mathrm{cond}}}^ \ast\) plane.
In summary, by recognizing that the band dispersion is structured and highly correlated in continuous \({\mathbf{k}},\;\varepsilon ,\)and discrete n, the method presented in this work provides better approximation and less uncertainty in the estimation of key figures of interest in scientific and technological applications of semiconductors. This task is made possible through the implementation of physicsinformed neural network architecture, synergistic PBE + GW data sampling, and active learning. Specifically, the CNNbased network structure we developed can handle the tasks of the fast query of properties of any electronic materials, including bandgap, band edges, and the energy difference between electron athermal (phononfree) band transition, at accuracy on par with or better than their purposespecific counterparts. Direct utilization of this fitting scheme on diamond reveals the strain levels where indirecttodirect bandgap transition and insulatortometal transition take place.
To accomplish the task of band structure prediction, our network offers the capabilities of learning the complex intraband and interband correlation in a selfdirected manner while taking into account important physical characteristics, such as crystal periodicity and timereversal symmetry. For example, the application of our method on computing the extremely sensitive energy dispersionrelated properties such as the effective mass tensor demonstrates that the method can capture the secondorder details of band structure within a level of precision comparable to that of the underlying calculation method. Multiobjective Pareto optimizations are also carried out aided by this model. The general ML framework we propose here thus effectively alleviates the heavy dependence upon DFT calculation, which takes up about 99% of the model construction time in an otherwise typical firstprinciples materials design project without ML. At the same time, it provides an avenue for deploying physicsinformed deep learning. Finally, active learning technique coupled with data fusion provides smart and autonomous searching of the vast region of the 6D strain space for optimizing FoM.
Methods
The models used in this work are described in detail in the “Results” section. Additional content regarding firstprinciples calculation settings and dataset construction is included here.
Firstprinciples calculation
We used the projector augmented wave method (PAW)^{24} in our DFT simulations implemented in the Vienna Ab initio Simulation Package (VASP)^{25}, with the exchangecorrelation functional of PBE^{14}. In all calculations, the electronic wavefunctions were expanded in a plane wave basis set with an energy cutoff of 600 eV. An 8 \(\times\) 8 \(\times\) 8 MonkhorstPack^{26} \({\mathbf{k}}\)mesh was used to conduct the Brillouin zone integration. The maximum residual force allowed for atoms after structural relaxation is 0.0005 eV \({\AA}^{  1}\). Computations that invoke GW corrections were conducted on top of the above PBE settings. We chose to sample the strain cases in a range of \(\{  0.15 \le \varepsilon _{ii} \le 0.15,\;  0.1 \le \varepsilon _{ij} \le 0.1,(i,j = 1,\;2,\;3)\}\) that yields stable structures, ie. without imaginary phonon frequencies. To identify the phonon stability boundaries, we performed phonon calculations for densely sampled strain cases in the 6D space. These calculations were primarily carried out using the VASPPhonopy package^{27}. 2 × 2 × 2 supercells of 16 carbon atoms were created, and phonon calculations were conducted with a 3 × 3 × 3 \({\mathbf{q}}\)mesh. We also took full advantage of the known symmetries to further reduce the computations needed when collecting the strain data.
Dataset construction
In the data generation step of database construction, we took the LatinHypercubesampled^{28} strain points and adopted the above parameters in our ab initio calculations to acquire the bandgap, band structures, and related properties for every deformed structure. To validate our calculation, we compared it with accessible values obtained in experiments. Specifically, the undeformed diamond properties are widely available, and we validated our manybody G_{0}W_{0} calculation settings by matching our result at zerostrain with the experimental lattice constant, elastic properties, dielectric constant, and most importantly, bandgap and band structure of diamond. Since we have adopted phonon calculations to eliminate the cases where phase transitions (such as graphitization^{15,29}) could happen and focused our search on the elastic regime, the diamond structures which we conducted highthroughput computations are all of the sp^{3} hybridization types. Therefore, unlike the Materials Project database construction^{30} where separate DFT settings and experimental references had to be employed for different classes/phases of materials across a much larger chemical space, it would be enough for us to use the undeformed diamond as the reference to benchmark the calculations. In addition, for strain cases of greater interest (such as the near metallization and directbandgap strain cases), we went beyond the singleshot G_{0}W_{0} method and used partially selfconsistent GW_{0} calculation settings (allowing Green’s function iterations to acquire more accurate bandgap) that is known to obtain results better than calculations with hybridfunctional DFT^{31} and comparable with experimental measurement for many semiconductor materials^{32}.
Data availability
The authors declare that all data supporting the findings of this study are available within the paper and its Supplementary Information file. Further information is available upon reasonable request.
References
Li, J., Shan, Z. & Ma, E. Elastic strain engineering for unprecedented materials properties. MRS Bull. 39, 108–114 (2014).
Jain, S. C., Maes, H. E. & Van Overstraeten, R. Semiconductor strained layers. Curr. Opin. Solid State Mater. Sci. 2, 722–727 (1997).
Bedell, S. W., Khakifirooz, A. & Sadana, D. K. Strain scaling for CMOS. MRS Bull. 39, 131–137 (2014).
Sun, Y., Thompson, S. E. & Nishida, T. Physics of strain effects in semiconductors and metaloxidesemiconductor fieldeffect transistors. J. Appl. Phys. 101, 104503 (2007).
Banerjee, A. et al. Ultralarge elastic deformation of nanoscale diamond. Science 360, 300–302 (2018).
Nie, A. et al. Approaching diamond’s theoretical elasticity and strength limits. Nat. Commun. 10, 1–7 (2019).
Dang, C. et al. Achieving large uniform tensile elasticity in microfabricated diamond. Science 371, 76–78 (2021).
Zhang, H. et al. Approaching the ideal elastic strain limit in silicon nanowires. Sci. Adv. 2, e1501382 (2016).
Aryasetiawan, F. & Gunnarsson, O. The GW method. Rep. Prog. Phys. 61, 237–312 (1998).
Lu, H. et al. Mechanical writing of ferroelectric polarization. Science 336, 59–61 (2012).
Wang, Z. et al. Nonlinear behavior of flexoelectricity. Appl. Phys. Lett. 115, 252905 (2019).
Bardeen, J. & Shockley, W. Deformation potentials and mobilities in nonpolar crystals. Phys. Rev. 80, 72–80 (1950).
Shi, Z. et al. Deep elastic strain engineering of bandgap through machine learning. PNAS 116, 4117–4122 (2019).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
Shi, Z. et al. Metallization of diamond. PNAS 117, 24634–24639 (2020).
Ciregan, D., Meier, U. & Schmidhuber, J. Multicolumn deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. 3642–3649 (IEEE Computer Society, Eric Mortensen, 2012).
Sharma, A., Vans, E., Shigemizu, D., Boroevich, K. A. & Tsunoda, T. DeepInsight: a methodology to transform a nonimage data to an image for convolution neural network architecture. Sci. Rep. 9, 11399 (2019).
Kingma, D. P. & Welling, M. An Introduction to Variational Autoencoders. (Now Publishers, 2019).
Shapeev, A. et al. Active learning and uncertainty estimation. Machine Learning Meets Quantum Physics, 309329. (Springer International Publishing, 2020).
Cohn, D. A., Ghahramani, Z. & Jordan, M. I. Active learning with statistical models. J. Artif. Intell. Res. 4, 129–145 (1996).
Gal, Y. & Ghahramani, Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on International Conference on Machine Learning. Vol. 48, 1050–1059 (eds Balcan, M. F. & Weinberger, K. Q.) (JMLR.org, 2016).
Tsymbalov, E., Makarychev, S., Shapeev, A. & Panov, M. Deeper Connections between Neural Networks and Gaussian Processes Speedup Active Learning. In Proceedings of the TwentyEighth International Joint Conference on Artificial Intelligence. 3599–3605. http://ijcai.orgijcai.org (Sarit Kraus, 2019).
Zeghbroeck, B. J. V. Principles of Semiconductor Devices. (Bart Van Zeghbroeck, 2011).
Blöchl, P. E. Projector augmentedwave method. Phys. Rev. B 50, 17953–17979 (1994).
Kresse, G. & Furthmüller, J. Efficiency of abinitio total energy calculations for metals and semiconductors using a planewave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Monkhorst, H. J. & Pack, J. D. Special points for Brillouinzone integrations. Phys. Rev. B 13, 5188–5192 (1976).
Togo, A. & Tanaka, I. First principles phonon calculations in materials science. Scr. Mater. 108, 1–5 (2015).
McKay, M. D., Beckman, R. J. & Conover, W. J. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979).
Regan, B. et al. Plastic deformation of singlecrystal diamond nanopillars. Adv. Mater. 32, 1906458 (2020).
Jain, A. et al. Commentary: the materials project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Kaczkowski, J. Electronic structure of some Wurtzite semiconductors: hybrid functionals vs. ab initio many body calculations. Acta Phys. Pol. A 121, 1142–1144 (2012).
Hybertsen, M. S. & Louie, S. G. Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B 34, 5390–5413 (1986).
Nava, F., Canali, C., Jacoboni, C., Reggiani, L. & Kozlov, S. F. Electron effective masses and lattice scattering in natural diamond. Solid State Commun. 33, 475–477 (1980).
Willatzen, M., Cardona, M. & Christensen, N. E. Linear muffintinorbital and k⋅p calculations of effective masses and band structure of semiconducting diamond. Phys. Rev. B 50, 18054–18059 (1994).
Löfås, H., Grigoriev, A., Isberg, J. & Ahuja, R. Effective masses and electronic structure of diamond including electron correlation effects in first principles calculations using the GWapproximation. AIP Adv. 1, 032139 (2011).
Acknowledgements
The computations involved in this work were conducted on the computer cluster at Skolkovo Institute of Science and Technology (Skoltech) CEST Multiscale Molecular Modelling group and Massachusetts Institute of Technology (MIT) Nuclear Science Engineering department. E.T., Z.S., A.S., and J.L. acknowledge support by the SkoltechMIT Next Generation Program 20167/NGP. E.T. and A.S. acknowledge support by the Center for Integrated Nanotechnologies, an Office of Science User Facility operated for the U.S. Department of Energy Office of Science by Los Alamos National Laboratory (Contract 89233218CNA000001) and Sandia National Laboratories (Contract DENA0003525). M.D. acknowledges support from MIT JClinic for Machine Learning and Health. S.S. acknowledges support from Nanyang Technological University through the Distinguished University Professorship.
Author information
Authors and Affiliations
Contributions
Z.S. and E.T. contributed equally to this work. E.T. and Z.S. performed the research, including firstprinciples calculations, machine learning model development, and data analysis. E.T., Z.S., M.D., S.S., J.L., and A.S. participated in writing and revising the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tsymbalov, E., Shi, Z., Dao, M. et al. Machine learning for deep elastic strain engineering of semiconductor electronic band structure and effective mass. npj Comput Mater 7, 76 (2021). https://doi.org/10.1038/s41524021005380
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41524021005380