Computational synthesis of substrates by crystal cleavage

The discovery of substrate materials has been dominated by trial and error, opening the opportunity for a systematic search. We generate bonding networks for materials from the Materials Project and systematically break up to three bonds in the networks for three-dimensional crystals. Successful cleavage reduces the bonding network to two periodic dimensions. We identify 4693 symmetrically unique cleavage surfaces across 2133 bulk crystals, 4626 of which have a maximum Miller index of one. We characterize the likelihood of cleavage by creating monolayers of these surfaces and calculating their thermodynamic stability using density functional theory to discover 3991 potential substrates. Following, we identify distinct trends in the work of cleavage and relate them to bonding in the three-dimensional precursor. We illustrate the potential impact of the substrate database by identifying several improved epitaxial substrates for the transparent conductor BaSnO3. The open-source databases of predicted and commercial substrates are available at MaterialsWeb.org.


INTRODUCTION
Single-crystal substrates facilitate the epitaxial growth of crystalline thin films based on their surface similarity, in terms of lattice parameters and symmetry. Many single-crystal substrate materials are commercially available with different facet orientations. However, while exfoliated 2D materials such as graphene are naturally smooth, cleaving a crystal does not guarantee an atomically smooth surface. Energetic instabilities of cleaved facets can result in the formation of various defects that limit the quality of a substrate for synthesis efforts. Identifying a wider range of substrates with low surface energies would enable the growth of high-quality materials by minimizing the presence of undesirable defects. For example, the cubic perovskite BaSnO 3 has diminished electronic properties due to the lattice mismatch of its (100) surface and the substrates suitable for its growth [1][2][3] . In addition to lattice mismatch, one must consider the reactivity between a substrate and the thin film being grown, for both the precursor chemical reactions and the final thin film composition, which will further reduce the number of available options.
The discovery of two-dimensional (2D) materials present similar challenges to substrates, most notably that both require materials with low surface energies. Computational efforts to identify 2D material have significantly expanded the list of potential monolayers and helped guide experimental synthesis [4][5][6][7][8][9][10][11] . One discovery technique is data mining, which searches bulk materials databases for yet unidentified monolayers [6][7][8][9][10][11] . A recent effort in this direction identified and characterized the bonding network of a crystal structure using the topological scaling algorithm (TSA) 7 . This algorithm identifies bonding clusters within a finite number of unit cells, then uses the scaling of this network size as a function of supercell size to define dimensionality. One of the most significant contributions of this approach is that one does not require one to define a facet to search along a priori. The algorithm itself will identify a monolayer with a surface parallel to any facet cut, and thus renders the issue of orientation mute. This resulted in the discovery of over 600 low-energy 2D materials 7 and the creation of the MaterialsWeb.org 2D material database.
Previous computational efforts have been made to identify potential substrate materials, e.g., Ding et al. 12 searched the Materials Project database for a substrate to stabilize a metastable polymorph of VO 2 . Using lattice mismatch and induced thin-film strain as criteria, they identify several candidates. That search considered a set of low-index facets but did not account for the surface energy of the facets. This illustrates the need for an algorithm to identify likely substrates and a database of low-energy work of cleavage facets, which is the scope of this work.
In this work, we present an approach to identify substrates using a high-throughput approach, which utilizes the computational cleavage of bulk crystals. This effort is motivated by our recent 2D structure search using the genetic algorithm software GASP 13,14 for the Ga 2 O 3 system. In that search, we identified a monolayer structure that can be cleaved from the bulk crystal and was already experimentally synthesized 15,16 , shown in Fig. 1. Thus, we developed data mining techniques to discover planes of cleavage in fully periodic crystals. Rather than using the TSA to identify van der Waals gaps, we use it first to identify conventionally bonded solids; then, we systematically break bonds in the crystal to create a low-dimensional structure. When the breaking of bonds generates a surface (i.e., gives the material a 2D structural motif), we extract a unit cell thick monolayer and calculate the work of cleavage (i.e., the energy cost required to cleave the crystal and form two surfaces) using density functional theory (DFT). This approach identifies nearly 4,000 potential substrates with surface energies comparable to experimentally used substrates.

Candidate materials
For our study, we select all materials in the MaterialsProject database 17 with five or fewer atoms in the primitive unit cell and within 50 meV atom −1 of the convex hull, the latter to promote thermodynamic stability of the final surface. Next, the topological scaling algorithm (TSA) identifies conventionally networked structures from this subset and creates a list of all bonds between neighboring atoms in these crystals. The bonding identification utilizes the empirical atomic radii of the elements, as provided in the pymatgen software package 18 , which we increase by 10%.

Bond cleavage
We implement two approaches for cleaving bonds in a crystal structure, which we illustrate in Fig. 2. The first approach breaks all periodic instances of a bond between two atoms in the supercell and is denoted as the "periodic bonds" approach. In this case, the three A-B bonds shown in Fig. 2(a) are treated separately. The second approach breaks the bonds between all periodic instances of the atoms in the supercell and is referred to as the "periodic atom" approach, illustrated in Fig. 2 We systematically break symmetrically unique bonds in the primitive cell up to a maximum number of bonds given by where N atoms is the number of atoms in the primitive cell, and α is a scalable parameter based on the desired maximum number of bonds broken. The exponent of 2/3 appropriately scales the number of bonds per plane with increasing cell size. The choice of α = 1 in this work results in one to three cleaved bonds in primitive cells of one to five atoms. For the periodic bond approach, N bonds scales with the square of the supercell size. For the periodic atom approach, N bonds scales at least with the square of the supercell size, with the potential to break a significantly larger number of bonds than the periodic bond approach. These two approaches are complementary, as illustrated in Fig. 2. For example, applying the periodic atom approach to the structure in Fig. 2(a) or the periodic bond approach to the structure in Fig. 2(b) would both fail to cleave a surface with α = 1 in Eq. (1). As the unit cell size increases, the periodic bond and periodic atom approaches become equivalent. With a high enough α value, the periodic bond approach will also identify the cleaved surfaces of the periodic atom approach. However, this would be significantly more computationally expensive due to the increase in possible combinations of broken bonds, which need to be considered.

Topology of resulting structure
To determine if a cleavage surface has been generated, the TSA is run on the three-dimensional primitive cell with N bonds or fewer bonds broken (using either approach). If the TSA identifies a twodimensional structural motif with the same stoichiometry as the overall cell, we have created a cleavable surface. This surface is isolated as a unit cell thick monolayer and oriented such that the a ! and b ! lattice vectors span the 2D lattice of the monolayer structure and the c ! lattice parameter is chosen perpendicular to the ð a ! ; b ! Þ plane. Due to the crystal symmetry, our algorithm can identify multiple instances of some surfaces. We remove these duplicates and identify the unique surfaces extracted from each crystal using the pymatgen structure matching algorithm 17 .

Predicted substrates
From a starting set of 120,612 materials in the Materials Project database 17 , 9,089 crystals meet the criteria of exhibiting (i) a formation energy within 50 meV atom −1 of the thermodynamic hull (72,316), (ii) a primitive cell of five or fewer atoms (9,980), (iii) all lattice angles greater than 16 ∘ and less than 164 ∘ (9,973), and (iv) a bonding network of three-dimensional topology. Criteria (iii) is introduced as a result of the exfoliation algorithm, which is found to return incorrect cleaved surfaces if the lattice has an extreme lattice angle. Though applying the algorithm to the conventional cell of these crystals resolves the issue, this results in a unit cell with greater than five atoms and thus is not considered. Applying the periodic bond and periodic atom cleavage approaches to the 9,089 materials generates 1,925 and 4,006 symmetrically unique surfaces, respectively. We apply the pymatgen structure tool to the combined list of 5,931 surfaces to identify a total of 4,693 unique cleaved surfaces across 2,133 bulk crystals. respectively. We apply the pymatgen structure tool to the combined list of 5,845 surfaces to identify a total of 4,693 unique cleaved surfaces across 2133 bulk crystals. It is worth noting here that Ga 2 O 3 is not identified as cleavable by this search, as (i) the number of atoms in the primitive cell of the crystal is greater than five and (ii) the TSA identifies the crystal as layered when using our choice of 1.1 times the atomic radii of Ga and O, rather than as a fully networked structure.
To determine the stability of the surfaces, we perform DFT calculations with VASP 19,20 . We extract a single unit cell thick slab for each of the identified 4,693 unique cleavage surfaces and perform geometric optimization of atomic positions within the (a) Periodic Bond Fig. 2 Representation of the bond cleaving algorithm. a The periodic bond approach cleaves all periodic images of a bond. b The periodic atom approach cleaves the bonds between all periodic images of the atoms. The periodic atom approach requires at least three atoms in the primitive cell to generate a cleaved facet.  21 . Therefore, care should be taken when considering these materials in the MaterialsWeb.org substrate database. We use the work of cleavage of a monolayer, E cleave , to measures the energy required to cleave a unit area of the bulk precursor material and the surface energy, E surf , to describe the thermodynamic stability of a material's facet: and where E sub,as-cut and E sub,opt are the energies of the as-cut substrate immediately after cleaving and of the optimized substrate after relaxing the atomic positions; E bulk is the energy of the bulk precursor, N bulk and N sub represent the number of atoms in the bulk and substrate, respectively, and A sub denotes the substrate area. The factor of two difference is due to the work of cleavage being a measure of the energy needed to create two surfaces, while the surface energy is the thermodynamic stability of a single surface.
To validate that a monolayer accurately approximates the work of cleavage for cleaving a crystal, we calculate the change in work of cleavage with slab thickness for a subset of 21 (001) surfaces, from one to four unit cells. We find changes in the work of cleavage of less than 3 meV Å −2 in these systems, confirming that one unit cell thick slabs sufficiently describe the work of cleavage for a cleaved surface in this high-throughput effort. To verify that the work of cleavage indicates surface stability, we compare the calculated work of cleavage for 4,614 cleaved, free-standing slabs to their partially-optimized surface energy in Fig. 3. We observe that 2,557 materials display surface energies within 10% of half their work of cleavage and 1,615 within 5%.
We note that there is the possibility of surface reconstructions for some of the predicted substrates. However, due to the high computational cost, we do not perform a search for possible reconstructions. The thermodynamic driving force for surface reconstructions is surface energy. Hence, the probability for reconstructions decreases for lower energy surfaces. However, before pursuing synthesis of a substrate in this work, we recommend determining at least the dynamic stability of a surface structure using phonon calculations and, if possible, searching for reconstructions using, e.g., genetic algorithm searches 14 . Note that as this is a global optimization problem 22 , there is no guarantee that any reconstruction, let alone the most stable reconstruction, will be identified with computational techniques. Reconstructions will also be significantly altered by the adsorption of species in the first stages of the film growth on the substrate, making this problem more closely related to the specific film-substrate interface.

DISCUSSION
Our search identifies three surfaces currently used as substrates: hexagonal CdS, ZnO, and AlN. These crystals share the same structure and are all commercially available as (0001) substrates. In our search, we identify four substrates for each material: one (110), one (101), and two (001) facets. These facets are equivalent to the Miller-Bravais convention of (1100), (1011), and (0001). The most stable facet for these crystals is (110), with a work of cleavage of 56, 123, and 275 meV Å −2 for CdS, ZnO, and AlN, respectively. The most stable (001) facet of each crystal has a work of cleavage of 92, 196, and 407 meV Å −2 , respectively. Figure 3 highlights 3,991 surfaces (2,307 f-valence free and 2,183 with an entry in the Inorganic Crystal Structure Database (ICSD) entry) with a work of cleavage below that of (001) AlN, and 935 (718 f-valence free and 418 with an ICSD entry) surfaces below that of (001) CdS.
We consider any slab with a work of cleavage less than (001) AlN to be a suitable substrate and included in our substrate database. Some facet cuts could have different terminations for the same cleaved unit. For example, the cleaved (001) AlN can has both an aluminum-terminated and nitrogen-terminated facet. For these cases, the reported work of cleavage and surface energy are an average of these terminations. In addition to the energy benchmarks, we identify the conventional cell hkl index of the facets we cut. Of the 4,693 facets, 4,626 have Miller indices no greater than one, showing that the vast majority of planes we cleave are low-index facets. Of the remaining facets, 63 have a maximum Miller index of two and four have a maximum Miller index of three. While our choice of primitive cells biases the indices found, these results demonstrate how our approach is agnostic to the orientation of the facet cuts and the potential for future searches to expand this substrate database. Figure 4 shows the trend between the number of cleaved bonds and the work of cleavage across the 4614 converged cleaved surfaces. The joint distribution in Fig. 4(a) indicates that our cleavage criterion of Eq. (1) results in both a low density of cleaved bonds and a moderate spread of work of cleavage. Furthermore, the distribution indicates two clustering trends in the plot, which correspond to different average bond energies. We attribute these trends to different types of chemical bonds being broken. Metallic systems typically display high coordination numbers and somewhat lower bond energies. Covalent and ionic bonds share localized electrons, which results in lower coordination numbers and higher energies per bond.
To test this hypothesis, Fig. 4(b) and (c) show the work of cleavage and bond density distribution for monolayers derived from metallic and insulating precursors, respectively, based on the precursor bandgap reported in the MaterialsProject database 17 . We observe that the two clusters in the distribution indeed correspond predominantly to metallic and covalent/ionic bonding. Metallic bonds typically occur in materials with higher coordination numbers and have lower average energy than covalent or Work of Cleavage (meV Å--2 ) Surface Energy (meV Å -2 ) Fig. 3 The work of cleavage and the surface energy after ten relaxation steps of the cleaved surfaces. Red indicates a higher density of points. The solid, dashed, and dotted black lines represent the work of cleavage of the (0001) CdS, ZnO, and AlN substrates, respectively. The grey region indicates materials have a work of cleavage greater than common substrates. ionic bonds, resulting in a broad spread of bond densities in Fig. 4 (b). In contrast, covalent/ionic bonds typically correspond to lower coordination numbers with higher energies for each bond, which is reflected in the distribution in Fig. 4(c). The average bond energies of the metallic and insulating precursors are 1.3 and 1.8 ev bond −1 , respectively, further demonstrating the difference of bonding in these systems. As bonds exist across a continuous spectrum of metallic, covalent, and ionic character, the average bond energy is not a precise description of the bonding in these systems. However, the observed bond energies in the metallic and insulating precursor materials reveal an overall consistent trend.
To determine the potential of our data mining approach to expand the set of known substrates, we characterize the cleavage surfaces' symmetry and lattice parameters. Figure 5 illustrates the lattice parameter distribution of the hexagonal, square, and tetragonal substrates identified in this work which are free of f-valence elements, as well as the electronic properties of their bulk precursors. The broad range of electronic behavior and lattice parameters for each symmetry indicates that these cleaved crystals could provide suitable substrates for a variety of thin film systems.
We follow by creating a database of commercially available substrates totaling 190 substrates and include the data in Fig. 5 using crystal structures and band gaps sourced from the Materials Project database. Comparing the database of predicted substrates to one of commercially available substrates shows two significant advancements. First, we greatly expand the list of substrates exhibiting band gaps within the visible spectrum and beyond, creating more opportunities for optical excitation of thin films from beneath the substrate rather than from above. Second, we identify substrates with a broader range of lattice parameters, which can help synthesis efforts. For example, currently, 23 commercially available hexagonal substrates exhibit lattice parameters between 2.5 and 5.0 Å. The computational database expands this list to 984 hexagonal substrates within the same range. This increase provides more opportunity to identify a substrate that both epitaxially matches the growth crystal and is chemically compatible.
To demonstrate the application of this substrate database, we epitaxially match cubic perovskite (100) BaSnO 3 , using the crystal structure and stiffness tensor provided by MaterialsProject 17,23 to substrates with a work of cleavage below that of (0001) AlN. We use the pymatgen 18 lattice matching algorithm to epitaxially match the perovskite to the layers extracted in our search. We identify 42 cleavage surfaces as potential substrates when using the screening criteria of (i) a work of cleavage less than that of (0001) AlN, (ii) an induced strain energy below 2 meV atom −1 , (iii) no f-valence species, and (iv) epitaxial matches to a single unit cell of BaSnO 3 . Substrate matching also requires considering the chemical compatibility between the substrate and the thin film, and thus we focus further discussion on substrates with chemically inert surfaces. We highlight here three potential substrates: (001) Rb 2 NiO 2 24 , (001) NiO 25 , and (001) CaSe 26 . The work of cleavage for each substrate is small, being 19, 72, and 56 meV Å −2 , respectively. The resulting epitaxial strain of BaSnO 3 for Rb 2 NiO 2 and NiO are also small, with only +0.3% and +0.4%, which correspond to strain energies of 0.2 and 0.4 meV atom −1 , respectively. These are an order of magnitude smaller than currently used substrates such as (001) SrTiO 3 (−5.4%) and (001) MgO (+2.2%) indicating the opportunity for defect-free growth of BaSnO 3 on these substrates. The strain energy when using CaSe is larger at 1.3 meV atom −1 , though still with a low lattice mismatch of +0.7%. All three precursor materials-Rb 2 NiO 2 , NiO, and CaSe-have been experimentally synthesized [24][25][26] , providing increased opportunities for the growth of BaSnO 3 . In addition, the former two being oxides indicates that the surfaces will be fairly inert and unlikely to form strong chemical bonds with BaSnO 3 .
To facilitate the use of these substrates for further studies and future synthesis efforts of epitaxial single-crystal thin films, we provide the substrate structures and the data on their stability under an open-source license at MaterialsWeb.org. Furthermore, we make the software for cleaving crystal structures available as part of the open-source MPInterfaces package 27,28 .
In conclusion, we developed a data mining approach that systematically breaks bonds in three-dimensional crystals to identify cleavage planes for substrate synthesis. We identify 4,693 symmetrically unique cleavage surfaces across 2,133 periodic crystals and determine their structure and stability. We show that 3,991 surfaces display a work of cleavage comparable to that of the known substrate material (0001) AlN. These cleavage surfaces show a broad distribution of electronic properties and lattice parameters. We illustrate their utility by identifying 42 substrates for BaSnO 3 with epitaxial matches that are an order of magnitude better than currently used substrates, explicitly discussing two oxide and one chalcogenide substrate. Though we limited our search to small primitive cells, the low surface energy of Ga 2 O 3 indicates that there are many more low-energy substrates which can be cleaved from periodic crystals.

Characterization of cleaved materials
To calculate the stability of the cleaved surfaces, we perform density functional theory (DFT) calculations with the projector-augmented wave (PAW) method as implemented in the VASP package 19,20 . The choice of PAW potentials follows the recommendation of pymatgen 17 . We employ the Perdew-Burke-Ernzerhof (PBE) 29 approximation for the exchangecorrelation functionals. To obtain convergence of the energy to 1 meV atom −1 , we use a Γ-centered k-point mesh with a density of 60 k-points per Å −1 and a cutoff energy for the plane-wave basis set of 600 eV. For the cleaved slabs, we employ a vacuum spacing of at least 14 Å and reduce the number of k-points in the out-of-plane direction to one. The Brillouin zone integration uses Gaussian smearing with a width of 0.03 eV. The DFT calculations are performed spin-polarized with an initial ferromagnetic configuration. Transition metal and f-valence atoms are initialized with a magnetic moment of 6 μ Bohr and all others with a moment of 0.5 μ Bohr . We perform single-point calculations on the bulk precursors sourced from the Materials Project database to ensure accurate energy and consistency of calculation settings across all systems. We allow a maximum of ten iterations (ionic steps) during the optimization of the atomic positions in the slabs, and as not all surfaces converged within that window, we refer to this energy as "partially-relaxed." For the systems that reached ten iterations, the average and median surface energy gained are 1.6 and 0.2 meV Å −2 , respectively, from the relaxations. We keep lattice constants of the slabs fixed at the cleaved values for these calculations.

Workflow
We use the software packages pymatgen 18 and MPInterfaces 27 to prepare the input files, organize the results for the cleavage algorithm, and analyze the results of the DFT calculations. We use the pymatgen structure matching algorithm with an atomic position tolerance of 10 −4 and not permitting any primitive cell reduction, lattice scaling, or supercell transformations 17 . Decreasing the tolerance to 10 −5 only marginally increases the number of surfaces by 27, indicating that the choice of 10 −4 provides high confidence in the identified substrates being symmetrically distinct. Any surfaces which diverged in energy or displayed a final surface energy greater than half the work of cleavage are excluded from the analysis.

DATA AVAILABILITY
The databases of predicted and experimental substrates identified in this work is freely available at https://materialsweb.org and at https://www.materialscloud.org 30 .