Identifying candidate hosts for quantum defects via data mining

Atom-like defects in solid-state hosts are promising candidates for the development of quantum information systems, but despite their importance, the host substrate/defect combinations currently under study have almost exclusively been found serendipitously. Here we systematically evaluate the suitability of host materials by applying a combined four-stage data mining and manual screening process to all entries in the Materials Project database, with literature-based experimental confirmation of band gap values. We identify 580 viable host substrates for quantum defect introduction and use in quantum information systems. While this constitutes a substantial increase in the number of known, potentially viable material systems, it nonetheless represents a 99.54% reduction from the total number of known inorganic phases, and applying additional selection criteria for specific applications will reduce their number even further. The screening principles outlined here can readily be applied to previously unrealized phases and to other technologically important materials systems.


Introduction
In recent years, significant effort has been devoted to the realization of functional systems for quantum information science (QIS). QIS devices manipulate quantum states to store, process, and transmit information, potentially enabling the rapid solution of problems long thought impossible or impractical to address with classical methods. One of the most promising platforms for efficient quantum information systems, particularly for quantum networks, 1 is atomic or atom-like defects in solid-state hosts. 2 When combined, the quantum coherence characteristics of the atomic defect, low defect concentrations, and high host quality should allow for long spin coherence and efficient optical transitions, properties that are also ideal for nanoscale sensing under ambient conditions. 3 Several such atomic defect systems, including vacancy centers in diamond [4][5][6][7][8][9] and silicon carbide, 10 have been studied extensively for several applications, in particular quantum networks, 11,12 magnetometry and nanoscale sensing of magnetic fields, 3,13,14 electric fields, 15 temperature 16,17 or chemical composition using NMR, 18,19 often under ambient conditions at room temperature. 20 Another well-known class of defects is transition metal and rare earth ion impurities. These defects have been explored extensively in the context of solid-state laser development, in materials where extremely high doping concentrations are possible. However, exploring single defects as qubits is a more recent development, 21,22 partially facilitated by integration with nanophotonic circuits that modify and enhance their luminescence. 23,24 Although significant attention has thus far been paid to a handful of materials and defects, they represent only a small fraction of the larger body of potential defect-host systems.
Because most known quantum information materials systems were discovered serendipitously, a rational search of inorganic materials may yield candidates with superior properties.
Recent work has focused on ab initio predictions of host-defect systems using electronic structure calculations, as demonstrated for vacancy centers in silicon carbide, 25 and such calculations have been carried out for several individual host-defect combinations. An alternative approach is to screen systematically for potential host materials by first postulating which properties an ideal host would have and then screening materials on the basis of those properties. This is done most efficiently through a computational search that substantially narrows the field of potential candidates.
To achieve the long spin coherence and high-efficiency optical transitions necessary for a usable quantum information system, host substrates must be highly pure (i.e., as free from defects as possible), intrinsically diamagnetic (thereby reducing magnetic noise and spin-based relaxation of the defect quantum state), and possess a band gap large enough to accommodate the ground and excited energy levels of the defect (separated by an optical frequency). The ideal, pure substrate would be free of paramagnetic impurities and unwanted defects that may alter the band gap character and/or the magnetic or electric field environment surrounding the implanted defect, reducing efficiency. Materials known to be dopable, to possess controllable surfaces, and to have an established method of epitaxial thin-film production would further facilitate the production of integrated devices. Reducing magnetic noise and relaxation in the environment of the defect requires both minimizing paramagnetic centers in the host and avoiding host nuclei with non-zero magnetic moments. Considering the natural abundance of nuclear-spin-zero isotopes of each element, shown in Figure 1, allows a significant portion of the periodic table to be excluded: the elements for which no stable spin-zero isotopes exist. While some elements, such as O and Ca, occur almost exclusively as nuclear-spin-free isotopes, elements with at least 50% nuclear-spin-free isotopes could likely be isotopically enriched to higher concentrations, as has been achieved in diamond 26,27 and silicon. 28 Although several of the lanthanide elements appear to have relatively high natural abundances of spin-free isotopes, the difficulty of obtaining pure, diamagnetic starting material (free from magnetic lanthanide impurities) excludes them from consideration here. Phases containing transition metal ions with unpaired electrons are likewise eliminated because of their paramagnetism.
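The isotope criterion reduces to a simple calculation: sum the natural abundances of an element's I = 0 isotopes and compare the total with the 50% cutoff. The sketch below assumes a small hand-entered isotope table (approximate natural abundances for a few elements, for illustration only); an actual screen would draw on a complete nuclear data table. A phase passes only if every constituent element passes.

```python
"""Screen elements and phases by the abundance of nuclear-spin-zero isotopes."""

# Illustrative table only (approximate natural abundances, not a complete data set):
# element -> list of (nuclear spin I, natural abundance as a fraction).
ISOTOPES: dict[str, list[tuple[float, float]]] = {
    "C": [(0.0, 0.9893), (0.5, 0.0107)],                     # 12C, 13C
    "N": [(1.0, 0.99636), (0.5, 0.00364)],                   # 14N, 15N
    "O": [(0.0, 0.99757), (2.5, 0.00038), (0.0, 0.00205)],   # 16O, 17O, 18O
    "Al": [(2.5, 1.0)],                                      # 27Al is the only stable isotope
    "Si": [(0.0, 0.9223), (0.5, 0.0468), (0.0, 0.0309)],     # 28Si, 29Si, 30Si
}

SPIN_ZERO_THRESHOLD = 0.5  # the ">= 50% I = 0 isotopes" criterion from the text


def spin_zero_fraction(element: str) -> float:
    """Fraction of the element's natural abundance carried by I = 0 isotopes."""
    return sum(abundance for spin, abundance in ISOTOPES[element] if spin == 0.0)


def element_passes(element: str) -> bool:
    return spin_zero_fraction(element) >= SPIN_ZERO_THRESHOLD


def phase_passes(elements: set[str]) -> bool:
    """A phase survives only if every constituent element passes."""
    return all(element_passes(el) for el in elements)


if __name__ == "__main__":
    for el in ISOTOPES:
        print(f"{el:>2}: I=0 fraction {spin_zero_fraction(el):.3f}")
    print(phase_passes({"Si", "O"}), phase_passes({"Al", "O"}))
```

With this table, an Si-O phase passes while an Al-O phase fails, since 27Al (I = 5/2) is aluminium's only stable isotope.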
As the optical coherence of defects may be sensitive to permanent electric dipole moments, and thus to the symmetry of the defect site, phases crystallizing in polar space groups were also excluded.
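Whether a space group is polar depends only on its point group: exactly ten crystal classes (1, 2, m, mm2, 4, 4mm, 3, 3m, 6, 6mm) permit a net dipole moment. A minimal check, assuming the Hermann-Mauguin point-group symbol has already been obtained from a symmetry-analysis tool (symbol spellings may need normalizing first), is a set lookup. Note that lacking inversion symmetry is not the same as being polar: the zincblende class -43m is non-centrosymmetric but non-polar.

```python
# The ten polar crystal classes (Hermann-Mauguin symbols). Any space group whose
# point group is in this set admits a spontaneous electric polarization.
POLAR_POINT_GROUPS = frozenset({"1", "2", "m", "mm2", "4", "4mm", "3", "3m", "6", "6mm"})


def is_polar(point_group: str) -> bool:
    """Return True if the Hermann-Mauguin point-group symbol is a polar class."""
    return point_group.strip() in POLAR_POINT_GROUPS


if __name__ == "__main__":
    # diamond, zincblende, wurtzite, an orthorhombic polar class, corundum
    for pg in ("m-3m", "-43m", "6mm", "mm2", "-3m"):
        print(pg, "polar" if is_polar(pg) else "non-polar")
```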
In the current work, each of the criteria outlined above was applied systematically, beginning with the totality of known inorganic phases listed in the Materials Project database. Although database mining is often highly automated, the sensitivity of the desired properties to the exact phase and crystal structure reported for each potential host necessitated manual checks after each restriction of the data set. While the true number of viable quantum host-defect pairings may be effectively unlimited, because different quantum defects will suit different applications, of the more than 100,000 inorganic materials in the Inorganic Crystal Structure Database (ICSD), a maximum of 580 phases were found to be potentially viable hosts for atom-like defect quantum information systems. Among these, a small number are single elements or binary compounds, making them relatively straightforward to prepare and study as substrates for QIS.
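The staged restriction with a manual check after each step can be sketched as a sequence of filter predicates that records the survivors of every stage for review. This is a hypothetical illustration: the `Phase` record, its field names, the zero-moment tolerance, and the toy entries are assumptions, and the isotope and rare-earth criteria are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Phase:
    """Hypothetical minimal record for one database entry (fields are assumptions)."""
    formula: str
    elements: frozenset
    net_magnetic_moment: float           # calculated, Bohr magnetons per cell
    polar: bool                          # point group is one of the 10 polar classes
    band_gap_ev: Optional[float] = None  # best available estimate, if any


Stage = tuple[str, Callable[[Phase], bool]]

EXCLUDED_ELEMENTS = frozenset({"Th", "U", "Cd", "Hg"})  # Stage 2 (rare earths omitted here)
MOMENT_TOL = 1e-3     # assumed tolerance for a "zero" net moment
GAP_CUTOFF_EV = 1.1   # Stage 3 threshold from the text

STAGES: list[Stage] = [
    ("non-magnetic, non-polar",
     lambda p: abs(p.net_magnetic_moment) < MOMENT_TOL and not p.polar),
    ("no excluded elements", lambda p: not (p.elements & EXCLUDED_ELEMENTS)),
    ("band gap >= 1.1 eV",
     lambda p: p.band_gap_ev is not None and p.band_gap_ev >= GAP_CUTOFF_EV),
]


def staged_screen(phases: list[Phase], stages: list[Stage]):
    """Apply each stage's predicate in turn and snapshot the survivors.

    The snapshots are what a manual check would review before the next
    restriction is applied.
    """
    survivors = list(phases)
    history: list[tuple[str, list[Phase]]] = []
    for name, keep in stages:
        survivors = [p for p in survivors if keep(p)]
        history.append((name, list(survivors)))
    return survivors, history


if __name__ == "__main__":
    toy = [  # made-up records for illustration only
        Phase("A", frozenset({"Si", "O"}), 0.0, False, 5.0),
        Phase("B", frozenset({"Cd", "S"}), 0.0, False, 2.0),
        Phase("C", frozenset({"Fe", "O"}), 4.0, False, 2.0),
        Phase("D", frozenset({"Zn", "O"}), 0.0, True, 3.0),
    ]
    kept, history = staged_screen(toy, STAGES)
    for name, snapshot in history:
        print(f"{name}: {[p.formula for p in snapshot]}")
```

Keeping each stage's snapshot, rather than only the final list, is what makes the intermediate manual review possible.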

Results and Discussion
To reduce the possibility of accidentally excluding viable host materials, the Materials Project's 125,223 inorganic compound entries were screened in four distinct stages, as shown in Figure 2.

Figure 2. Four-tier screening process used to evaluate the potential viability of materials as hosts for quantum defects. Stage 1) Beginning with all known experimental inorganic phases, remove those with a calculated non-zero net magnetic moment, those crystallizing in polar space groups, and/or those containing atomic species with <50% I=0 isotopes. Stage 2) Remove phases containing radioactive Th and U, toxic Cd and Hg, and the often magnetically impure rare earth elements. The one phase containing a noble gas element that passed this sieve, orthorhombic XeO3, was also removed because of its known instability under standard conditions. Stage 3) Record the calculated, predicted, and measured experimental band gaps of the remaining phases from the Materials Project, the JARVIS-DFT database, the AFLOW repository, the AFLOW-ML API, and the existing literature, and dismiss those materials that could not reasonably be predicted to have a band gap larger than 1.1 eV. Stage 4) Confirm the intrinsic diamagnetic (DM) character and phase stability under standard conditions of the remaining candidates using a combination of standard electron-counting principles and literature reports. All database searches were conducted in February and March of 2020.

respective crystal structure. To calculate band structures for these materials, the Generalized Gradient Approximation (GGA) functional is applied to the relaxed structures. For structures containing one of several transition metal elements, such as Cr, Fe, Mo, and W, a +U correction is also applied to account for correlation effects in occupied d- and f-orbital systems that pure GGA calculations do not address. 32 The authors caution that, because more sophisticated calculation methods carry a high computational cost, the methods employed often produce severely underestimated band gaps relative to experimentally derived values.

JARVIS-DFT
JARVIS-DFT 33, originally compiled as a database for functional materials with a focus on the discovery of novel two-dimensional systems, also uses the DFT-based VASP software to perform a variety of material property calculations. In contrast to the Materials Project, JARVIS-DFT employs the OptB88vdW (OPT) functional, which was initially designed to better approximate the properties of two-dimensional van der Waals materials and has since been shown to be effective for bulk systems as well. 34,35 Structures are first sourced from the Materials Project database and then re-optimized using the OPT functional. A representative band gap is then calculated through a combination of the OPT and modified Becke-Johnson (mBJ) functionals. The mBJ and combined OPT-mBJ functionals have both been shown to predict band gaps more accurately than many other DFT-based calculation methods. 36 The AFLOW 37 repository relies on a sophisticated, automated framework for calculating a wide array of inorganic material properties. 38 Within the VASP software, the GGA-based Perdew-Burke-Ernzerhof (PBE) functional with projector augmented-wave (PAW) potentials is first used to relax and optimize the ICSD-sourced structure twice, using a 3,000-6,000 k-point mesh. The denser k-point mesh, compared with that employed by the Materials Project, indicates a more computationally expensive calculation.
The band structure is then calculated with an even denser k-point mesh, with the +U correction term applied for most occupied d- and f-orbital systems, and the standard band gap (E_gap) is extracted. 39 A "fitted" band gap (E_gfit) is then obtained by applying to the initially calculated value a standard fit derived from a selective study of DFT-computed vs. experimentally measured band gap widths. 40

AFLOW-ML
AFLOW-ML 41, a machine learning API designed to predict thermomechanical and electronic properties from structural and compositional information alone, builds further upon the ICSD entries calculated through the AFLOW framework. Using only the provided atomic compositional and positional information, encoded as so-called "fragment descriptors", the system first applies a binary metal/insulator classification model. For materials predicted to be insulators, an additional regression model is applied to predict the band gap width. Each model was evaluated by fivefold cross-validation, in which it was repeatedly trained on four-fifths of the data and tested on the remaining, held-out fifth. The authors report a 93% success rate for the binary classification model, with most misclassified materials being narrow-gap semiconductors. Although the authors do not report the accuracy of the predicted band gaps relative to experimental values, roughly 93% of the machine learning-derived values are found to be within 25% of the DFT+U-calculated gap width. Only phases included in the authors' initial cross-validated test set were used for comparison.
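The classify-then-regress structure described for AFLOW-ML can be sketched as follows. The feature vector and the two models are hypothetical placeholder callables, not AFLOW-ML's actual descriptors, models, or API; the second helper shows the fivefold split behind the cross-validation step, with each sample held out exactly once.

```python
from typing import Callable, Iterator, Sequence

Features = Sequence[float]


def two_stage_gap(x: Features,
                  is_insulator: Callable[[Features], bool],
                  regress_gap: Callable[[Features], float]) -> float:
    """Classify first; only predicted insulators get a regressed band gap (eV)."""
    if not is_insulator(x):
        return 0.0                     # predicted metal: no gap
    return max(0.0, regress_gap(x))    # clip unphysical negative predictions


def kfold_indices(n: int, k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """k-fold cross-validation splits: each sample is held out exactly once.

    Folds are assigned by stride (i, i+k, i+2k, ...); real pipelines usually
    shuffle the data first.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    clf = lambda x: x[0] > 0.5   # stand-in classifier
    reg = lambda x: 2.0 * x[1]   # stand-in regressor
    print(two_stage_gap([0.9, 1.5], clf, reg), two_stage_gap([0.1, 1.5], clf, reg))
```

Gating the regressor behind the classifier means gap errors only arise for materials already predicted to be insulating, which is why misclassified narrow-gap semiconductors are the dominant failure mode.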
4) The final stage
The criteria by which potential host species were excluded in the first three stages were largely derived computationally, with subsequent manual checks. In contrast, the final stage of screening involved confirming several fundamental properties of the remaining materials against the existing literature.
First, for those remaining stoichiometries for which multiple phases appeared to be viable, the relative stability of each polymorph at STP was recorded. All recognized high-pressure and high-temperature phases that were not reported to be quenchable to a stable state under standard conditions were removed.
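The polymorph step can be sketched as a filter followed by a per-formula ranking. This is a hypothetical illustration: the `Polymorph` fields, and in particular the numeric relative energy standing in for the literature-recorded relative stability at STP, are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Polymorph:
    """Hypothetical record; field names are illustrative assumptions."""
    formula: str
    label: str
    relative_energy: float                     # eV/atom above the most stable recorded polymorph
    high_pressure_or_temperature: bool = False
    quenchable: bool = False                   # reported recoverable to standard conditions


def screen_polymorphs(phases: list[Polymorph]) -> dict[str, list[Polymorph]]:
    """Drop non-quenchable HP/HT phases, then rank each formula's survivors by stability."""
    kept = [p for p in phases if not p.high_pressure_or_temperature or p.quenchable]
    by_formula: dict[str, list[Polymorph]] = defaultdict(list)
    for p in kept:
        by_formula[p.formula].append(p)
    for group in by_formula.values():
        group.sort(key=lambda p: p.relative_energy)  # most stable first
    return dict(by_formula)


if __name__ == "__main__":
    toy = [  # made-up entries for illustration only
        Polymorph("AB2", "beta", 0.02),
        Polymorph("AB2", "alpha", 0.0),
        Polymorph("AB2", "hp", 0.10, high_pressure_or_temperature=True),
        Polymorph("AB2", "hp-q", 0.05, high_pressure_or_temperature=True, quenchable=True),
    ]
    print([p.label for p in screen_polymorphs(toy)["AB2"]])
```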
The intrinsic magnetic character of each pure phase was then confirmed through a combination of standard electron-counting rules and literature reports. As many of the potentially viable materials contain oxygen, particular care was taken to consider whether reported paramagnetic character could be due to oxygen defects, especially in the various molybdate, platinate, and palladate phases. Any pure phase reported to deviate from diamagnetic character was removed. Where a material's band gap could not reasonably be assumed suitable on the basis of calculations alone, other reported computational and experimental band gap values were also taken into account when available. Although experimental band gaps were unavailable or unrecorded for the majority of phases considered, those that were available were compared with the calculated values from each database to better judge the viability of selected materials with small calculated gaps (Figure 3). The standard band gaps calculated by the Materials Project, JARVIS-DFT, and AFLOW were found to be underestimated by roughly 40-50% relative to measured values, consistent with the long-known inaccuracies of DFT band gap estimates. 42 AFLOW-ML's machine learning-based predictions were underestimated to a similar degree. On average, the combined OPT-mBJ approach employed by JARVIS-DFT for some phases reduced this underestimation to 18.4%, while the "fitted" band gap calculated by AFLOW was actually overestimated by about 1.8% relative to experimental values. However, the latter figure is heavily influenced by several outlying underestimations in the AFLOW "fitted" gap data set, which pull the average down; this suggests that band gaps calculated in this manner are typically overestimated to a greater degree. The suitability of each band gap was therefore judged with these findings in mind.
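The deviation statistics quoted above are signed relative errors averaged over phases that have both a calculated and a measured gap. The minimal sketch below (function names hypothetical) computes them and reports the median alongside the mean, since the text notes that outliers distort the average. It also inverts an average underestimate to gauge whether a small calculated gap might still reach the 1.1 eV cutoff; this back-correction is an illustrative heuristic, not the procedure used in the screening.

```python
from statistics import mean, median


def signed_relative_deviation(calc_ev: float, exp_ev: float) -> float:
    """(calc - exp) / exp: negative = underestimate, positive = overestimate."""
    return (calc_ev - exp_ev) / exp_ev


def summarize(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and median signed deviation over (calculated, experimental) pairs.

    The median is less sensitive to outliers of the kind noted for the
    AFLOW "fitted" gaps.
    """
    devs = [signed_relative_deviation(c, e) for c, e in pairs]
    return mean(devs), median(devs)


def implied_experimental_gap(calc_ev: float, mean_deviation: float) -> float:
    """Undo an average deviation: E_exp ~ E_calc / (1 + mean_deviation)."""
    return calc_ev / (1.0 + mean_deviation)


if __name__ == "__main__":
    # A 45% average underestimate (mid-range of the 40-50% quoted above) turns a
    # calculated 0.6 eV gap into roughly 1.09 eV, close to the 1.1 eV cutoff.
    print(round(implied_experimental_gap(0.6, -0.45), 2))
```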
It should be noted that, despite appearing in the literature, the most stable phases of several long-recognized materials were found to lack an experimental crystal structure in the ICSD and, consequently, were also absent from the Materials Project database. Several of these, such as the STP-stable phases of BaGeO3, BaGe2O5, and C70, would be potentially viable host species, but because they are absent from all of the databases studied, they are not included. However, while comparing computed band gaps with reported literature values, three additional phases were identified that lacked a corresponding Materials Project entry but did appear in at least one of the other databases. fcc-C60 is the only included phase with a corresponding ICSD entry that did not appear in any of the databases considered.
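Finding phases that are missing from one database but present in others is a set operation once entries are keyed consistently. The sketch assumes each database has been reduced to a set of (formula, space-group number) keys, a hypothetical matching scheme; in practice, matching entries across databases generally also requires a structural comparison.

```python
Key = tuple[str, int]  # (reduced formula, space-group number): assumed matching key


def missing_from_reference(reference: set[Key],
                           others: dict[str, set[Key]]) -> dict[Key, list[str]]:
    """Phases absent from the reference database but present in at least one other.

    Returns each such phase with the names of the databases that list it.
    """
    found: dict[Key, list[str]] = {}
    for db_name, entries in sorted(others.items()):
        for key in entries - reference:
            found.setdefault(key, []).append(db_name)
    return found


if __name__ == "__main__":
    reference = {("X", 1), ("Y", 2)}                             # placeholder entries
    others = {"db1": {("X", 1), ("Z", 3)}, "db2": {("Z", 3), ("W", 4)}}
    print(missing_from_reference(reference, others))
```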
Although suitably sized single crystals are necessary for fabricating functional devices for quantum information systems, few materials have been reported to be easily grown as large, defect-free, and optically clear single crystals. Therefore, the existence of a published single-crystal synthesis, regardless of product size, is denoted by an asterisk in the "SC" column of the tables of results. Air and moisture instability noted in the literature was also considered but was not exclusionary, as additional post-synthesis processing may allow otherwise viable host materials to be utilized.

Figure 4. a) 0.46% of all phases in the Inorganic Crystal Structure Database were found to be potentially viable host substrates, with the most substantial reduction in candidate phases (97.3%) occurring in Stage 1 of the screening process. b) Color-coded periodic table of the constituent elements of viable host species, depicting elements disregarded because of the absence of spin-zero isotopes (gray); deemed too hazardous, too radioactive, too difficult to purify, or too unreactive for inclusion (red); not present in any identified phases (yellow); present largely only in cluster complex phases (blue); and present in numerous identified phases (green).
Of the 125,223 inorganic compounds listed in the Materials Project database, a maximum of 580 phases (0.46%) are found to fit the criteria outlined above as suitable hosts for quantum defects ( Figure 4a). While there likely exist significantly more potentially viable phases that have yet to be recognized, this total spans all currently reported experimentally synthesized and theoretically predicted inorganic materials. As the ease of growth of suitably-sized single crystals, as well as stability under standard or near-standard atmospheric conditions, and chemical and structural simplicity are major factors in determining the viability of potential host phases, it is likely that those identified here represent the majority of suitable candidates.
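The percentages above follow directly from the entry counts; a short check using the figures quoted in the text:

```python
def retained_and_reduction(n_initial: int, n_final: int) -> tuple[float, float]:
    """Percentage of entries retained and the corresponding percentage reduction."""
    retained = 100.0 * n_final / n_initial
    return retained, 100.0 - retained


retained, reduction = retained_and_reduction(125_223, 580)
print(f"retained {retained:.2f}%, reduction {reduction:.2f}%")
# retained 0.46%, reduction 99.54%
```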
Of the 580 identified phases, the band gaps of at least 560 could be either confirmed or reasonably assumed to be comparable to or larger than that of silicon (≥1.1 eV), based on calculated and/or reported experimental values. The band gaps of the remaining 20 candidates, grayed in Tables 1-4, have either been reported to be slightly smaller than that of Si or could only be tentatively extrapolated to a similar value.
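The three-way outcome of the band gap assessment (confirmed, grayed near-Si, or excluded) can be sketched as a small classifier. This is a hypothetical illustration: the `tolerance_ev` margin below 1.1 eV and the `tentative` flag for extrapolated values are assumptions, not parameters stated in the study.

```python
from enum import Enum

SI_GAP_EV = 1.1  # approximate room-temperature band gap of Si, the cutoff used in the text


class GapStatus(Enum):
    CONFIRMED = "confirmed or reasonably assumed >= 1.1 eV"
    NEAR_SI = "grayed: slightly below, or only tentatively near, the Si gap"
    EXCLUDED = "excluded"


def classify_gap(gap_ev: float, tentative: bool = False,
                 tolerance_ev: float = 0.1) -> GapStatus:
    """Sort a candidate's best band gap estimate into the three groups.

    `tolerance_ev` (how far below 1.1 eV still counts as "near Si") and the
    `tentative` flag are illustrative assumptions, not values from the study.
    """
    if gap_ev >= SI_GAP_EV and not tentative:
        return GapStatus.CONFIRMED
    if gap_ev >= SI_GAP_EV - tolerance_ev:
        return GapStatus.NEAR_SI
    return GapStatus.EXCLUDED
```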
Only one admitted element, Ni, was not found in any of the listed phases, as all Ni-containing materials considered were found to be intrinsically paramagnetic. All the Cr-, are quaternary or higher. Many are oxides and chalcogenides, suggesting that they may be relatively simple to fabricate and study, while others in the tabulation, such as the osmates and higher-order elemental clusters, will likely prove impractical for currently considered device applications. Future studies of inorganic material systems would likely also benefit from implementing similar systematic screening processes, enabling the rapid and conclusive determination of all viable phases for any number of applications.

Table 1. The potentially viable unary host materials. All band gaps are reported in eV. For phases in the JARVIS-DFT database for which both standard OPT and combined OPT-mBJ calculations were performed, the OPT-mBJ band gap is listed first, followed by the smaller OPT band gap in parentheses. Where the OPT-mBJ band gap was found to be unrealistically small compared with the OPT band gap, only the OPT value is listed. Where multiple AFLOW repository entries exist for the same phase of a compound, the average of their computed band gaps is listed. Literature band gaps in bold are experimentally derived; those not in bold are band gaps calculated by other methods, many of which agree with the database DFT calculations. An * in the SC column denotes a known single-crystal synthesis method for that phase. A (-) in any cell denotes a value that was not calculated or not reported for that phase.

Table 2. The potentially viable binary host materials. All band gaps are reported in eV. For phases in the JARVIS-DFT database for which both standard OPT and combined OPT-mBJ calculations were performed, the OPT-mBJ band gap is listed first, followed by the smaller OPT band gap in parentheses. Where the OPT-mBJ band gap was found to be unrealistically small compared with the OPT band gap, only the OPT value is listed. Where multiple AFLOW repository entries exist for the same phase of a compound, the average of their computed band gaps is listed. Literature band gaps in bold are experimentally derived; those not in bold are band gaps calculated by other methods, many of which agree with the database DFT calculations. An * in the SC column denotes a known single-crystal synthesis method for that phase. A (-) in any cell denotes a value that was not calculated or not reported for that phase. Grayed text indicates materials whose band gap is considered close enough to that of elemental Si to be potentially viable.

Table 3. The potentially viable ternary host materials. All band gaps are reported in eV. For phases in the JARVIS-DFT database for which both standard OPT and combined OPT-mBJ calculations were performed, the OPT-mBJ band gap is listed first, followed by the smaller OPT band gap in parentheses. Where the OPT-mBJ band gap was found to be unrealistically small compared with the OPT band gap, only the OPT value is listed. Where multiple AFLOW repository entries exist for the same phase of a compound, the average of their computed band gaps is listed. Literature band gaps in bold are experimentally derived; those not in bold are band gaps calculated by other methods, many of which agree with the database DFT calculations. An * in the SC column denotes a known single-crystal synthesis method for that phase. A (-) in any cell denotes a value that was not calculated or not reported for that phase. Grayed text indicates materials whose band gap is considered close enough to that of elemental Si to be potentially viable.

Table 4. The potentially viable higher-order host materials. All band gaps are reported in eV. For phases in the JARVIS-DFT database for which both standard OPT and combined OPT-mBJ calculations were performed, the OPT-mBJ band gap is listed first, followed by the smaller OPT band gap in parentheses. Where the OPT-mBJ band gap was found to be unrealistically small compared with the OPT band gap, only the OPT value is listed. Where multiple AFLOW repository entries exist for the same phase of a compound, the average of their computed band gaps is listed. Literature band gaps in bold are experimentally derived; those not in bold are band gaps calculated by other methods, many of which agree with the database DFT calculations. An * in the SC column denotes a known single-crystal synthesis method for that phase. A (-) in any cell denotes a value that was not calculated or not reported for that phase. Grayed text indicates materials whose band gap is considered close enough to that of elemental Si to be potentially viable.
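The reporting rules repeated in the table captions can be expressed as two small formatting helpers. This is a sketch: the function names, the two-decimal formatting, and the numerical test used for an "unrealistically small" OPT-mBJ gap (here, smaller than the OPT gap) are assumptions, since the captions do not state the exact rule.

```python
from statistics import mean
from typing import Optional, Sequence


def jarvis_cell(opt_mbj: Optional[float], opt: Optional[float]) -> str:
    """Format a JARVIS-DFT cell as 'OPT-mBJ (OPT)', or OPT alone if OPT-mBJ looks implausible.

    Treating an OPT-mBJ gap smaller than the OPT gap as "unrealistically small"
    is an assumed criterion.
    """
    if opt_mbj is None and opt is None:
        return "-"
    if opt_mbj is None:
        return f"{opt:.2f}"
    if opt is None:
        return f"{opt_mbj:.2f}"
    if opt_mbj < opt:
        return f"{opt:.2f}"
    return f"{opt_mbj:.2f} ({opt:.2f})"


def aflow_cell(entry_gaps: Sequence[float]) -> str:
    """Average the gaps of multiple AFLOW entries for the same phase; '-' if none."""
    return f"{mean(entry_gaps):.2f}" if entry_gaps else "-"
```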