CAVD, towards better characterization of void space for ionic transport analysis

Geometric crystal structure analysis using three-dimensional Voronoi tessellation provides intuitive insights into the ionic transport behavior of metal-ion electrode materials or solid electrolytes by mapping the void space in a framework onto a network. The existing tools typically consider only the local voids by mapping them with Voronoi polyhedra vertices and then define the mobile ions pathways using the Voronoi edges connecting these vertices. We show that in some structures mobile ions are located on Voronoi polyhedra faces and thus cannot be located by a standard approach. To address this deficiency, we extend the method to include Voronoi faces in the constructed network. This method has been implemented in the CAVD python package. Its effectiveness is demonstrated by 99% recovery rate for the lattice sites of mobile ions in 6,955 Li-, Na-, Mg- and Al-containing ionic compounds extracted from the Inorganic Crystal Structure Database. In addition, various quantitative descriptors of the network can be used to identify and rank the materials and further used in materials databases for machine learning.


Introduction
All-solid-state batteries are the promising candidates for electric vehicles 1 , smart grid 2 and other electrochemical power sources, due to their high safety, long cycle life and high energy density [3][4][5][6] . One of the prerequisites for both inorganic solid electrolytes and electrodes is high ionic conductivity, which requires a connected mobile ions pathways. Currently, widely used computational approaches to predict or simulate ionic conduction pathways include both ab initio and empirical methods. The methods based on the first-principles calculations include ab initio molecular dynamics [7][8][9] and the first-principles nudged elastic band method (FP-NEB) 10 . They provide high accuracy but they are limited by high computational cost and often involve complicated manual pre-and post-processing of the input data and results. At the opposite side of the spectrum, the low-cost empirical methods, such as geometric analysis [11][12][13][14][15] and bond valence method (BV) [16][17][18][19][20][21] , have gained popularity in high-throughput screening [22][23][24][25][26][27][28][29] and obtaining preliminary insights quickly for further accurate calculations. In this context, we have developed a high-throughput screening platform 30 that integrates material database with hierarchical ion-transport calculations realized by implementing empirical algorithms to assist in FP-NEB calculation. To meet the requirements of our platform for automated unsupervised workflow, we also independently developed the bond valence site energy (BVSE) calculation program (has been reported in our paper 31 ) and the geometry-based ion-transport analysis library CAVD (presented in this paper).
In the geometric model considering atoms as spheres of characteristic radii, the crystal space can be divided into two complementary subspaces: the subspace occupied by atoms and that of interatomic voids 32 . The atom subspace can be characterized by the composition, symmetry, topology, and geometry, which enables efficient classification of structures. On the other hand, the analysis of the void subspace of crystal structures allows to

Methods
The Voronoi decomposition has been widely used to construct a graph representation of the void space for a given arrangement of atoms in a periodic domain 37,38,46,47 . The resulting Voronoi diagram is composed of Voronoi cells, characterized by vertices, edges and faces. The Voronoi faces are the perpendicular planes between two neighboring framework atoms, and the vertices and edges of the Voronoi cell are by definition at the largest possible distance from the framework atoms around them. The Voronoi network, built of Voronoi vertices and edges, maps the void space in the framework. Based on these features, the procedures of our method are summarized as following ( Fig. 1): (1) Perform Voronoi decomposition of the void space in the framework; (2) Define an interstitial network for ion transport analysis; (3) Define and calculate ionic transport descriptors.
Voronoi algorithms for void space characterization. The standard Voronoi decomposition proposed by Georgy Voronoi 48 is a mathematical approach to dividing space 38,49 . In a finite-dimensional Euclidean space Z, if the Euclidean distance between generator (e.g., point, circle or sphere) p and p i (i = 1,2,…,n) is denoted by d(p, p i ), the standard Voronoi cell 49 corresponding to p i satisfies Fig. 1 The class diagram of objects that are involved in our tool.
is thus a division of space Z. Each Voronoi cell is a convex polyhedron, which is obtained by constructing the perpendicular bisector (for monodisperse system) planes between two neighboring generators. Replacing the generators with ions, the Voronoi cell can be used to describe the local environment in an ionic crystal structure. The Voronoi vertices can be used to describe the sites of local largest voids, and the Voronoi edges refer to the least restrictive paths from one local largest void to the next because they are farthest away from the ions around themselves.
However, the standard Voronoi cell constructed by bisecting the interatomic distances is not adequate for structures with atoms of different size (see O V in Fig. 2). The difference in radii can be taken into account in a so called Voronoi S decomposition 50-52 , as following: However, the curved boundaries (see O S in Fig. 2) of the Voronoi S cell obtained by Eq. (2) are computationally difficult to handle, and the radical Voronoi decomposition 40,53-56 is chosen as a compromise between the accuracy and efficiency 57,58 : The radical Voronoi cell constructed by radical Voronoi decomposition has the following three main features: (1) The radical Voronoi cell is a convex polyhedron.
(2) The overall shape of radical Voronoi cell is similar to Voronoi S cell.
(3) The radical Voronoi cell will degenerate into standard Voronoi cell when r i = r j .
The radical Voronoi decomposition is easy to implement and provides more adequate description of the void space than standard Voronoi decomposition for a structure with atoms of unequal radii. Although the vertices of the radical Voronoi cell may no longer precisely coincide with the centers of local maximally large interstices (see O R in Fig. 2), previous studies 40,57,58 and our derivation (S1 of the Supplementary Information) demonstrates that it has little effect on ionic transport analysis.
Interstitial network for ionic transport analysis. For the analysis of ionic transport, it is necessary to characterize the void space formed by the framework atoms and then determine the subspace which mobile ions can access 33,59 . The Voronoi network, which maps the void space in the packing of framework atoms by Voronoi decomposition, identifies the interstices and the channel topology. Since the Voronoi vertex may be expected to be a local low-energy position (since it maximizes the distance to its coordinating atoms) 60 , the potential mobile ion diffusion path can be further evaluated by determining the parts of this network through which the mobile ions could theoretically migrate. A set of local geometrical descriptors, r SD , r Chan 13,23 and G 3 61-63 has been introduced for the evaluation of ionic transport in previous work. r SD is the radius of a spherical domain whose volume equals to the secondary Voronoi cell around the vertices of the primary Voronoi cell. r Chan describes the channel size, which is the radius of a circle passing through the bottleneck area constituted by three framework atoms. G 3 is the second moment of inertia of the Voronoi cell, which is necessary to filter out unreasonable r SD values caused by the severe distortion of the secondary Voronoi cell. However, for these local descriptors, Voronoi decomposition needs to be performed twice, and the value of r Chan is affected by the distortion of the bottleneck region. Moreover, two different criteria are required to determine the accessibility of the interstices and channels. To meet the goal of making the calculations automatic and robust, we introduce the distance from the interstice/channel to the surface of the nearest framework atom as the weights for the evaluation (see S2 of the Supplementary Information for details). After that, the accessible subspace of the void space can be easily determined by comparing the interstitial/bottleneck size in the network with the statistical values for mobile ions in different coordination environments (see subsection "Ionic transport descriptors").
The resulting weighted Voronoi network WVN = (V, E) is comprising a set V of interstices together with a set E of channel segments, which are 2-element subsets of V. Only with all the sites of mobile ions matched with the interstices that have been included in the WVN, can we get all migration pathways between them. However, in some structures, we cannot find the interstices contained in the network that correspond to the lattice sites of target ions. For example, the lattice positions of Li1 and Li2 in α-Li 3 N do not coincide with any Voronoi vertices. Li1 locates at the center of the Voronoi face, and Li2 occupies the bottleneck site of the channel connection two adjacent interstices ( Fig. 3(a)). This is an example of the situation when a 3D Voronoi polyhedron degenerates into a 2D polygon, which, similarly to channel size, should be neither too wide nor too narrow 26,36,64,65 . Therefore, we extend the set V to include the interstices that coincide with the center of Voronoi faces, and place the connections of the centers and the vertices of the Voronoi faces into the set E. The acquired new interstitial network ( Fig. 3(b)) is characterized by the position and size of interstices, channels between interstices, and bottlenecks of the channels which are used to determine possible migration pathways.

Ionic transport descriptors. A descriptor, an effective input data representation for material character-
istics, is critical to study structure-property relationships 66 . Some efforts have been made in the development of material descriptors based on Voronoi decomposition. With the help of Voronoi node and its coordinating host-lattice atoms, Kahle et al. 60 introduced a landmark vector to perform a site analysis of molecular dynamics trajectories for ionic diffusion analysis. Based on Voronoi-tessellation real feature values and atomic property data, Jalem et al. 67 proposed a general representation scheme for crystalline solids, which showed good predictive power and generalization performance in the density functional theory based calculation of material properties. Yang et al. 68 defined a function to quantify structure similarity and predict populated crystal sites using both Voronoi cell and chemical composition. In these schemes, descriptors are all based on the basic geometric/topological features contained in the Voronoi cell and the problem-specific domain knowledge (chemical/atomic information). In this work, instead we focus on more general geometrical/topological descriptors that are frequently used in ionic transport.
For the beginning, the criteria for ion accessibility need to be determined. To quantify such criteria, distances from the sites of mobile ions sublattice to their nearest immobile framework ions surface were calculated in 12,448 coordination environments. A kernel density estimate plot (Fig. 4, S3 of the Supplementary Information) is formed by computing a continuous probability distribution estimate (using Gaussian kernels with 1000 equally evaluation points) of the minimal distances. The calculated distances are distributed within a well-defined range, so the lower threshold (T l ) and upper threshold (T u ) of ion migration can be easily determined. In order to cover as many samples as possible and avoid excessively long distances with high coordination numbers, the thresholds are obtained from about 90% of the data (Table 1) and then used to determine the mobile ions accessibility www.nature.com/scientificdata www.nature.com/scientificdata/ of the voids in Li-, Na-, Mg-and Al-containing compounds. If the size r of the interstice or bottleneck satisfies T l ≤ r ≤ T u , it is considered to be accessible.
Based on the T l and T u , the basic geometrical/topological descriptors for transport channel can be acquired, including the connected transport channel (TC), the interstitial network dimension (IND), the radius of largest ion that can freely pass through the void space (R T ), the critical radii for main crystallographic directions a (R Ta ), b (R Tb ) and c (R Tc ).
TC: The transport channel is formed by connected interstices and channel segments whose size is within [T l , T u ]. The TC characterizes the continuous subspace of the void space accessible by mobile ions [12][13][14] and is useful for identifying the approximate minimum energy path of ion migration 29,69 Each interstice that belongs to the transport channel should have the pathways to its periodic images in adjacent cells. The algorithm 22 (shown in Fig. 5) for calculating the transport channel is described as following: (1) Calculate the interstitial network of the framework ions based on the input structure; (2) Remove all interstices and channel segments which sizes are out of [T l , T u ]; (3) Select an unrecorded interstice, then record its periodic displacement vector as (0, 0, 0), and push all interstices that directly connected to the interstice into a stack. When the stack is not empty, remove the topmost interstice and performs following analysis: (a) If the interstice has not been recorded, record its periodic displacement vector and push all directly connected interstices into the stack; (b) If the interstice has been recorded with other periodic displacement vectors, all its connected interstices constitute a transport channel; (4) Repeat step (3) until all interstices have been recorded.
It should be noted that there may be more than one transport channel in a structure, and all of them form a transport network. R T : The size of restricting interstice or bottleneck in the interstitial network. R T , characterizes the radius of largest ion that can freely pass through the void space. This descriptor is analogous to the radius of largest free  www.nature.com/scientificdata www.nature.com/scientificdata/ sphere (RLFS), a currently widely used description of the pore geometry 22,[70][71][72] in crystalline porous materials. The RLFS refers to the maximum size of a spherical probe that can travel through the void space in a structure by at least one periodic lattice translation, and can be calculated for three crystallographic directions. Taking the same approach, we define R T as the size of restricting interstice or bottleneck in the interstitial network to characterize the ion transport. The R T for three crystallographic directions a (R Ta ), b (R Tb ) and c (R Tc ) are also obtained, and R T = max{R Ta , R Tb , R Tc }.

IND:
The interstitial network dimension. The IND provides the information about the dimensionality of the network that mobile ions can diffuse through in the void space [12][13][14] . The interstitial network may be one-(1D), two-(2D) or three-dimensional (3D) or otherwise, zero dimensional (0D) if no infinite channels are found. The value of IND is determined from R Ta , R Tb and R Tc with T l . For example, if R Ta larger than (or equal to) the T l , the interstitial network will be called connected in the a direction, and if only one crystallographic direction connected, the system will be called 1D.

Results and Discussions
To support our work, 31,499 Crystallographic Information Files (CIFs) 73,74 for Li, Na, Mg and Al containing compositions were extracted from the Inorganic Crystal Structure Database (ICSD; release 2010/2) 75 . For these compounds, the repairable format errors (such as inconsistent bracket and excessive blank lines) were corrected at first. The data cleaning procedure (e.g., removal of elements, alloys, and entries with hydrogen and H isotopes, or errors in the atomic sites), reduced the set to 17,206 entries. After removing entries with duplicated and www.nature.com/scientificdata www.nature.com/scientificdata/ undetermined atom sites, 6,955 entries having Li (1,920 items), Na (2,841 items), Mg (1,125 items) and Al (1,743 items) were selected with fully occupied metal sites whose positions are accurately determined. The detailed process of data cleaning is described in S4 of the Supplementary Information.

Methodology validation.
Since one of the main purposes of our tool is to identify pathways for further FP-NEB calculations, the prerequisite for the CAVD analysis is that all experimental lattice sites of mobile ions are matched to the interstitial network. As a first step of the methodology validation, we assess the ability of the calculations to recover the positions of the mobile ions for all 6,955 compounds using only framework structure information. For each compound, we remove mobile ions, then calculate the interstitial network (IN) using our approach as described in section "Methods", and standard Voronoi network (VN) (for comparison) for the frame- www.nature.com/scientificdata www.nature.com/scientificdata/ We classify the recovery rate of every compound into three types: complete recovery (P = 1), partial recovery (0 < P < 1), and a miss (P = 0). The recovery rates of all 6,955 compounds for IN and VN are presented in Table 2. It can be clearly seen that our model performs well in characterizing mobile ion sites with a high rate of complete recovery and very low miss rate. The performance of the standard Voronoi network is slightly inferior. Upon investigation, we found that mobile ions in these compounds lie on the Voronoi faces and thus cannot be matched by any Voronoi vertices. In contrast, in our model we consider the centers of the Voronoi faces, the positions closest to an atom on the Voronoi edges as well as the Voronoi vertices as possible mobile ions locations. Therefore, our model has a higher recovery rate than the standard model. We also notice that in ~1% of the 6,955 compounds which were only partially recovered by our model ( Table 2), some of the distances d(m, v) are just slightly larger than the threshold 0.5 Å. If the threshold is increased, those compounds are also completely recovered (S5 of the Supplementary Information). All of these discussions clearly demonstrate that our method successfully identifies mobile ion sites and the channels between them.
Calculated results. To further validate the developed code, we calculated the ionic transport descriptors for selected 6,995 compounds as shown in the supplementary files (in figshare). The calculated transport channel for each CIF is provided by *.vesta files and *.net files (archived in Channels.zip). The remaining numeric descriptors are recorded in Descriptors.xlsx, and a short description of all data is given in S6 of the Supplementary Information.   Table 3. Calculated descriptor values of 17 well-known ionic conductors. § R Tc ≪ R Ta = R Tb effectively makes the material as 2D conductor in agreement with reports in 15,78,79 . www.nature.com/scientificdata www.nature.com/scientificdata/ The results, including 17 well known ionic conductors, including Li-nitride, NASICON, Garnet, LIPHOS, perovskite, and β-Al 2 O 3 , are consistent with previous reports (shown in Table 3). In order to demonstrate the features of our method, four well-studied materials are discussed in detail. Na 4 Zr 2 (SiO 4 ) 3 and Li 7 La 3 Zr 2 O 12 with 3D transport channel are presented in this section, LiFePO 4 with 1D channel and Na 2 O(Al 2 O 3 ) 11 with 2D channel are shown in the S7 of the Supplementary Information.

Mobile ion Total
The first material is a NASICON (Na superionic conductor) type compound Na 4 Zr 2 (SiO 4 ) 3 80 . It is a good ion conductor due to the network of wide channels in the three-dimensional rhombohedral R-3c framework of corner sharing ZrO 6 octahedra and SiO 4 tetrahedra (Fig) [80][81][82] . Mobile Na ions are distributed over the two types of interstices: Na1 (in an octahedral oxygen environment at the intersection of three transport channels) and Na2 (in an 8-10 oxygen environment at each bend of the conduction channels) and the transport involves the Na1-Na2-Na1 migration.
Before applying our method to Na 4 Zr 2 (SiO 4 ) 3 , all Na ions are removed from the structure and the following steps are carried out to locate the transport channels: (1) Calculate the interstitial network based on the framework structure; (2) Calculate the connected transport channels by applying the geometric transport analysis algorithm to the interstitial network; (3) Simplify the transport channels by identifying crystallographically equivalent interstices; (4) Identify the experimental sites, and the pathways between them form simplified channels.
All of the steps are done by the code automatically and no manual processing of the interstitial network is done by hand. The procedure yielded a 3D Na + network consisting of 13 symmetrically distinct interstices and 24 channel segments between the interstices (see Table S4 of the Supplementary Information). In this network, the centers of It6 and It9 coincide with the experimental equilibrium sites Na1 and Na2, respectively. The paths between Na1 and nearest Na1 is Na1(It6)-It7-Na2(It9)-It7-Na1(It6) (Fig. 7), which is consistent with previous reports 80,81 . Moreover, an interface is provided to automatically generate a series of images along the migration path and the POSCAR files corresponding to images for further FP-NEB analysis.
Based on the results of these tests, we are confident that our model has the ability to identify all possible channels between the mobile ions. The results can be further combined with BVSE calculations to identify the minimum energy paths as illustrated by the BVSE energy profiles of the pathways Li3-Li3 in Li 7 La 3 Zr 2 O 12 performed Fig. 7 The 3D transport channel for Na + in Na 4 Zr 2 (SiO 4 ) 3 (icsd_38055). With the help of step (2), the Na + transport channel with 402 interstices is obtained from the interstitial network with 1,182 interstices. These 402 interstices are then merged with symmetry equivalent (step (3)), and 13 symmetrically distinct interstices are obtained. The symmetrically distinct interstice and bottleneck are labeled as It and Bn, respectively. All interstices and channel segments in channel (see VestaFiles/icsd_38055.vesta in figshare) can be accessed by Na + , but some of them are omitted in the figure for clarity. (2020) 7:153 | https://doi.org/10.1038/s41597-020-0491-x www.nature.com/scientificdata www.nature.com/scientificdata/ using our joint calculation program 31 . As expected, we find the interstices in paths Li3-Li1-Li3 and Li3-It11-Li3 are distributed at the valley of energy profile, and the bottlenecks coincide with the peak (Fig. 9). As the abnormal path Li3-It18-Li3 in the energy profile, it seems to be hard to transport because of the high energy barrier. Along this path, both It18 and Bn14 are located at high energy sites. After we analyzed the specific locations of the interstices, we found that It18 coincides with the center of a new tetrahedron sharing edge with the tetrahedron occupied by Li1, and the centers of Bn18 are located on the face of the new polyhedron. In addition, the low energy paths Li3-Li1-Li3 and Li3-It11-Li3 are observed to be similar, where Li1 and It11 are also located in the Li-O tetrahedron. This small difference is caused by the unequal lattice length of the tetragonal-LLZO.
We can summarize that our tool, as the other Voronoi decomposition based codes, offers a very quick way to assess any material as a potential ionic conductor based on the crystal structure information. The suitable geometry of the framework is not a sufficient condition for a very good percolation-type ionic conductivity, but it is definitely a necessary condition and the method can be used to filter out the structures which do not favour ionic transport. Another useful application of the method is that it allows identifying the position of all the segments of the conduction channels that can be used as input for ab initio nudged elastic band calculations.
As all the other Voronoi tessellation codes, our tool has some limitations. The mobile ion is needed to be specified before applying CAVD, but this will become difficult when compound contains two or more kinds of potential mobile ions. Since the radical Voronoi decomposition is sensitive to changes in local structure 54 , the interstitial clusters constructed by Voronoi cell make it difficult to identify of the sites of mobile ions and the There are five interstitial clusters are formed by It10-It12-It17, It7-It18, It6-It16, It9-It15 and It1-It2, some of them are omitted in the figure for clarity. www.nature.com/scientificdata www.nature.com/scientificdata/ symmetry of the interstices. The procedure that simplify the network by merging all interstitial clusters as their geometric center is in progress and will be published elsewhere. Furthermore, ion radius is only one of the factors affecting the mobility of ions in ionic conductors 84 , improvement of the tool with additional criteria, such as formation energies, can make the results more precise and robust; defining a simple, measurable parameter to describe the distribution of framework ions, and establishing the relationship between the parameter and ionic transport will make CAVD-based structure prediction possible 85 .

Conclusion
We developed the Crystal structure Analysis by Voronoi Decomposition (CAVD) tool to characterize the void space of crystal structures as interstitial network by the radical Voronoi decomposition. The obtained interstitial network can be used to calculate crystal structure characteristics relevant for ionic transport properties. We validated our code by predicting the lattice sites of mobile ions in 6,955 compounds extracted from the ICSD with the success rate > 98%. The detailed study of the migration channels in Na 4 Zr 2 (SiO 4 ) 3 and Li 7 La 3 Zr 2 O 12 also confirmed the validity of our method. In addition, 6,955 descriptor data that can be further studied, e.g. material machine learning, structure classification, and structure-property relationships study, are shared. CAVD can be used not only to identify the position of mobile ion sites and conduction channels and quantify the accessible void space, but also to provide insights for BVSE or FP-NEB calculations for the newly established high-throughput screening platform for battery materials (https://www.bmaterials.cn) 30 .

Data availability
The data (all files mentioned in the main text and the Supplementary Information) that support the findings of this study are available from the figshare 86 .