AFLOW-XtalFinder: a reliable choice to identify crystalline prototypes

The accelerated growth rate of repository entries in crystallographic databases makes it arduous to identify and classify their prototype structures. The open-source AFLOW-XtalFinder package was developed to solve this problem. It symbolically maps structures into standard designations following the AFLOW Prototype Encyclopedia and calculates the internal degrees of freedom consistent with the International Tables for Crystallography. To ensure uniqueness, structures are analyzed and compared via symmetry, local atomic geometries, and crystal mapping techniques, simultaneously grouping them by similarity. The software i. distinguishes distinct crystal prototypes and atom decorations, ii. determines equivalent spin configurations, iii. reveals compounds with similar properties, and iv. guides the discovery of unexplored materials. The operations are accessible through a Python module ready for workflows, and through command line syntax. All the 4+ million compounds in the AFLOW.org repositories are mapped to their ideal prototype, allowing users to search database entries via symbolic structure-type. Furthermore, 15,000 unique structures - sorted by prevalence - are extracted from the AFLOW-ICSD catalog to serve as future prototypes in the Encyclopedia.


INTRODUCTION
Scientists have been struggling for decades to identify prototypes (e.g. Strukturbericht series 1 and Pearson's Handbook 2 ) and duplicates in crystallographic databases, and to label structures concisely so that structure-types can be recognized and searched. The recent rapid growth of online repositories has worsened the problem 3 . Distinguishing distinct crystalline compounds is becoming increasingly difficult, leading to the repetition of previously studied materials, hindering database variety (biasing data-driven analyses and machine learning methods 4-6 ), and wasting valuable computational and experimental resources. The multitude of crystal geometries makes by-hand detection of prototypes and repeated entries intractable. A major complication for finding structure-types is the non-standard representation of crystals. Determination of unique crystallographic structures is obfuscated by (i) unit cell representations and (ii) origin choices. While standard forms exist, such as Niggli 7 and Minkowski 8 unit cells, the conversion procedures are highly sensitive to numerical tolerance values and can cast similar structures into differing descriptions 9,10 . Additionally, lattice standardization techniques do not address differences in origin choices. The lack of commensurate representations impedes the search for prototypes and inhibits mappings between similar crystals and their corresponding properties. To overcome non-standard descriptions, crystal comparison tools have been developed to identify similar structures. Programs such as Structure Matcher 11 , XTALCOMP 10 , SPAP 12 , CMPZ 13 , CRYCOM 9 , STRUCTURE-TIDY 14 , and COMPSTRU 15 are available with varying objectives related to structure comparison. For instance, XTALCOMP is coupled with the XTALOPT infrastructure for identifying distinct materials generated via its evolutionary algorithm 16 . Despite the considerable number of platforms, none is suitable for autonomous prototype detection.
Crystallographic symmetry is neglected in Structure Matcher, XTALCOMP, and SPAP, while STRUCTURE-TIDY, CRYCOM, and COMPSTRU rely on external symmetry packages. Additionally, most tools only feature single pairwise comparisons (with the exception of Structure Matcher) and others require additional inputs (e.g. space group, Wyckoff positions, and unit cell choice). Aside from technical functionality, the codes do not offer built-in methods to compare structures to existing crystallographic libraries and material repositories. To promote materials discovery, routines must analyze compounds with respect to established prototypes to identify new structure-types. This would enable the expansion of prototype libraries, such as the AFLOW Prototype Encyclopedia (or Prototype Encyclopedia for brevity) 17,18 , fueling the generation of unique compounds via prototype decoration. Comparing compounds to those in materials databases can prevent duplication. Moreover, the properties of database entries can be used to estimate those of similar uncalculated compounds, exploiting the structure-property relationship of materials. An automatic and reliable large-scale method for discerning unique crystallographic structures is therefore crucial for the materials science community.
AFLOW-XtalFinder (AFLOW crystal finder, XtalFinder for brevity) addresses many of the previously mentioned issues in a high-throughput fashion. The primary objective of XtalFinder is to identify and classify the prototypes of materials and relate them via structural similarity metrics. To accomplish this, XtalFinder determines the ideal prototype designation of crystal structures, consistent with the International Tables for Crystallography (ITC) 19 . Any structure in this representation can be automatically generated via a symbolic prototype generator. Similarity between structures is analyzed on multiple fronts. Crystallographic structures are first compared by symmetry (isopointal analysis), leveraging a robust software implementation, AFLOW-SYM, which calculates self-consistent symmetry descriptions, freeing the user from tolerance adjustments 20 . Local atomic geometries are also computed to match neighborhoods of atoms in crystals (isoconfigurational snapshots). Finally, crystal similarity is resolved by rigorous structure mapping procedures (complete isoconfigurational analysis) and quantified via a misfit criterion 21 . The prototype finder accommodates automatic workflows, with the functionality to analyze multiple materials/structures simultaneously via multithreading. Features are provided to identify crystallographic structures, distinct materials, atom decorations, and spin configurations. Methods are also included to compare compounds/prototypes to the AFLOW.org repository and AFLOW prototype libraries. Every entry in the AFLOW.org repository has been mapped to its prototype label, enabling users to search the database by structure-type. The XtalFinder code, written in C++, is part of the AFLOW (Automatic flow) framework 22-25 and is open-source under the GNU-GPL license. For seamless integration into different work environments, this functionality is accessible via the command-line and a Python module.

RESULTS
Problem of the ideal prototype
Prototype structures are generally classified in terms of their symmetry characteristics. For example, the rocksalt prototype has a face-centered cubic lattice and 8 atoms in the conventional cell (i.e. Pearson symbol of cF8), space group $Fm\bar{3}m$ (#225), and Wyckoff positions 4a ($m\bar{3}m$) and 4b ($m\bar{3}m$). Determining this information for any arbitrary structure is often a challenge: numerical noise in the atomic positions inhibits the detection of crystal isometries, requiring by-hand modification of tolerance thresholds. Furthermore, consistency between real- and reciprocal-space symmetries is often overlooked, and yet it is imperative for reliable ab initio simulations. Thus, accurate prototype detection relies on robust symmetry analyses.
The default symmetry tolerance value for classifying prototypes in XtalFinder is proportional to the minimum interatomic distance ($d_{\mathrm{nn}}^{\min}/100$). The tolerance is thus system-specific, and it has been shown to be consistent with experimentally resolved symmetries 20 . Nevertheless, the tolerance can also be adjusted by the user, and is guaranteed to return a commensurate designation due to the adaptive prototype protocol shown in Fig. 1a.

D. Hicks et al.
• sixth field: the Wyckoff letter(s) of the third atomic site, e.g. site C: three Wyckoff positions with letter f, and
• seventh field: the Wyckoff letter(s) of the fourth atomic site, e.g. site D: one Wyckoff position with letter f.
The prototype parameters specify the degrees of freedom allowed by the symmetry of the structure. For the esseneite structure, there are 18 parameters: $a$, $b/a$, $c/a$, $\beta$, $y_1$, $y_2$, $x_3$, $y_3$, $z_3$, $x_4$, $y_4$, $z_4$, $x_5$, $y_5$, $z_5$, $x_6$, $y_6$, and $z_6$. The first three variables are the lattice parameters (with $b$ and $c$ represented in relation to $a$), the fourth variable is the lattice angle $\beta$, and the subsequent variables are the Wyckoff coordinates (fractional) that are not fixed by symmetry. The sequence of the Wyckoff parameters is based on the alphabetic ordering of the Wyckoff letters, followed by alphabetic ordering of the species. Additional information regarding the label and parameters is discussed in refs. 17,18 .
Mapping structures into this format characterizes prototypes in a concise and descriptive manner. The representation also easily distinguishes isopointal and isoconfigurational prototypes. Two compounds with similar labels are isopointal (i.e. same symmetry), and are isoconfigurational if their parameters are the same (i.e. equivalent geometric configurations). However, a strict parameter comparison does not distinguish isoconfigurational structures, e.g. parameters may differ by an origin shift. Moreover, the representation reveals the degrees of freedom that can be altered, while preserving the underlying symmetry. This is useful for showing continuous structure transitions within the same symmetry-type and performing symmetry-constrained structure relaxations 26 . Lastly, with this format, structures are now easily regenerated with the AFLOW software.
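As an illustration of how compact the label is, the sketch below splits a label into its underscore-separated fields. It is not part of AFLOW: the function name `parse_prototype_label` and the returned dictionary keys are hypothetical, and the field order (stoichiometry, Pearson symbol, space group number, then one Wyckoff field per atomic site with an optional leading count) is assumed from the labels quoted in this article.

```python
import re

def parse_prototype_label(label):
    """Split a prototype label such as 'AB_mP4_11_e_e' into named fields."""
    fields = label.split("_")
    if len(fields) < 4:
        raise ValueError("label needs stoichiometry, Pearson symbol, "
                         "space group, and Wyckoff fields")
    # First field: one capital letter per atomic site, each with an optional count.
    species = [(s, int(n) if n else 1)
               for s, n in re.findall(r"([A-Z])(\d*)", fields[0])]
    wyckoff_fields = fields[3:]
    if len(wyckoff_fields) != len(species):
        raise ValueError("expected one Wyckoff field per atomic site")
    # Remaining fields: Wyckoff letters per site, each with an optional repeat count.
    wyckoff = {
        site: [(letter, int(n) if n else 1)
               for n, letter in re.findall(r"(\d*)([A-Za-z])", field)]
        for (site, _), field in zip(species, wyckoff_fields)
    }
    return {
        "stoichiometry": species,
        "pearson_symbol": fields[1],
        "space_group": int(fields[2]),
        "wyckoff": wyckoff,
    }

info = parse_prototype_label("A2B_tP12_92_b_a")
print(info["pearson_symbol"], info["space_group"], info["wyckoff"])
```

Two labels that parse to the same fields are isopointal candidates only; as noted above, the parameters (and possibly an origin shift) still have to be compared.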

Symbolic prototype generator
Structures represented in the ideal prototype designation can be created and decorated with any atomic elements via a symbolic prototype generator, enabling automatic materials design. The procedure introduced in refs. 17,18 has been extended to create all possible prototype structures, going beyond those previously described in the Prototype Encyclopedia. Given a crystal's composition, Pearson symbol, space group, and occupied Wyckoff positions, the generator determines the degrees of freedom in symbolic notation (i.e. a, b/a, c/a, α, β, γ, x, y, and z) that must be specified, based on the ITC conventions 19 . Feeding in the ideal prototype label and degrees of freedom to the symbolic generator will produce the corresponding geometry file, substituting the appropriate degrees of freedom with the input values. Prototypes, including those in the Prototype Encyclopedia, no longer need to be tabulated (hard-coded) in the AFLOW software, and are now created on-the-fly. With this prototype generator, AFLOW is capable of creating structures to span all regions of crystallographic space.
Structures are generated with the following prototype command syntax: --proto=label --params=parameter_1,parameter_2,.... Here, the label is the ideal prototype label, e.g. AB_mP4_11_e_e as shown in Fig. 1b, and parameter_1,parameter_2,... are the comma-separated values for the prototype's degrees of freedom, e.g. 5.586, 0.719, 0.698, 91.992, 0.252, 0.234, 0.751, and 0.261 as shown in Fig. 1b. By default, structures are generated with fictitious species in alphabetical order (i.e. A, B, C, D, etc.). Users can override this order by specifying other permutations after the prototype label (separated by a period), i.e. --proto=label.BAC..., a useful feature for controlling the atomic site decorations. Specific elements can be decorated onto the prototype by appending the element abbreviations to the command in colon-separated alphabetical order, e.g. --proto=label:Ag:Cu:Zr. The generator checks for any inconsistencies with the provided label and/or parameter values, terminating prematurely with a message listing possible fixes to the command. The generator supports multiple geometry file formats, including VASP (POSCAR) 27 , FHI-AIMS 28 , QUANTUM ESPRESSO 29 , ABINIT 30 , ELK 31 , and CIF. Swapping the command --proto=label with --aflow_proto=label will build an aflow.in file, AFLOW's input file (using a standard set of DFT parameters by default 32 ), automating ab initio simulations of these compounds.
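For scripted workflows, the command-line options above can be assembled programmatically. The helper below is a minimal sketch, not an AFLOW interface: `build_proto_args` is a hypothetical name, it only concatenates the flags quoted in the text, and combining a permutation suffix with element names in a single call is an assumption. Prepending the executable and running the command are left to the caller.

```python
def build_proto_args(label, params, permutation=None, elements=None, aflow_in=False):
    """Assemble the prototype-generator options described in the text.

    label:       ideal prototype label, e.g. "AB_mP4_11_e_e"
    params:      degrees of freedom, in the order required by the label
    permutation: optional site permutation appended after a period, e.g. "BA"
    elements:    optional element abbreviations appended with colons
    aflow_in:    use --aflow_proto (builds an aflow.in) instead of --proto
    """
    flag = "--aflow_proto" if aflow_in else "--proto"
    target = label
    if permutation:
        target += "." + permutation
    if elements:
        target += ":" + ":".join(elements)
    return [f"{flag}={target}",
            "--params=" + ",".join(str(p) for p in params)]

# Values taken from the Fig. 1b example quoted in the text.
args = build_proto_args(
    "AB_mP4_11_e_e",
    [5.586, 0.719, 0.698, 91.992, 0.252, 0.234, 0.751, 0.261],
)
print(" ".join(args))
```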
The generator can also print the symbolic representation of the lattice and Wyckoff positions. Adding the option --add_equations to the prototype command returns both a numerical and symbolic version of the geometry file, and the option --equations_only only prints the symbolic version. Symbolic geometry files can be printed with respect to the conventional cell (ITC) or symbolically transformed into the primitive cell (using the SymbolicC++ open-source software 33 ). By default, AFLOW provides the primitive cell, since fewer-atom unit cells are more computationally efficient.
With a robust prototype classifier and generator in place, comparison of prototypes is required to (i) identify unique structure-types and (ii) group similar ones together. The prototype label and parameters alone cannot establish structural similarity due to variations in the choice of lattice and origin, potentially affecting both the label (e.g. Wyckoff letters) and the parameters (e.g. lattice and non-fixed Wyckoff parameters). Therefore, XtalFinder offers three levels of comparison: symmetry, local atomic geometry, and complete crystal geometry. They are described in the following three subsections.
Isopointal structures: compare symmetry
Symmetry analyses of crystals are required to identify structures of the same symmetry-type. The isometries of crystals (e.g. rotations, roto-inversions, screw axes, and glide planes) are calculated via the routines of AFLOW-SYM 20 to determine the space group and occupied Wyckoff positions (Fig. 2a). Results from AFLOW-SYM are robust against numerical tolerance issues and are consistent with experimentally determined symmetries in comparison to other symmetry software 20 .
Crystals are isopointal if they have commensurate space groups (equivalent or enantiomorphic pairs) and Wyckoff positions 34 . Wyckoff positions are compatible if they have the same multiplicity and similar site symmetry designations. Due to different setting and origin choices for the conventional cell, a strict site symmetry match is insufficient. For instance, the Wyckoff positions with multiplicity 2 in space group #47 (Pmmm), namely four 2mm (letters i-l), four m2m (letters m-p), and four mm2 (letters q-t), form a Wyckoff set and are related via an automorphism of the space group operations 19,35,36 . Depending on the assignment of the lattice parameters (a, b, and c) and origin choice, different (and potentially equivalent) Wyckoff decorations are possible. Consequently, XtalFinder tests permutations of the site symmetry symbol to expose positions that may be within the same Wyckoff set. Permuting the site symmetry symbol does not always reveal Wyckoff positions belonging to the same set since the site symmetry may originate from higher point symmetries (see example of space group #66 (Cccm) and Wyckoff positions i and k in ref. 36 ). Nevertheless, Wyckoff positions belonging to different sets cannot be matched, which will be revealed via the geometric structure comparison.
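The permutation test can be illustrated with a small sketch. This is not the XtalFinder implementation: the function names are hypothetical, and the site symmetry symbol is assumed to be supplied already split into one token per symmetry direction (e.g. `("2", "m", "m")` for 2mm), which sidesteps the parsing of general ITC symbols.

```python
from itertools import permutations

def site_symmetries_compatible(sym_a, sym_b):
    """True if some permutation of the tokens of sym_a reproduces sym_b."""
    if len(sym_a) != len(sym_b):
        return False
    return any(tuple(p) == tuple(sym_b) for p in permutations(sym_a))

def wyckoff_compatible(wyckoff_a, wyckoff_b):
    """Each argument is (multiplicity, site_symmetry_tokens)."""
    (mult_a, sym_a), (mult_b, sym_b) = wyckoff_a, wyckoff_b
    return mult_a == mult_b and site_symmetries_compatible(sym_a, sym_b)

# Multiplicity-2 positions of space group #47 (Pmmm) quoted in the text.
pos_i = (2, ("2", "m", "m"))   # site symmetry 2mm
pos_q = (2, ("m", "m", "2"))   # site symmetry mm2
print(wyckoff_compatible(pos_i, pos_q))
```

As the text cautions, a positive result only flags a candidate pair; positions from different Wyckoff sets are rejected later by the geometric comparison.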
The symmetry calculation is performed automatically, i.e. it does not require input from the user. Options are available to ignore symmetry and force geometric comparison of structures, which can identify crystals associated via symmetry subgroups.
Isoconfigurational snapshots: compare local geometry
Beyond isopointal analyses, structures are further compared by inspecting arrangements of atoms, i.e. local atomic geometries. Local geometry analyses have been fruitful in providing structural descriptors and similarity metrics between crystals of different types 37,38 . However, the positions of these environments are often neglected, precluding the determination of one-to-one mappings between similar crystals. Nevertheless, the analysis quickly identifies local geometries and is employed here to analyze structures beyond symmetry considerations (i.e. isoconfigurational versus isopointal 34 ).
Rather than determine the complete local atomic geometry for each atom, XtalFinder builds a reduced representation: neighborhoods comprised of only the least frequently occurring atom (LFA) type(s). The local LFA geometry analysis provides the connectivity for a subset of atoms (i.e. LFA-type) to discern if patterns are present in both structures, regardless of cell choice and crystal orientation. This description is preferred over the full local geometry because it is i. computationally less expensive to calculate and ii. generally less sensitive to coordination cutoff tolerances. The latter is attributed to the fact that LFA geometries are more sparse.
Fig. 2 b The local least-frequently occurring atom (LFA) geometries are computed and compared between structures. An example local LFA geometry (2-D projection) is shown for the quaternary Heusler structure (ABCD_cF16_216_c_d_b_a) 17,18 , highlighting the closest neighbors (via solid lines) for each LFA type to the central Mg atom (purple). Shaded concentric circles indicate the tolerance threshold for capturing atoms in the coordination shell with a thickness of 10% of the distance from the central and connected atom. Local geometry vectors are compared against local geometries in other structures to determine mapping potential. c Two structures ($X_{\mathrm{ref}}$ and $X_{\mathrm{test}}$) are mapped onto one another by expanding $X_{\mathrm{test}}$ into a supercell and exploring commensurate lattice and origin choices with respect to $X_{\mathrm{ref}}$. The yellow lattice (highlighted by the green box) is a potential match with $X_{\mathrm{ref}}$. $X_{\mathrm{test}}$ is transformed into the new representation ($\tilde{X}_{\mathrm{test}}$), and the structures are quantitatively compared via the misfit criteria. The structures are evaluated via their lattice deviation ($\epsilon_{\mathrm{latt}}$), coordinate displacement ($\epsilon_{\mathrm{coord}}$), and figure of failure ($\epsilon_{\mathrm{fail}}$). Distances between mapped atoms ($d^{\mathrm{map}}$) that are less than half the atom's nearest neighbor distance ($d_{\mathrm{nn}}/2$) are accounted for in the coordinate displacement (green dashed lines and arrows), while larger distances are described in the figure of failure (red dashed lines and arrows).
An example of a local LFA geometry is shown for the quaternary Heusler structure (ABCD_cF16_216_c_d_b_a) 17,18 in Fig. 2b. A local LFA atomic geometry (AG) is a set of vectors connecting a central atom ($c$) to its closest neighbors:
$$\mathrm{AG}_c \equiv \left\{\mathbf{d}^{\min}_{ic}\right\} \quad \forall\, i \mid i \neq c,$$
where $\mathbf{d}^{\min}_{ic}$ is the minimum distance vector to the $i$ atom (restricted to LFA-type(s) only) and is calculated via the method of images for periodic systems 39 :
$$\mathbf{d}^{\min}_{ic} \equiv \min_{n_a, n_b, n_c}\left(\mathbf{x}_i - \mathbf{x}_c + n_a\mathbf{a} + n_b\mathbf{b} + n_c\mathbf{c}\right),$$
i.e. the shortest vector among all periodic images. Here, $n_a$, $n_b$, and $n_c$ are the lattice dimensions along the lattice vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$; and $\mathbf{x}_i$ and $\mathbf{x}_c$ are the Cartesian coordinates of the $i$ and $c$ (center) atoms, respectively. A coordination shell with a thickness of $d^{\min}_{ic}/10$ captures other atoms of the same type to control numerical noise in the atomic coordinates (a similar tolerance metric is defined in AFLOW-SYM, i.e. loose preset tolerance value 20 ). This cutoff value yields expected coordination numbers for well-known systems and is comparable to results provided by other atom environment calculators 37,38 . If there is only one LFA type, e.g. Si in α-cristobalite (SiO$_2$, A2B_tP12_92_b_a) 17,18 , then the distance to the closest neighbor of that LFA type is calculated. If there are multiple LFA types, e.g. four for the quaternary Heusler (as illustrated in Fig. 2b), then the minimum distances to each LFA type are computed. The local atomic geometry is calculated for each atom of the LFA type(s) in the unit cell, resulting in a list of atomic geometries ($\{\mathrm{AG}_c\}$). Therefore, α-cristobalite has a set of four Si LFA geometries (one for each Si in the unit cell: $\{\mathrm{AG}_{\mathrm{Si},1}, \mathrm{AG}_{\mathrm{Si},2}, \mathrm{AG}_{\mathrm{Si},3}, \mathrm{AG}_{\mathrm{Si},4}\}$) and the quaternary Heusler has a set of four LFA geometries (one for each element type: $\{\mathrm{AG}_{\mathrm{Au}}, \mathrm{AG}_{\mathrm{Li}}, \mathrm{AG}_{\mathrm{Mg}}, \mathrm{AG}_{\mathrm{Sn}}\}$, respectively).
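The method of images used for the minimum distance vector can be sketched as a brute-force search over neighboring periodic images. This is an illustrative simplification rather than the AFLOW routine: the function name is hypothetical, and scanning only images with indices between `-search` and `search` (default 1) is an assumption that holds for reasonably compact cells but not for strongly skewed ones.

```python
from itertools import product
import math

def minimum_image_vector(x_i, x_c, lattice, search=1):
    """Shortest Cartesian vector from atom c to any periodic image of atom i.

    lattice: three Cartesian lattice vectors (a, b, c).
    search:  images n_a, n_b, n_c are scanned in [-search, search] (assumption).
    """
    best, best_norm = None, math.inf
    for n in product(range(-search, search + 1), repeat=3):
        d = tuple(
            x_i[k] - x_c[k] + sum(n[m] * lattice[m][k] for m in range(3))
            for k in range(3)
        )
        norm = math.sqrt(sum(comp * comp for comp in d))
        if norm < best_norm:
            best, best_norm = d, norm
    return best

cubic = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
# The neighbor at x = 3.5 is closer through the periodic boundary than directly.
print(minimum_image_vector((3.5, 0.0, 0.0), (0.0, 0.0, 0.0), cubic))
```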
To investigate structural compatibility, local atomic geometry lists for compounds are compared. In general, the local geometry comparisons err on the side of caution. For instance, comparing the cardinality of the coordination is often too strict. Despite a more sparse geometry space, slight deviations in position can move atoms outside the coordination shell threshold, changing the atom cardinality and neglecting potential matches. Local atomic geometries are thus compatible if (i) the central atoms are comparable types (i.e. same element and/or stoichiometric ratio in the crystal), (ii) the neighborhoods of surrounding atoms have distances that match within 20% after normalizing with respect to $\max(\mathrm{AG}_c)$ (i.e. the largest distance in the local geometry cluster), and (iii) the angles formed by two atoms and the center atom match within 10°. To further alleviate the coordination problem, an exact geometry match is not required, i.e. some distances and angles are permitted to be missing. Grouping local atomic geometries as compatible is favored to mitigate false negatives for equivalent structures.
Isoconfigurational structures: compare geometric structure
To resolve a commensurate representation between two structures for geometric comparison, one structure (the reference, $X_{\mathrm{ref}}$) remains fixed and the other structure (the potential duplicate, $X_{\mathrm{test}}$) is expanded into a supercell. Lattice vectors are identified within the supercell and compared against the reference structure. For any lattices similar to $X_{\mathrm{ref}}$, $X_{\mathrm{test}}$ is transformed into the new lattice representation ($\tilde{X}_{\mathrm{test}}$). Origin shifts for this cell are then explored in an attempt to match atoms. If one-to-one atom mappings exist between the two structures, then the similarity is quantified with the crystal misfit method (see "Quantitative similarity measure" subsection) 21 . Misfit values below a given threshold indicate equivalent structures and the search terminates. Alternatively, misfit values larger than the threshold are disregarded and the search continues until all lattices and origin shifts are exhausted. The procedure is detailed below and an illustration of the process is depicted in Fig. 2c.
The lattice search algorithm begins by scaling the volumes of the unit cells to compare structures with different volumes (an option is available to quantify the similarity between structures at fixed volumes). Once scaled, the routine searches for translation vectors by generating a lattice grid of $X_{\mathrm{test}}$. The size of the grid is defined to encompass a sphere with a radius ($r_{\mathrm{grid}}$) equal to the maximum lattice vector length of $X_{\mathrm{ref}}$, i.e. $r_{\mathrm{grid}} = \max(a, b, c)$. Similar to a procedure described in ref. 20 , the necessary grid dimensions are given by the set of vectors perpendicular to each pair of $X_{\mathrm{ref}}$ lattice vectors scaled by the grid radius (e.g. $\mathbf{n} = r_{\mathrm{grid}}\,(\mathbf{b}\times\mathbf{c})/\lVert\mathbf{b}\times\mathbf{c}\rVert$ for the pair $\mathbf{b}$ and $\mathbf{c}$). The scaled vectors are then transformed into the lattice basis ($\mathbf{L}$) via $\mathbf{n}' = \mathbf{L}^{-1}\mathbf{n}$, and the ceiling of the $\mathbf{n}'$ components indicates the grid dimensions: $n_{a,b,c} = \mathrm{ceil}(\mathbf{n}')$. The grid dimensions span from $-n_{a,b,c}$ to $n_{a,b,c}$ to account for different orientations/rotations between the structures. To optimize the lattice search, translation vectors are explored in a grid comprised of only the LFA-type in $X_{\mathrm{test}}$ (since they are the minimal set of atoms exhibiting crystal periodicity). In addition to verifying crystal periodicity, candidate lattice vectors must be similar to those in the $X_{\mathrm{ref}}$ lattice based on (i) lattice vector moduli ($\Delta l$), (ii) angles formed between pairs of lattice vectors ($\Delta\theta$), and (iii) volumes enclosed by three lattice vectors ($\Delta V$). The tolerance values ($\Delta l$, $\Delta\theta$, $\Delta V$) are chosen based on how much the lattices are allowed to differ. If the lattices are significantly different, then the lattice is ignored (see the "lattice deviation" in the "Quantitative similarity measure" subsection). Additionally, to speed up the search, commensurate lattices are sorted by minimum lattice deviation to find matches more quickly. Upon finding a cell similar to $X_{\mathrm{ref}}$, $X_{\mathrm{test}}$ is transformed into the new lattice representation $\tilde{X}_{\mathrm{test}}$ and is stored if the representations have the same number of atoms (and types).
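The grid-size estimate can be sketched as follows. This is a hedged reading of the description, not the AFLOW code: the function names are hypothetical, a single lattice is used both for the perpendicular vectors and for the basis $\mathbf{L}$ (the text does not fully specify which structure supplies each), and taking the largest component over the three scaled normals for each direction is a conservative assumption.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def grid_dimensions(lattice, r_grid):
    """Estimate (n_a, n_b, n_c) so the lattice grid encloses a sphere of radius r_grid."""
    a, b, c = lattice
    volume = dot(a, cross(b, c))
    # Rows of L^{-1}, where the lattice vectors are the columns of L.
    inverse_rows = [tuple(x / volume for x in cross(b, c)),
                    tuple(x / volume for x in cross(c, a)),
                    tuple(x / volume for x in cross(a, b))]
    dims = [0, 0, 0]
    for u, v in ((b, c), (c, a), (a, b)):
        normal = cross(u, v)
        length = math.sqrt(dot(normal, normal))
        scaled = tuple(r_grid * x / length for x in normal)   # n
        n_prime = [dot(row, scaled) for row in inverse_rows]  # n' = L^{-1} n
        for k in range(3):
            # Small offset guards against round-off pushing an exact integer up by one.
            dims[k] = max(dims[k], math.ceil(abs(n_prime[k]) - 1e-9))
    return tuple(dims)

tetragonal = ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0))
print(grid_dimensions(tetragonal, r_grid=4.0))
```

The search grid would then run over indices from `-n` to `n` in each direction, as described in the text.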
For each prospective unit cell, possible origin choices are explored. The origin of $X_{\mathrm{ref}}$ is placed on one of the LFA-type atoms, and the origin of $\tilde{X}_{\mathrm{test}}$ is cycled through all atoms of its LFA-type. Given an origin choice, a mapping procedure is attempted for all atoms in the unit cell. The minimum Cartesian distance between atom $i$ in $X_{\mathrm{ref}}$ and atom $j$ in $\tilde{X}_{\mathrm{test}}$ is found via the method of images for periodic systems 39 :
$$d_{ij} = \min_{n_a, n_b, n_c}\left\lVert \mathbf{x}_i - \mathbf{x}_j + n_a\mathbf{a} + n_b\mathbf{b} + n_c\mathbf{c} \right\rVert, \quad (4)$$
where $n_a$, $n_b$, and $n_c$ are the lattice dimensions along the lattice vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$; and $\mathbf{x}_i$ and $\mathbf{x}_j$ are the Cartesian coordinates of the $i$ and $j$ atoms, respectively. Given the set of distances $\{d_{ij}\}$, the minimum distance over all $j$ atoms is identified as the mapping distance, i.e.
$$d^{\mathrm{map}}_i = \min_j d_{ij}. \quad (5)$$
Once $d^{\mathrm{map}}_i$ is computed for all $i$, the following conditions are verified: i. one-to-one mappings (i.e. no duplicate $j$ indices between $i$ indices), and ii. no cross-matching between element types (i.e. cannot map a single element type to multiple types in $\tilde{X}_{\mathrm{test}}$). If either condition is violated, the mappings are ignored and the search continues.
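The two mapping conditions can be checked with a few lines once the pairwise minimum-image distances are available. The sketch below assumes those distances are precomputed in a matrix `dist[i][j]`; the function name and data layout are hypothetical and chosen only for illustration.

```python
def map_atoms(dist, types_ref, types_test):
    """Map each reference atom i to its closest test atom j.

    dist[i][j]: minimum-image distance between reference atom i and test atom j.
    Returns the list of mapped j indices, or None if the mapping is not
    one-to-one or if one element type maps onto several types.
    """
    mapping = [min(range(len(row)), key=row.__getitem__) for row in dist]
    # Condition i: no test atom may be claimed by two reference atoms.
    if len(set(mapping)) != len(mapping):
        return None
    # Condition ii: each reference element type maps to a single test type.
    type_map = {}
    for i, j in enumerate(mapping):
        if type_map.setdefault(types_ref[i], types_test[j]) != types_test[j]:
            return None
    return mapping

dist = [[0.1, 2.0],
        [2.0, 0.2]]
print(map_atoms(dist, ["Zn", "S"], ["Si", "C"]))
```

A `None` result corresponds to "the mappings are ignored and the search continues" with the next origin or lattice choice.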
Given a successful mapping, the similarity of the two crystals in the corresponding representations is quantified, indicating equivalent or unique structures. If no mapping is found for any lattice and origin choice, then the structures are considered distinct and are not assigned a similarity value.

Quantitative similarity measure
To compare two crystals in a given representation, a method proposed by Burzlaff and Malinovsky is employed 21 . The similarity between structures is quantified by a misfit value, $\epsilon$, which incorporates differences between lattice vectors and atomic coordinates via 21 :
$$\epsilon \equiv 1 - (1 - \epsilon_{\mathrm{latt}})(1 - \epsilon_{\mathrm{coord}})(1 - \epsilon_{\mathrm{fail}}).$$
The misfit quantity is bounded between zero and one: structures with a value close to zero match and those with a value close to one do not match. Special misfit ranges defined by Burzlaff and Malinovsky are adopted here 21 :
$$0 < \epsilon \le \epsilon_{\mathrm{match}}: \text{match}; \qquad \epsilon_{\mathrm{match}} < \epsilon \le \epsilon_{\mathrm{family}}: \text{same family}; \qquad \epsilon_{\mathrm{family}} < \epsilon \le 1: \text{no match}.$$
The "same family" designation generally corresponds to crystals with common symmetry subgroups. Burzlaff and Malinovsky recommend $\epsilon_{\mathrm{match}} = 0.1$ and $\epsilon_{\mathrm{family}} = 0.2$ based on definitions from Pearson 40 and Parthé 41 . In the "Finding $\epsilon_{\mathrm{match}}$: structural misfit versus enthalpy" section, heuristic misfit thresholds are identified based on the allowed maximum enthalpy differences between similar structures. The deviation of the lattices, $\epsilon_{\mathrm{latt}}$, captures the difference between the lattice face diagonals of $\tilde{X}_{\mathrm{test}}$ and $X_{\mathrm{ref}}$ 21 ; here $\mathbf{f}_{kl}$ and $\mathbf{d}_{kl}$ denote the diagonals obtained by adding and subtracting, respectively, the $k$ and $l$ lattice vectors. In the lattice search algorithm, the $\Delta l$, $\Delta\theta$, and $\Delta V$ tolerances are coupled to $\epsilon_{\mathrm{latt}}$, and are tuned to ensure $\epsilon_{\mathrm{latt}} \le \epsilon_{\mathrm{family}}$. The coordinate deviation, $\epsilon_{\mathrm{coord}}$, which measures the disparity between atomic positions in the two structures, is based on the mapped atom distances ($d^{\mathrm{map}}_i$ or $d^{\mathrm{map}}_j$, as computed with Equations (4) and (5)) and the atoms' nearest neighbor distances ($d_{\mathrm{nn}}$) in the respective structures; $\tilde{N}_{\mathrm{test}}$ and $N_{\mathrm{ref}}$ are the number of atoms in the two crystals. If $d^{\mathrm{map}} < d_{\mathrm{nn}}/2$, then a "switch" variable $n$ is set to zero and the mapped atom distance is included in $\epsilon_{\mathrm{coord}}$. Otherwise, $n$ is set to one, signifying the mapped atoms are far apart and not considered in $\epsilon_{\mathrm{coord}}$. These atoms are represented in the figure of failure, $\epsilon_{\mathrm{fail}}$.
Other metrics can be used to assess structural similarity, including the root mean square (rms) of the atom positions 11 and coordination characterization functions 12 . XtalFinder employs the crystal misfit criteria to incorporate structural differences between both the lattice and atom positions. Differences between common similarity metrics and their software implementations are discussed in more detail in the "Comparison Accuracy" subsection.
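To make the thresholds concrete, the sketch below combines the three deviation terms and classifies the result. It is an illustration under stated assumptions: the function names are hypothetical, the product form $\epsilon = 1 - (1-\epsilon_{\mathrm{latt}})(1-\epsilon_{\mathrm{coord}})(1-\epsilon_{\mathrm{fail}})$ is assumed to be the Burzlaff-Malinovsky combination used here, and the individual terms are taken as given inputs rather than computed from a structure.

```python
def switch(d_map, d_nn):
    """Switch variable n: 0 if the mapped atom counts toward the coordinate
    deviation (d_map < d_nn / 2), 1 if it counts toward the figure of failure."""
    return 0 if d_map < d_nn / 2 else 1

def misfit(eps_latt, eps_coord, eps_fail):
    """Combine the three deviations (each in [0, 1]) into one misfit value.
    Product form assumed; see the lead-in."""
    return 1.0 - (1.0 - eps_latt) * (1.0 - eps_coord) * (1.0 - eps_fail)

def classify(eps, eps_match=0.1, eps_family=0.2):
    """Apply the misfit ranges with the recommended default thresholds."""
    if eps <= eps_match:
        return "match"
    if eps <= eps_family:
        return "same family"
    return "no match"

eps = misfit(eps_latt=0.02, eps_coord=0.05, eps_fail=0.0)
print(round(eps, 4), classify(eps))
```

Because the terms multiply, a single large deviation (for example a figure of failure near one) is enough to push the total misfit toward "no match".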

Super-type comparisons
To explore new areas of materials space, the XtalFinder module (i) identifies equivalent and unique materials, (ii) uncovers common structure-types across different compounds (i.e. prototypes), (iii) determines inequivalent atom decorations for a given crystal structure, and (iv) discerns distinct magnetic structure configurations. The corresponding comparison modes are denoted as material-, structure-, decoration-, and magnetic-type, respectively (Fig. 3). Each variant uses the underlying procedures discussed in the "Results" section (i.e. symmetry, local atomic geometry, and geometric structure comparisons) with different restrictions on mapping atom types.

Material-type
Material-type comparisons map atoms of the same atomic species (Fig. 3a). For example, given two ZnS zincblende compounds (AB_cF8_216_c_a) 17,18 , a material-type comparison maps Zn → Zn and S → S in the two structures. Therefore, the method reveals duplicate compounds within a data set.
Fig. 3 Available super-type comparison modes. a Material-type: maps same element types, revealing duplicate compounds. b Structure-type: maps structures regardless of the element types, identifying crystallographic prototypes. c Decoration-type: creates and compares all atom decorations for a given structure, determining unique and equivalent decorations (in this case, atom decorations AB and BA match). d Magnetic-type: maps compounds by element types and magnetic moments, discerning distinct spin configurations.

Structure-type
Conversely, structure-type comparisons ignore atomic species and map any atom-type with compatible stoichiometric ratios (Fig. 3b). In the case of zincblende structures ZnS and SiC, a structure-type comparison attempts to map Zn → Si and S → C, or Zn → C and S → Si, since the compounds are equicompositional. This mode exposes unique backbone structures and is practical for crystallographic prototyping. Identifying prototypes is also useful for modeling solid solutions and disordered materials 42,43 .

Decoration-type
The decoration-type (or permutation-type) mode determines unique atom decorations for a given crystal structure, i.e. inequivalent colorings of a structure, where each element is denoted by a different color (Fig. 3c). Continuing with the zincblende example, the A and B atomic sites are equivalent: swapping elements on the sites results in a duplicate compound compared to the original decoration. Thus, only one site decoration choice is necessary to create a distinct compound. Given a compound with n species, there are n! possible atom permutations. XtalFinder automatically (i) generates compounds with the different atom decorations for a crystal, (ii) compares the decorations (via a material-type comparison), and (iii) identifies the unique configurations. Atom decorations are only compared if atomic types have the same Wyckoff multiplicity and similar site symmetries (see subsection "Isopointal structures: compare symmetry").
Depending on the matching (misfit) tolerance and the choice of the reference decoration, calculated equivalency groups can violate divisor theory. For instance, two decorations can match with a certain misfit; however, a better match with a smaller misfit can exist with another decoration. To combat incorrect groupings, XtalFinder executes a consistency check, verifying the groupings are commensurate with the possible divisors. If they are not, XtalFinder searches for better matches and regroups the compatible decorations.
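The decoration bookkeeping can be sketched in a few lines. The helpers below are hypothetical and only illustrate the counting argument: `decorations` enumerates the n! site permutations, and `groups_consistent` tests whether a proposed grouping has equally sized groups whose size divides n! (the divisor condition described above). The groupings in the usage example are made up for illustration and are not results from the article.

```python
from itertools import permutations
from math import factorial

def decorations(n_species):
    """All n! atom decorations of fictitious species A, B, C, ..."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:n_species]
    return ["".join(p) for p in permutations(letters)]

def groups_consistent(groups, n_species):
    """True if all groups have the same size, together cover the n!
    decorations, and the common size divides n!."""
    sizes = {len(g) for g in groups}
    total = factorial(n_species)
    return (len(sizes) == 1
            and sum(len(g) for g in groups) == total
            and total % next(iter(sizes)) == 0)

print(decorations(3))
# Hypothetical groupings of the six ternary decorations:
even = [["ABC", "BAC"], ["CBA", "ACB"], ["CAB", "BCA"]]
uneven = [["ABC", "BAC", "CBA", "ACB"], ["CAB"], ["BCA"]]
print(groups_consistent(even, 3), groups_consistent(uneven, 3))
```

A failed check would be the trigger for the cross-comparison and regrouping step described in the text.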
For example, ICSD entry BiITe #10500 (original geometry) has six possible atom decorations: ABC, BAC, CBA, ACB, CAB, and BCA. Since the three equicompositional sites are comprised of the same Wyckoff multiplicity and site symmetry (multiplicity 1 and site symmetry 3m. in space group #156), all structures are placed in the same initial comparison group, with ABC chosen as the reference decoration (since it is the first in the set). After the comparison, the decorations fall into equivalent groups, each with associated misfit values. However, the number of equivalent decorations in each set is not the same, violating Lagrange's theorem 44 . Furthermore, all misfit values should be the same, since the underlying structure is unchanged. The incommensurate groupings are a symptom of only comparing to the reference decoration, as opposed to cross-comparing with other decorations.
To remedy incorrect groupings, XtalFinder checks for better matches (i.e. potential equivalent decorations with lower misfit values). Therefore, the "duplicate" decorations are compared to the other reference decorations and regrouped to minimize the misfit value. In this case, the subsequent cross-comparisons are performed:
• BAC with CAB and BCA,
• CBA with CAB and BCA, and

Magnetic-type
Magnetic-type comparisons map atoms of the same atomic species and similar magnetic moments, i.e. they analyze spin configurations (Fig. 3d). For instance, given two body-centered cubic chromium compounds with antiferromagnetic ordering, the routine attempts to map Cr↑ → Cr↑ and Cr↓ → Cr↓. A magnetic moment tolerance threshold denotes equivalent spin sites; the default tolerance is 0.1 μB. The analysis can be performed for both collinear and non-collinear systems. The magnetic-type comparison can be joined with a magnetic structure generator to create distinct spin configurations for high-throughput simulation.
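As a rough illustration of the mapping restriction, the sketch below accepts an atom pair only if the species agree and the moments agree within the 0.1 μB default. The component-wise test for non-collinear moments is an assumption, since the text does not state how vector moments are compared, and the function names are hypothetical.

```python
def moments_match(m1, m2, tol=0.1):
    """True if two magnetic moments (in Bohr magnetons) agree within tol.

    Collinear moments are plain numbers; non-collinear moments are
    3-component sequences. Component-wise comparison is an assumption.
    """
    v1 = (m1,) if isinstance(m1, (int, float)) else tuple(m1)
    v2 = (m2,) if isinstance(m2, (int, float)) else tuple(m2)
    if len(v1) != len(v2):
        raise ValueError("cannot compare collinear with non-collinear moments")
    return all(abs(a - b) <= tol for a, b in zip(v1, v2))


def can_map(atom_a, atom_b, tol=0.1):
    """Atoms are (species, moment) pairs; both parts must be compatible."""
    return atom_a[0] == atom_b[0] and moments_match(atom_a[1], atom_b[1], tol)
```

With this restriction, a spin-up Cr site can map onto another spin-up Cr site but not onto a spin-down one.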

Multiple comparisons
With the plethora of compounds generated by computational frameworks (such as AFLOW 22,45 , NoMaD 46 , Materials Project 47 , High-Throughput Toolkit 48 , Materials Cloud/AiiDA 49 , and OQMD 50 ), automatically comparing structures is necessary for high-throughput classification of unique/duplicate compounds and structure-types. For this purpose, we developed an automatic comparison procedure for multiple crystals (Fig. 4). Compounds are first grouped into isopointal sets by analyzing and comparing the symmetries of the structures, aggregating them by stoichiometry, space groups, and Wyckoff sets (calculated via AFLOW-SYM 20 ). Next, compounds are further partitioned into near isoconfigurational sets by determining and comparing the local LFA geometries in each structure. Within each near isoconfigurational group, one representative structure (generally the first in the set) is compared to the other structures via geometric comparisons and the misfit values are stored. Once the comparisons finish, any unmatched structures (i.e. misfit values greater than ϵ_match) are reorganized into new comparison sets. The process repeats until all structures have been assembled into matching groups or all comparison pairs are exhausted. The three comparison analyses are performed in this order for two reasons: (i) to categorize structural similarity to varying degrees (isopointal, near isoconfigurational, and isoconfigurational) and (ii) to efficiently group compounds to reduce the computational cost of the geometric structure comparison (see "Speed and scaling considerations" in the Results). This procedure is the same for material-, structure-, decoration-, and magnetic-type comparisons; however, different atom mapping restrictions are applied depending on the comparison mode.
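The grouping sequence can be sketched as follows. This is a simplified illustration, not the XtalFinder implementation: the symmetry key, local-geometry key, and misfit function are supplied by the caller as hypothetical stand-ins for the AFLOW-SYM analysis, the LFA environment comparison, and the geometric structure comparison.

```python
from collections import defaultdict


def group_structures(structures, symmetry_key, environment_key, misfit,
                     eps_match=0.1):
    """Return a list of (representative, [(structure, misfit), ...]) groups."""
    # Steps 1 and 2: cheap pre-filters (isopointal, then near isoconfigurational).
    buckets = defaultdict(list)
    for structure in structures:
        key = (symmetry_key(structure), environment_key(structure))
        buckets[key].append(structure)

    # Step 3: geometric comparison against a representative; unmatched
    # structures are regrouped and compared again until none remain.
    groups = []
    for candidates in buckets.values():
        pending = list(candidates)
        while pending:
            representative, rest = pending[0], pending[1:]
            matched, unmatched = [], []
            for other in rest:
                eps = misfit(representative, other)
                if eps <= eps_match:
                    matched.append((other, eps))
                else:
                    unmatched.append(other)
            groups.append((representative, matched))
            pending = unmatched
    return groups
```

Because the expensive misfit function is called only within a bucket, the number of geometric comparisons shrinks as the symmetry and local-geometry filters become more selective.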

Multithreading
To enhance calculation speed, multithreading capabilities can be employed. The three computationally intensive procedures (calculating the symmetry, constructing the local LFA geometry, and performing geometric comparisons) are partitioned onto allocated threads, offering significant speed increases for large collections of structures.

Automatic comparisons
There are three built-in functions to compare multiple structures automatically: (i) compare structures provided by a user, (ii) compare an input structure to prototypes in AFLOW 17,18 , and (iii) compare an input structure to entries in the AFLOW.org repository. An overview of each high-throughput method is discussed below and usage is detailed in the Methods section.

Compare user datasets
Users can load crystal geometries and compare them automatically with XtalFinder. Options to perform both material-type and structure-type comparisons are available to identify unique/duplicate compounds or prototypes, respectively. For structure-type comparisons, the unique atom decorations for each representative structure are determined. Once the analysis is complete, XtalFinder groups compatible structures together and returns the corresponding misfit values.

Compare to AFLOW prototypes libraries
Given an input structure, this routine returns similar AFLOW prototype(s) along with their misfit value(s) (Fig. 5a). AFLOW contains structural prototypes that can be rapidly decorated for high-throughput materials discovery: 590 in the Prototype Encyclopedia 17,18 and 1,492 in the High-throughput Quantum Computing library 25 . In this method, AFLOW prototypes are extracted (based on similar stoichiometry, space group, and Wyckoff positions to the input) and compared to the user's structure. Since only matches to the input are relevant, the procedure terminates before regrouping any unmatched prototypes. The attributes of matched prototypes are also returned, including the prototype label, mineral name, Strukturbericht designation, and links to the corresponding Prototype Encyclopedia webpage. The scheme identifies common structure-types within the AFLOW libraries or, if no matches are found, reveals new prototypes. Absent prototypes can be characterized automatically in the AFLOW standard designation with XtalFinder's prototyping tool (discussed in subsection "Problem of the ideal prototype").

Compare to AFLOW.org repository
Compounds are compared to entries in the AFLOW.org repository using the AFLOW REST- and AFLUX Search-APIs 51,52 (Fig. 5b). An AFLUX query (i.e. matchbook and directives) is generated internally and returns database compounds similar to the input structure based on species, stoichiometry, space group, and Wyckoff positions. With the AURL from the AFLUX response, structures for the entry are retrieved via the REST-API. The most relaxed structure is extracted by default; however, options are available to obtain structures at different ab-initio relaxation steps. The set of entries from the database is then compared to the input structure. Similar to the AFLOW prototype comparisons, candidate entries are only compared against the input structure, i.e. the procedure terminates without regrouping unmatched entries.
With the underlying AFLUX functionality, material properties can also be extracted, highlighting the structure-property relationship amongst similar materials. For instance, the enthalpy per atom (H_atom) for matching database entries is printed by including the enthalpy_atom API keyword in the query. Any number or combination of properties can be queried; available API keywords are located in refs. 51,52 . Table 1 shows the comparison results between a rocksalt NaCl compound and matching DFT-relaxed structures in the AFLOW.org repository along with their misfits and enthalpies per atom.
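To make the idea of a matchbook plus directives concrete, the sketch below assembles a query string in the general style of an AFLUX request. Only the enthalpy_atom keyword is taken from the text; the species and nspecies keywords, the paging and format directives, and the endpoint passed in by the caller are assumptions that should be checked against refs. 51,52 before use. The query that XtalFinder generates internally may differ.

```python
def build_aflux_query(endpoint, species, properties=("enthalpy_atom",),
                      paging=0):
    """Assemble an AFLUX-style query URL (illustrative; keywords are assumed).

    endpoint   : base URL of the search API, supplied by the caller
    species    : element symbols that matching entries must contain
    properties : extra API keywords to return for each matching entry
    """
    # Matchbook: what to search for and which properties to report.
    matchbook = [f"species({','.join(species)})", f"nspecies({len(species)})"]
    matchbook.extend(properties)
    # Directives: how the response is paged and formatted (assumed names).
    directives = [f"$paging({paging})", "format(json)"]
    return endpoint + "?" + ",".join(matchbook + directives)
```

For a NaCl input, the matchbook would restrict the search to two-species Na-Cl entries and request the enthalpy per atom alongside each match; space group and Wyckoff filters are omitted from this sketch.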
This routine reveals equivalent AFLOW.org compounds, if similar materials exist in the database. As such, it can estimate structural properties a priori, before performing any calculations. The estimation is based on the following assumptions: i. the matching AFLOW material resides at a local minimum in the energy landscape and ii. the input structure relaxes to the same geometry as that AFLOW compound, given comparable calculation parameters. The functionality can explore properties that are not calculated for a given entry, but are calculated for an equivalent entry. For example, compounds in AFLOW's prototype catalogs (LIB1, LIB2, LIB3, etc.) do not usually have band structure data; however, corresponding ICSD entries can be found which do provide band structure information. Finally, the method can identify compounds that are absent from the database and prioritize them for future calculation, enhancing the diversity of the AFLOW.org repositories.

Fig. 4 Automatic grouping of multiple compounds. Compounds are compared in the following sequence: symmetry, local LFA geometry, and geometric structure. The algorithms determine isopointal, near isoconfigurational, and isoconfigurational structures, respectively, and aggregate them into similar sets (enclosed in black solid-lined boxes). Unmatched structures (i.e. ϵ > ϵ_match) after the initial geometric structure comparison are put into new groups and re-compared until all equivalent structures are grouped. This sequence is the same for material-, structure-, decoration-, and magnetic-type comparisons; however, the criteria for atom mappings differ (see subsection "Super-type comparisons" for details). The symmetry, local LFA geometry, and geometric structure comparisons (blue boxes) are multithreaded for parallel computation.

Using AFLOW-XtalFinder
For ease-of-use, the XtalFinder routines are accessible via a command-line interface and a Python environment (see Methods for details).

Ideal prototype analysis in AFLOW.org
The ideal prototype designations, for both the original and relaxed geometries, have been successfully determined for all 4+ million entries in the AFLOW.org repository. The prototype label, parameter variables, and parameter values are incorporated into the AFLOW REST- and Search-APIs 51,52 . The corresponding API keywords for the original geometries are

The prototype keywords enable researchers to search for materials by structure. This feature is useful for identifying possible crystal structures given experimental data. For example, with composition, space group, and occupied Wyckoff information (characteristics often known to experimentalists), users can construct the corresponding prototype label(s) and extract all compounds based on the provided structure-type. The keywords are also used to identify the frequency of certain prototypes in the AFLOW.org repository. For example, all compounds that are isopointal to the corundum prototype (labels: A2B3_hR10_167_c_e and A3B2_hR10_167_e_c) can be retrieved for both the original and relaxed geometries. Moreover, this search capability is used to discern if a structure-type is novel or has been reported previously.
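Since the label encodes composition, space group, and occupied Wyckoff positions, it can be taken apart with a few lines of code. The sketch below assumes the underscore-separated layout seen in the labels quoted in this work (stoichiometry, Pearson symbol, space group number, then one Wyckoff field per species) and treats a trailing suffix such as "-001" as an opaque identifier. The function name is hypothetical; the authoritative label definition is given in refs. 17,18 .

```python
import re


def parse_prototype_label(label):
    """Split an AFLOW-style prototype label into its fields (illustrative)."""
    core, _, suffix = label.partition("-")
    fields = core.split("_")
    if len(fields) < 4:
        raise ValueError(f"unexpected label layout: {label!r}")
    stoichiometry, pearson, space_group, *wyckoff = fields
    # 'A2B3' -> {'A': 2, 'B': 3}; a missing number means a count of 1.
    counts = {symbol: int(number) if number else 1
              for symbol, number in re.findall(r"([A-Z])(\d*)", stoichiometry)}
    if len(wyckoff) != len(counts):
        raise ValueError("expected one Wyckoff field per species")
    return {
        "stoichiometry": counts,
        "pearson_symbol": pearson,
        "space_group": int(space_group),
        "wyckoff": dict(zip(counts, wyckoff)),
        "suffix": suffix or None,
    }
```

Parsing the two corundum labels above shows that they differ only by swapping the species roles: both have Pearson symbol hR10 and space group 167, with the c and e Wyckoff fields exchanged.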
The ideal prototype keywords also reveal whether a compound retains the same prototype designation before and after relaxation. For structures that retain the same prototype label, the parameter values show the continuous structure transition during relaxation. For structures that transform into different prototypes, the symmetry-based designations highlight the symmetries that were broken. This can indicate that certain element combinations/arrangements are averse to certain prototype structures. More advanced relaxation techniques, e.g. symmetry-constrained relaxations 26 , would be required to restrict the relaxation to a given prototype structure.

Finding ϵ_match: structural misfit versus enthalpy
To identify a suitable threshold for matching similar structures (ϵ_match), Fig. 6 plots the misfit value (ϵ) between two mapped structures and their difference in enthalpy per atom (ΔH_atom). The structures in the test set are DFT-relaxed entries from the entire AFLOW-ICSD catalog as of 14 August 2020 (60,390) 53,54 . Compounds are grouped via commensurate atomic elements, stoichiometries, symmetries, and local LFA geometries. Furthermore, only compounds calculated with similar ab initio settings, such as LDAU parameters, k-points per reciprocal atom (KPPRA), and pseudopotentials (see Supplementary Note 1 for details), are compared together to prevent extraneous enthalpy differences due to differing parameters. In addition, magnetic systems are excluded since the magnetic moment is not incorporated into the misfit value. For these comparisons, the unit cell volumes are not rescaled, and the best lattice/origin choices are explored (minimizing the misfit value) to show better correlation with the enthalpies. After grouping the structures and identifying one-to-one mappings, misfit values for the remaining 6,795 comparison pairs are calculated. Figure 6a and b show the enthalpy difference ranges 0-100 meV/atom and 0-10 meV/atom, respectively, highlighting the maximum enthalpy differences at different misfit values.
In general, the misfit value correlates with the enthalpy difference for ϵ ≤ 0.1: as the misfit value decreases, the enthalpy difference also reduces. For ϵ > 0.1, the enthalpy spread widens with the misfit. Some comparison pairs exhibit large misfit values, but have similar enthalpies. This follows intuition since it is possible for significantly differing structures to have similar properties. The sparsity of the data points for large values of ϵ is attributed to the lack of one-to-one mappings as structures become increasingly dissimilar. This suggests XtalFinder and the misfit criteria are better suited to quantifying similar structures, rather than relating disparate structures. Figure 6 reveals possible thresholds for ϵ_match based on the maximum enthalpy difference allowed for mapped structures. For ϵ ≤ 0.1, the enthalpy differences per atom are all below 5 meV/atom, with the exception of one comparison pair. Reducing the misfit cutoff reasonably guarantees the enthalpy differences will also decrease; e.g. enthalpies will be within 1 meV/atom and 2 meV/atom for misfit values below 3.58 × 10⁻³ and 0.025, respectively. The maximum enthalpy difference jumps significantly (to approximately 50 meV/atom) near ϵ = 0.125. Thus, matching structures with misfits beyond ϵ = 0.1 are not guaranteed to exhibit similar enthalpies. This value is in agreement with Burzlaff and Malinovsky's proposed threshold. By default, XtalFinder employs a threshold of ϵ_match = 0.1 to ensure similar materials match within approximately 5 meV/atom. The threshold is also used for comparing prototypes; two prototypes that match within ϵ_match = 0.1 are expected to have enthalpies within 5 meV/atom when decorated with the same atomic elements. Users can adjust to stricter (or looser) thresholds for matching; however, ϵ_match ≫ 0.1 is not guaranteed to yield small enthalpy differences between matched structures.
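The empirical thresholds quoted above can be collected into a small lookup, shown here as a sketch. The cutoffs and bounds are the values reported in the text for the AFLOW-ICSD test set; they are approximate (one comparison pair exceeds the 5 meV/atom bound) and the function names are hypothetical.

```python
def expected_max_enthalpy_difference(misfit):
    """Approximate upper bound on the enthalpy difference in meV/atom.

    Returns None for misfit values above 0.1, where the text reports that
    similar enthalpies are no longer guaranteed.
    """
    if misfit < 0:
        raise ValueError("misfit values are non-negative")
    if misfit < 3.58e-3:
        return 1.0
    if misfit < 0.025:
        return 2.0
    if misfit <= 0.1:
        return 5.0
    return None


def is_match(misfit, eps_match=0.1):
    """Structures match when the misfit does not exceed the threshold."""
    return misfit <= eps_match
```

A user who needs matched structures to agree within about 2 meV/atom could therefore tighten the matching threshold to 0.025 instead of the default 0.1.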

Functionality differences with other codes
In addition to XtalFinder, other structure comparison tools are available to the materials science community: Structure Matcher 11 , XTALCOMP 10 , SPAP 12 , CMPZ 13 , CRYCOM 9 , STRUCTURE-TIDY (via Platon) 14 , and COMPSTRU 15 . A summary of the offered functionalities related to automatic comparisons and structure prototyping is indicated in Table 2 and described below.

Source code availability
The source codes are available for the following packages: XtalFinder (via AFLOW), Structure Matcher (via Pymatgen), and XTALCOMP (via XTALOPT). Pre-compiled binaries for SPAP on different operating systems are available with the CALYPSO software 55,56 . Source codes for the other software, namely CMPZ (implemented in KPLOT 57 ), CRYCOM, STRUCTURE-TIDY, and COMPSTRU (online), are not available. Therefore, the latter packages are not convenient for merging into user workflows.

Input file formats
The different structure file formats for each comparison code are listed below; XtalFinder: VASP (POSCAR) 27 , FHI-AIMS 28

Symmetry analysis
XtalFinder is the only package coupled with an internal symmetry calculator (AFLOW-SYM 20 ). CRYCOM, STRUCTURE-TIDY, and COMPSTRU require a symmetry input (space group number) to perform the comparison, but lack methods to calculate the symmetry internally. CMPZ allows a symmetry input; however, it is not required to perform the comparison. Structure Matcher, XTALCOMP, and SPAP do not consider symmetry in their structural analyses.

Multiple comparisons
The only packages that offer comparison of multiple materials in a single command are XtalFinder and Structure Matcher. Other software, such as XTALCOMP and SPAP, showcase comparison results performed on multiple structures, but multi-comparison routines are not available to users. To achieve similar functionality, users need to implement external regrouping procedures.

Decoration-type comparisons
XtalFinder is the only code that automatically determines the unique (and equivalent) atom decorations for a given crystal structure. With other packages, users must externally generate, organize, and compare the subsequent decorations. Beyond the lack of routines to generate decorations, the codes find incorrect equivalent decoration groups. In the BiITe (ICSD #10500) example discussed in the "Decoration-type" comparison mode section, Structure Matcher's group_structures function (with ltol=0.2, stol=0.17, angle_tol=5.0) identifies groupings that violate divisor theory:   . ltol=0.2, stol=0.3, angle_tol=5). XtalFinder, with its symmetry analysis coupled with mapping routines, correctly distinguishes these decorations and is the only viable option to establish consistent unique/duplicate decorations for crystalline prototypes.

Database comparisons
XtalFinder is the only module that features a method for comparing input structures to a database of materials, namely AFLOW.org. The API functionality coupled with XtalFinder ensures comparisons are performed with the most current version of the database, incorporating new materials as they are calculated. Furthermore, XtalFinder users can compare structures at various relaxation steps (with the --relaxation_step option). Users of the other packages need to extract relevant structures (e.g. similar compositions, space group, Wyckoff positions, and local atomic geometries at particular relaxation steps) by hand or code auxiliary scripts to perform similar functionality.

Supplementary Fig. 1. Candidate misfit thresholds are chosen based on the acceptable maximum enthalpy deviation between matched structures. For example, misfit values below ϵ = 0.025 and ϵ = 0.1 (black vertical lines) are expected to yield enthalpy differences no larger than ΔH_atom ≈ 2 meV/atom and ΔH_atom ≈ 5 meV/atom, respectively (black horizontal lines). As the misfit values increase beyond ϵ > 0.1, the spread of the data points also increases. A large jump in the maximum enthalpy difference occurs at approximately ϵ = 0.125, indicating matched structures near this value and beyond are not guaranteed to have similar enthalpies.

Prototype comparisons
XtalFinder compares to all prototype structures in AFLOW: the Prototype Encyclopedia, the High-Throughput Quantum Computing library, and initial geometries in the AFLOW.org repository. Similar to the database comparisons, XtalFinder automatically includes new prototype structures as they are added to AFLOW. Structure Matcher only compares structures against a static subset of AFLOW prototypes (i.e. the Prototype Encyclopedia via AflowPrototypeMatcher 60 ). Moreover, XtalFinder provides the internal degrees of freedom for any structure (via the prototyping routines), functionality all existing codes currently lack. To compare to these prototype representations, users of other packages need to convert the degrees of freedom (including expansion of the corresponding Wyckoff positions) into a structure file a priori. While all benchmarks were performed serially, XtalFinder routines are parallelized and users can specify the number of threads for the analyses (--np=x), offering additional speed over other packages for large-scale automatic comparisons requiring little or no user input. Therefore, XtalFinder will be more performant, especially when comparing more structures.

Comparison accuracy
As shown in Fig. 6, the XtalFinder misfit value decreases with the enthalpy difference between matched compounds, validating its accuracy. Comparisons with Structure Matcher are less accurate (and at times qualitatively incorrect) due to conversions of structures to an "average lattice" 11 , matching significantly differing lattices with no penalty on the rms value. For example, Se (ICSD #104187, space group #229) and Se (ICSD #57181, space group #166) are classified as distinct by XtalFinder because the lattices are considerably dissimilar (ϵ_latt = 0.15), consistent with the space groups. Despite having different symmetries, Structure Matcher inaccurately finds rms = 0 between the structures. This distorts their rms value, and it cannot be used to correlate properties of matched compounds, e.g. enthalpy. Conversely, XTALCOMP is qualitatively accurate, but it lacks a quantitative similarity metric (the return type is a Boolean). Furthermore, XTALCOMP comparisons neglect volume scaling between structures, an essential feature for identifying prototypes. XtalFinder is the only comparison software suitable for quantitatively measuring similarity of materials and prototypes. Overall, XtalFinder is optimized for prototype detection and structural comparison within large datasets. In addition, it is designed to be accessible to the broader materials science community for integration into user workflows.

Table 2. Functionalities of comparison codes specific to high-throughput analysis and structure prototyping. This tabulation is not exhaustive; many programs offer additional analyses, such as fragment/molecular comparisons, and are outside the scope of this work. Codes compared: AFLOW-XtalFinder, Structure Matcher, XTALCOMP, SPAP, CMPZ, CRYCOM, STRUCTURE-TIDY, and COMPSTRU. Footnotes: 1: optional symmetry input. 2: requires symmetry input. 3: Structure Matcher matches magnetic structures with opposite spins (SpinComparator function). 4: Structure Matcher compares to the AFLOW Prototype Encyclopedia partially, as it does not provide the internal degrees of freedom for the prototype.

Unique prototypes in the AFLOW-ICSD catalog
With XtalFinder, unique compounds and prototypes have been identified in the ICSD catalog of the AFLOW.org repository. Table 3 shows the statistics for the original (reported by the ICSD 53 ) and DFT-relaxed geometries (via the AFLOW standard 32 ) for 60,390 entries. Material-type comparisons with volume scaling suppressed reveal the number of unique compounds. Subsequent structure-type comparisons (which allow for volume scaling) determine the number of distinct prototypes. The representative compound for each prototype is chosen as the entry with the lowest ICSD number, since it is generally the oldest among the compounds (and less likely to be removed from the ICSD). The unique atom decorations for each prototype are determined via the decoration-type comparison. Moreover, the prototypes are cast into the AFLOW prototype designation form, exposing their degrees of freedom. Finally, the prototypes are compared to the Prototype Encyclopedia 17,18 to distinguish between existing and new structures. For the subsequent comparisons, the matching threshold is chosen as ϵ_match = 0.1 to group similar compounds (and prototypes when decorated with alike atoms) that are expected to have enthalpies differing by approximately 5 meV/atom or less (see Fig. 6).
The analysis shows that the original geometry set includes 34,820 unique compounds (57.7% of the total number of entries) and 15,205 prototypes (25.2%). Similarly, the DFT-relaxed set contains 33,544 unique compounds (55.5%) and 14,692 prototypes (24.3%). Based on the symmetry comparisons, there are 8,521 (14.1%) original and 8,493 (14.1%) relaxed distinct isopointal structure-types. In general, the original geometry set has more distinct compounds and prototypes than the DFT-relaxed set. This is attributed to the different volumes (e.g. measured temperatures and pressures) of the original geometries, while the DFT-relaxed geometries represent the ground state configurations, yielding additional degenerate compounds.
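The quoted percentages follow directly from the counts and the 60,390 catalog entries, as the short check below illustrates. The dictionary keys are labels chosen for this sketch; the numbers are those reported in the paragraph above.

```python
TOTAL_ENTRIES = 60390

# (original geometries, DFT-relaxed geometries), as reported in the text
counts = {
    "unique compounds": (34820, 33544),
    "prototypes": (15205, 14692),
    "isopointal structure-types": (8521, 8493),
}


def percent(count, total=TOTAL_ENTRIES):
    """Share of the catalog in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, (original, relaxed) in counts.items():
    print(f"{name}: {percent(original)}% original, {percent(relaxed)}% relaxed")
```

The same counts show that the original and relaxed sets differ by 1,276 unique compounds and 513 prototypes.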
While AFLOW.org contains a subset of the ICSD catalog, the highest frequency prototypes are consistent with those published for the ICSD 64 . In particular, XtalFinder and the ICSD both identify the following structures as some of the most common prototypes: Al2MgO4 (spinel, A2BC4_cF56_227_d_a_e-001), CaTiO3 (cubic perovskite, AB3C_cP5_221_a_c_b), GdFeO3 (AB3C_oP20_62_a_cd_c-001), and NaCl (rocksalt, AB_cF8_225_a_b) (see Table 5 and Supplementary Tables 1-14). The criteria for grouping compounds into structure-types described in ref. 64 are more relaxed than those of XtalFinder (e.g. larger tolerances for c/a and β ranges and user-defined ranges for fractional atomic coordinates). Consequently, XtalFinder finds more distinct prototype structures than the 1,600 (as of January 2007) in ref. 64 .
From this analysis, new candidate prototypes have been identified that are missing from the Prototype Encyclopedia (signified by empty rows in the last columns of Table 5 and Supplementary Tables 1-14). The numbers of new prototypes in the original (relaxed) sets with more than 10 unique compounds exhibiting the structure are: binaries 31 (33), ternaries 168 (177), quaternaries 40 (42), and quinaries 4 (3); the unaries, senaries, and septenaries have 0 (0). This amounts to 243 distinct crystalline structures in the original set that will be incorporated into future installments of the Prototype Encyclopedia. Most entries in the Prototype Encyclopedia stem from experimentally observed structures; therefore, we plan to use the original geometries for prototyping.
Some structures in Table 5 and Supplementary Tables 1-14 are equivalent to the Prototype Encyclopedia prototypes with a different number of atom types. For example, the third most common ternary ABC_hP9_189_f_bc_g (RuSiZr ICSD #16306, original geometry) matches to the binary analog A2B_hP9_189_fg_bc (Fe2P, Strukturbericht: C22) 17,18 when the f and g Wyckoff positions are of the same atom type. We classify the prototypes as distinct, similar to distinguishing between the diamond (n = 1) and zincblende (n = 2) structures.

Table 4. The prototypes and their symmetries. The prototypes are grouped into the 14 Bravais lattices: triclinic (tri), monoclinic (mcl), base-centered monoclinic (mclc), orthorhombic (orc), base-centered orthorhombic (orcc), face-centered orthorhombic (orcf), body-centered orthorhombic (orci), tetragonal (tet), body-centered tetragonal (bct), hexagonal (hex), rhombohedral (rhl), simple cubic (cub), face-centered cubic (fcc), and body-centered cubic (bcc).

Table 5. Most frequent prototypes in the AFLOW-ICSD catalog. The five most common prototypes are shown for unary, binary, ternary, and quaternary compounds as identified via XtalFinder. The original and relaxed geometry sets are shown on the top and bottom portions of the table, respectively. Each prototype is listed with the following information: AFLOW label, number of unique atom decorations, representative compound with its ICSD designation, number of unique compounds exhibiting the structure (along with the count when including duplicate compounds), and matches to existing AFLOW prototypes, if they exist. Empty rows in the AFLOW prototype column reveal new prototypes, which will be included in Part 3 of the AFLOW Prototype Encyclopedia. The complete list of prototypes is provided in Supplementary Tables 1

DISCUSSION
Herein, we present XtalFinder: a software for automatically identifying unique prototypes and calculating structural similarity of crystals. The framework performs robust symmetry, local atomic geometry, and geometric structure comparisons. Routines are available to quantify structural similarity for i. compounds (material-type comparisons), ii. prototypes (structure-type), iii. atom decorations (decoration-type), and iv. spin configurations (magnetic-type). The program can analyze multiple structures simultaneously and aggregate them into equivalent groups, with multithreading capabilities available for improving performance. Built-in methods compare input structures to the AFLOW.org repository and the AFLOW prototype libraries for detecting new compounds and structure-types. Crystal prototyping techniques are also introduced to cast structures into a standard designation, facilitating extensions of the Prototype Encyclopedia. A command line and Python interface are provided for easing incorporation into user-workflows. Applying the procedures to the AFLOW-ICSD repository revealed approximately 15,000 prototypes out of over 60,000 ICSD entries, representing over 34,000 unique compounds. Subsequent comparisons with the AFLOW prototype libraries exposed new candidate entries for future iterations of the encyclopedia. Overall, XtalFinder serves as a versatile tool for finding prototypes and comparing crystalline geometries.

Command-line interface
The XtalFinder command-line calls are detailed below. Function descriptions and options are provided following each command.
• aflow --prototype < file
  - Converts a structure (file) into its standard AFLOW prototype label. The parameter variables (degrees of freedom) and corresponding values are also listed. Information about the label and parameters is described in refs. 17,18 .
  - Options specific to this command:
    --setting=1|2|aflow
      ◇ Specify the space group setting for the conventional cell/Wyckoff positions. The aflow setting follows the choices of the Prototype Encyclopedia: axis-b for monoclinic space groups, rhombohedral setting for rhombohedral space groups, and origin centered on the inversion site for centrosymmetric space groups (default: aflow).
    --equations_only
      ◇ Only print the symbolic version of the geometry file (in terms of the variable degrees of freedom).
• aflow --isopointal_prototypes < file
  - Returns prototype labels that are isopointal (i.e. similar space group and Wyckoff positions) to the input structure (file).
• aflow --unique_atom_decorations < file
  - Determines the unique and duplicate atom decorations for a given structure.
• --optimize_match
  - Explores all lattice and origin choices to find the best matching representation, i.e. minimizes the misfit value.
• --no_scale_volume
  - Suppresses volume rescaling during structure matching; identifies differences due to volume expansion or compression of a structure.
• --ignore_Wyckoff
  - Neglects Wyckoff symmetry (site symmetry) for filtering comparisons, but considers the space group number.
• --minkowski
  - Performs a Minkowski lattice transformation 8 on all structures prior to comparison, offering a speed increase.
• --niggli
  - Performs a Niggli lattice transformation 7 on all structures prior to comparison, offering a speed increase.
• --primitive|--primitivize
  - Converts all structures to a primitive form prior to comparison, offering a speed increase.
- Specifies the magnetic moment for each structure (collinear or non-collinear) delimited by colons, signaling a magnetic-type comparison. The option does not apply to --compare_structures since the atom type is neglected. XtalFinder supports three input formats for the magnetic moment: (i) explicit declaration via comma-separated string m_1, m_2, ..., m_n (m_1,x, m_1,y, m_1,z, m_2,x, ..., m_n,z for non-collinear), (ii) read from a VASP INCAR, or (iii) read from a VASP OUTCAR. Additional magnetic moment readers for other ab initio codes will be available in future versions.
• --add_aflow_prototype_designation
  - Casts the representative structure into the AFLOW standard designation. The option does not apply to the command --unique_atom_decorations.
• --remove_duplicate_compounds
  - For structure-type comparisons, duplicate compounds are identified first (via a material-type comparison without volume scaling), then the remaining unique compounds are compared, removing duplicate bias.
• --print_mapping
  - For comparing two structures, additional comparison information is printed, including atom mappings, distances between matched atoms, and the transformed structures in the closest matching representation.
• --print=text|json
  - For comparing multiple structures, the results are printed to human-readable text or JSON files, respectively. By default, XtalFinder writes the output to both files.
• --quiet
  - Suppresses the log information for the comparisons.
• --screen_only
  - Prints the comparison results to the screen and does not write to any files.
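For scripted use, the prototype command listed above can be driven from Python with the standard subprocess module. The sketch below is an illustration rather than part of the XtalFinder distribution: the helper names are hypothetical, only the --prototype, --setting, and --equations_only flags quoted above are used, and an aflow executable is assumed to be available on the PATH when run_prototype is called.

```python
import subprocess


def prototype_command(setting=None, equations_only=False, executable="aflow"):
    """Build the argument list for 'aflow --prototype' with optional flags."""
    if setting is not None and str(setting) not in {"1", "2", "aflow"}:
        raise ValueError("setting must be 1, 2, or 'aflow'")
    command = [executable, "--prototype"]
    if setting is not None:
        command.append(f"--setting={setting}")
    if equations_only:
        command.append("--equations_only")
    return command


def run_prototype(structure_path, **options):
    """Feed a structure file to the command on standard input ('< file')."""
    with open(structure_path, "rb") as handle:
        result = subprocess.run(prototype_command(**options), stdin=handle,
                                capture_output=True, check=True)
    return result.stdout.decode()
```

Passing the file through standard input mirrors the `< file` redirection in the command-line syntax, and check=True raises an exception if the executable reports a failure.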
Python environment. In addition to the command-line interface, a Python module is available for inclusion into a variety of workflows. The module mirrors the format used for AFLOW-SYM 20 and AFLOW-CHULL 65 . An XtalFinder function is performed on the input(s) and the results are returned to an XtalFinder class. The module wraps around a local instance of AFLOW, and the path to the AFLOW executable can be specified by: XtalFinder(aflow_executable='your_executable').
By default, the XtalFinder object searches for an AFLOW executable in the PATH. An example Python script is shown below, where an XtalFinder object is initialized and a material-type comparison between two structure files (POSCARs) is performed.
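The wrapping pattern described above (an object that holds the path to a local executable, falls back to the PATH, and hands back parsed results) can be illustrated with the following sketch. It is not the XtalFinder module: the class AflowRunnerSketch and the method run_json are hypothetical, and it is assumed here that the executable writes JSON to standard output.

```python
import json
import shutil
import subprocess
from typing import Any, List, Optional


class AflowRunnerSketch:
    """Hypothetical stand-in showing the wrap-an-executable pattern."""

    def __init__(self, aflow_executable: Optional[str] = None):
        # An explicit path wins; otherwise search the PATH for "aflow".
        self.executable = aflow_executable or shutil.which("aflow")
        if self.executable is None:
            raise FileNotFoundError(
                "no AFLOW executable was given and none was found in the PATH"
            )

    def run_json(self, args: List[str]) -> Any:
        """Run the executable with `args` and parse its stdout as JSON.

        Assumption: the invoked command prints JSON to standard output.
        """
        completed = subprocess.run(
            [self.executable, *args],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return json.loads(completed.stdout)
```

Keeping the executable path as a constructor argument lets the same workflow script run against different local AFLOW builds without further changes.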
• filename -A string specifying the path to a file containing structure files separated by a delimiter, e.g. filename='/home/user/list_of_structures.txt'.

Python module.
A Python module for XtalFinder is available in Supplementary Method 1. All output is converted into JavaScript Object Notation (JSON) to ease integration into user workflows.
AFLOW-XtalFinder JSON output details. The output keywords for the XtalFinder functions are listed below as they appear in the JSON format. The output for multiple comparisons (user-defined sets, comparison to AFLOW prototypes, and comparison to AFLOW.org entries), unique atom decorations, and casting into the AFLOW prototype representation are described.
• aflow_prototype_label
-Description: AFLOW label for the structure.
• [keyword missing]
-Description: information for the duplicate structures that match the representative structure, i.e. the misfit is less than $\epsilon_{\mathrm{match}}$.
-Type: array of structure_matched objects
• structures_family
-Description: information for the structures that are within the same family as the representative structure, i.e. the misfit is between $\epsilon_{\mathrm{match}}$ and $\epsilon_{\mathrm{family}}$.
-Type: array of structure_matched objects
• matching_aflow_prototypes
-Description: labels of AFLOW crystal prototypes 17,18 that match this structure (included when using the option "--add_matching_aflow_prototypes").
-Type: array of strings
A structure_representative object contains the following:
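For workflows that consume this output, a short post-processing sketch is given here. It assumes, without confirmation from the text, that the JSON is an array of group objects carrying the keywords aflow_prototype_label, structures_family, and matching_aflow_prototypes at the top level of each group. The function summarize_groups and the sample data are hypothetical and serve only to show the access pattern.

```python
import json
from typing import Any, Dict, List


def summarize_groups(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Condense each group to its label, family size, and matched prototypes.

    Missing keywords are treated as empty, since some appear only
    when the corresponding option is requested.
    """
    summary = []
    for group in results:
        summary.append(
            {
                "label": group.get("aflow_prototype_label"),
                "n_family": len(group.get("structures_family", [])),
                "prototypes": list(group.get("matching_aflow_prototypes", [])),
            }
        )
    return summary


# Hypothetical sample data; the layout is an assumption, not real output.
sample = json.loads(
    """
    [
      {"aflow_prototype_label": "AB_cF8_225_a_b",
       "structures_family": [{}, {}],
       "matching_aflow_prototypes": ["AB_cF8_225_a_b"]},
      {"aflow_prototype_label": "A_cF4_225_a",
       "structures_family": []}
    ]
    """
)

for row in summarize_groups(sample):
    print(row["label"], row["n_family"], row["prototypes"])
```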

DATA AVAILABILITY
All crystallographic structure data is freely available and accessible online through AFLOW.org or programmatically via the REST- and AFLUX Search-APIs. The AFLOW prototype information is provided online at http://aflow.org/prototype-encyclopedia, and the corresponding structures can be generated with the AFLOW source code.

CODE AVAILABILITY
The XtalFinder module is integrated into the AFLOW software (version 3.2 and later). The source code for AFLOW is available at http://aflow.org/install-aflow/ and http://materials.duke.edu/AFLOW/, and it is compatible with most Linux, macOS, and Microsoft operating systems. The multithreaded capabilities require GNU g++-4.4 or later. Questions and bug reports should be emailed to aflow@groups.io with a subject line containing "XtalFinder".