Database of open-framework aluminophosphate structures

Open-framework aluminophosphates are an important class of inorganic crystalline compounds because of their rich structural chemistry and diverse properties. We have collected 312 open-framework aluminophosphate crystal structures from published literature and established a database for these structures. For each aluminophosphate structure, we have assigned a unique index code and extracted its key chemical and crystallographic information from the original literature and the associated CIF file, such as the name, chemical formula, extra-framework species, Al/P ratio, space group, and unit cell parameters of the compound. More importantly, we have calculated the topological features for each aluminophosphate framework, including local connectivity, framework dimension, coordination sequences, vertex symbols, topology density, and the largest ring. To help experimental chemists identify their products, we have also calculated theoretical XRD peaks for all aluminophosphate structures. This database will provide important insight into understanding the structural chemistry of open-framework aluminophosphate compounds.

(2020) 7:107 | https://doi.org/10.1038/s41597-020-0452-4 | www.nature.com/scientificdata

of AlO4 and PO4, our database contains a large number of anionic frameworks with AlO5, AlO6, or interrupted structures. To describe these structures in detail, we provide three types of structure attributes in this database: chemical, crystallographic, and topological. Chemical attributes include the index code we assign to each structure, the name and chemical formula of the compound, the extra-framework species, and the Al/P ratio of the framework. Crystallographic attributes include the space group, unit cell parameters, atomic coordinates (the atom labelling is consistent with that reported in the original reference), the residual factor for XRD structure refinement (R1 for single-crystal data and Rwp for powder data) 13, and the simulated XRD peak positions. Topological attributes include the framework type code, polyhedral connectivity, framework dimension, coordination sequences, vertex symbols, topology density, and the largest ring.
To the best of our knowledge, this is the only structure database specialized for open-framework aluminophosphates, and most of the structure information in this database cannot be found in any other crystal structure database or even in the original literature. Using this database, theoretical chemists can readily identify structural variations among different open-framework aluminophosphates and gain insight into the structural chemistry of these compounds; experimental chemists can review all aluminophosphates that have been discovered and decide on synthetic targets for specific applications; crystallographers can search this database for structures with similar cell parameters to identify their samples. More importantly, this database provides various types of structure data that may serve as input for future machine learning studies to reveal the complicated relationships among the structures, syntheses, and properties of open-framework aluminophosphates.

Methods
Data collection. All crystal structures were derived from the CIF files or the atomic coordinates reported in the original literature. Only structures that were unambiguously determined using X-ray diffraction techniques were collected in our database. Before adding a new structure to our database, we compared it with the structures already included, and only structures significantly different from existing ones were added. In the end, we obtained a total of 312 open-framework aluminophosphate structures. For each structure, we assigned a unique code in the format "d.a.p.spg.sn", where d is the dimension of the framework, a and p represent the Al/P ratio as coprime integers, spg is the sequence number of its space group as defined in the International Tables for Crystallography 14, and sn is a serial number that distinguishes different structures in our database.
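The index code format described above can be illustrated with a minimal parsing sketch. The class and function names (`IndexCode`, `parse_index_code`) are hypothetical and not part of the database; the validation rules (dimension between 0 and 3, space group number between 1 and 230, serial number kept as text) are assumptions made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import gcd


@dataclass(frozen=True)
class IndexCode:
    dimension: int    # d: framework dimension
    al: int           # a: Al part of the coprime Al/P ratio
    p: int            # p: P part of the coprime Al/P ratio
    space_group: int  # spg: space group sequence number
    serial: str       # sn: kept as text so leading zeros survive

    @property
    def al_p_ratio(self) -> Fraction:
        return Fraction(self.al, self.p)

    def __str__(self) -> str:
        return f"{self.dimension}.{self.al}.{self.p}.{self.space_group}.{self.serial}"


def parse_index_code(code: str) -> IndexCode:
    """Split a 'd.a.p.spg.sn' code into its five fields (hypothetical helper)."""
    parts = code.strip().split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got {len(parts)}: {code!r}")
    d, a, p, spg = (int(x) for x in parts[:4])
    sn = parts[4]
    if not 0 <= d <= 3:  # assumed valid range for the framework dimension
        raise ValueError(f"framework dimension out of range: {d}")
    if a < 1 or p < 1 or gcd(a, p) != 1:
        raise ValueError(f"Al/P ratio must be given as coprime positive integers: {a}/{p}")
    if not 1 <= spg <= 230:
        raise ValueError(f"space group number out of range: {spg}")
    if not sn.isdigit():
        raise ValueError(f"serial number must be numeric: {sn!r}")
    return IndexCode(d, a, p, spg, sn)


code = parse_index_code("3.1.1.14.008")
print(code.dimension, code.al_p_ratio, code.space_group, code.serial)
```

Keeping the serial number as a string means a code can be written back out unchanged with `str(code)`.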

Structure attribute calculation.
Most of the chemical and crystallographic attributes were extracted from the source literature and the associated CIF files. XRD simulations were performed using Materials Studio 15. Topological attributes, such as coordination sequences, vertex symbols, the largest ring, and TD10, were calculated using ToposPro 16.
Simulated XRD peaks. To aid phase identification, the powder XRD peaks were simulated for each structure from its crystal structure. XRD peaks are listed in descending order of d-spacing. In addition, the peak positions of the three strongest reflections are listed explicitly in degrees 2θ, assuming Cu Kα1 radiation.
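The relation between a d-spacing and its 2θ position follows Bragg's law, λ = 2d sin θ. The sketch below shows the conversion only; it is not the Materials Studio simulation used for the database. The wavelength value of 1.5406 Å for Cu Kα1 and the function names are assumptions.

```python
import math

CU_KA1_ANGSTROM = 1.5406  # assumed Cu K-alpha1 wavelength in angstroms


def two_theta_deg(d_spacing: float, wavelength: float = CU_KA1_ANGSTROM) -> float:
    """Convert a d-spacing (angstroms) to a first-order 2-theta angle in degrees."""
    if d_spacing <= 0:
        raise ValueError("d-spacing must be positive")
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("reflection not observable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


def peaks_by_descending_d(d_spacings):
    """Return (d, 2-theta) pairs sorted by descending d-spacing."""
    return [(d, two_theta_deg(d)) for d in sorted(d_spacings, reverse=True)]


# Hypothetical d-spacings, for illustration only.
for d, tt in peaks_by_descending_d([4.1, 11.9, 5.9]):
    print(f"d = {d:6.3f} A   2theta = {tt:6.2f} deg")
```

Because 2θ increases as d decreases, a list in descending order of d-spacing is also in ascending order of 2θ.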
Coordination sequences consist of sequences of integers, reflecting the numbers of neighbouring atoms in different coordination shells of specific central atoms. Coordination sequences can be used to distinguish different framework topologies.
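As a rough illustration of how a coordination sequence can be obtained, the sketch below runs a breadth-first search over a periodic net in which each edge carries a unit-cell offset. It is not the ToposPro algorithm, and the two toy nets (a square lattice and a primitive cubic lattice) are illustrative assumptions rather than aluminophosphate frameworks.

```python
def coordination_sequence(edges, start, shells):
    """Count newly reached atoms in each coordination shell of a periodic net.

    edges maps a node label to a list of (neighbour label, (dx, dy, dz)) pairs,
    where the offset is the unit-cell translation of the neighbour.
    """
    origin = (start, (0, 0, 0))
    seen = {origin}
    frontier = [origin]
    counts = []
    for _ in range(shells):
        next_frontier = []
        for node, cell in frontier:
            for neighbour, offset in edges[node]:
                image = (neighbour, tuple(c + o for c, o in zip(cell, offset)))
                if image not in seen:
                    seen.add(image)
                    next_frontier.append(image)
        counts.append(len(next_frontier))
        frontier = next_frontier
    return counts


# Toy nets with one node per unit cell (assumed examples).
square = {"A": [("A", (1, 0, 0)), ("A", (-1, 0, 0)),
                ("A", (0, 1, 0)), ("A", (0, -1, 0))]}
cubic = {"A": square["A"] + [("A", (0, 0, 1)), ("A", (0, 0, -1))]}

print(coordination_sequence(square, "A", 5))  # shell k holds 4k atoms
print(coordination_sequence(cubic, "A", 3))   # shell k holds 4k^2 + 2 atoms
```

In an aluminophosphate framework, the nodes would be the polyhedral central atoms and the edges the bridging O or F atoms.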
Vertex symbols indicate the sizes of the smallest rings associated with the coordination angles of the central atom. For a central atom connected to n neighbours, the vertex symbol consists of n × (n − 1)/2 entries, one for each angle. If no ring is found for an angle, that entry is indicated by an asterisk.
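The count n × (n − 1)/2 is simply the number of distinct pairs of neighbours around the central atom, each pair defining one angle. A minimal sketch is given below; the helper names and the dot-separated output format are assumptions for illustration and may differ from the notation used in the database.

```python
from itertools import combinations


def angle_pairs(neighbours):
    """List every pair of neighbours; each pair defines one coordination angle."""
    return list(combinations(neighbours, 2))


def format_vertex_symbol(ring_sizes):
    """Join the smallest-ring sizes, writing '*' where no ring was found (None)."""
    return ".".join("*" if size is None else str(size) for size in ring_sizes)


for n in (4, 5, 6):
    neighbours = [f"X{i}" for i in range(1, n + 1)]
    print(n, "neighbours ->", len(angle_pairs(neighbours)), "angles")

# Hypothetical ring sizes for a 4-connected atom with one angle lacking a ring.
print(format_vertex_symbol([4, 4, 6, 6, 8, None]))
```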
Largest ring. We calculated the number of polyhedral central atoms involved in the largest ring.
TD10 is defined as the average number of neighbouring atoms in the first ten coordination shells of specific central atoms in a framework structure, which reflects the topological density of the framework.
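The TD10 definition can be sketched as a sum over the first ten coordination shells, averaged over the distinct central atoms. Two choices in the sketch are assumptions, not statements about how the database values were computed: the average is weighted by site multiplicity, and the central atom itself is optionally counted. The example sequence is the one commonly tabulated for the sodalite-type net, where shell k holds 2k² + 2 atoms.

```python
def td10(sequences, multiplicities=None, include_central=True):
    """Average the ten-shell sums over crystallographically distinct atoms.

    sequences: one coordination sequence (at least 10 shells) per distinct atom.
    multiplicities: optional site multiplicities used as weights (assumption).
    include_central: whether the central atom itself is counted (assumption).
    """
    if not sequences:
        raise ValueError("at least one coordination sequence is required")
    if multiplicities is None:
        multiplicities = [1] * len(sequences)
    if len(multiplicities) != len(sequences):
        raise ValueError("one multiplicity per sequence is required")
    total = 0
    for seq, weight in zip(sequences, multiplicities):
        if len(seq) < 10:
            raise ValueError("each sequence needs at least 10 shells")
        total += weight * (sum(seq[:10]) + (1 if include_central else 0))
    return total / sum(multiplicities)


# Sodalite-type net: a single distinct node with 2k^2 + 2 atoms in shell k.
sodalite = [2 * k * k + 2 for k in range(1, 11)]
print(sodalite)
print(td10([sodalite]))
print(td10([sodalite], include_central=False))
```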

Data Records
Currently, our database consists of 312 aluminophosphate structures, which can be accessed at figshare 17. Among these structures, 201 are pure aluminophosphates, the frameworks of which are constructed exclusively from Al, P, and O; 72 structures contain transition metals in their frameworks, 27 structures contain fluorine, and 15 structures contain silicon. Figure 1a shows the occurrence of the top five framework compositions in our database.
The Al/P ratio is an important structural feature for open-framework aluminophosphates, which determines the electronegativity of the framework. In our database, there are 159 structures with an Al/P ratio of 1.0, most of which correspond to neutral frameworks. A further 146 structures exhibit Al/P ratios less than 1.0, corresponding to anionic frameworks. Among these anionic structures, Al/P ratios of 1/2, 2/3, 3/4, and 4/5 are the most frequently observed (Fig. 1b). Moreover, 7 structures exhibit Al/P ratios greater than 1.0 and contain rarely observed cationic frameworks.
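The grouping by Al/P ratio described above can be expressed as a small classification sketch. The function name and category labels are hypothetical; the mapping follows the text, with the caveat that a ratio of 1.0 corresponds to a neutral framework in most, not all, cases.

```python
from fractions import Fraction


def classify_by_al_p(al: int, p: int) -> str:
    """Label the expected framework charge from the Al/P ratio (hypothetical helper)."""
    ratio = Fraction(al, p)
    if ratio == 1:
        return "Al/P = 1 (neutral in most cases)"
    if ratio < 1:
        return "Al/P < 1 (anionic)"
    return "Al/P > 1 (cationic)"


# The most frequently observed anionic ratios quoted in the text.
for al, p in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    print(f"{al}/{p}: {classify_by_al_p(al, p)}")

# The three groups reported in the text account for the whole database.
counts = {"Al/P = 1": 159, "Al/P < 1": 146, "Al/P > 1": 7}
print(sum(counts.values()))
```

Using exact fractions avoids floating-point comparisons when testing whether a ratio equals 1.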
From the point of view of symmetry, the aluminophosphate structures in our database belong to 70 space groups, covering close to one third of the 230 possible space groups. Low-symmetry space groups are much more common than high-symmetry ones for open-framework aluminophosphates. In particular, structures in space groups P21/c, P1, C2/c, Pbca, and R3 account for over half of our database. These space groups are also among the most frequently observed ones in other crystal structure databases 18 (Fig. 1c).
Although all the P atoms in open-framework aluminophosphates are 4-coordinated, the Al atoms can be 4-, 5-, or 6-coordinated. More importantly, the connectivity of Al and P is even more diverse than their coordination states, which leads to the rich structural diversity of open-framework aluminophosphates (Fig. 2). For instance, 4-coordinated P atoms can be 1-, 2-, 3-, or 4-connected to neighbouring Al atoms via bridging O atoms; 4-coordinated Al atoms can be 2-, 3-, or 4-connected to neighbouring P atoms via bridging O atoms; 5-coordinated Al atoms can be 4- or 5-connected to neighbouring P or Al atoms via O or F bridges; and 6-coordinated Al atoms can be 3-, 4-, 5-, or 6-connected to neighbouring P or Al atoms via O or F bridges. In turn, an O or F atom may bridge one Al and one P atom (μ-O/F), two Al atoms and one P atom (μ3-O), or three Al atoms (μ3-O); it may also be terminal, bonded to only one Al or P atom (O/F). Considering the difference between O and F bridges and the existence of heteroatoms other than Al and P, the number of connectivity types for polyhedra in open-framework aluminophosphates has reached 40. Figure 2 shows all 40 types of polyhedral connectivity. Currently, UiO-26-as (Index code: 3.1.1.14.008) exhibits the most diverse polyhedral connectivity in our database, which contains 4-, 5-, and 6-coordinated Al and five [...] 20. In comparison, all zeolitic aluminophosphates contain Al(μ-O)4 and P(μ-O)4 exclusively, representing the simplest polyhedral connectivity in our database.
Coordination sequences and TD10 both count the neighbouring atoms in the first few coordination shells around specific central atoms, reflecting the topological density of a framework structure. Coordination sequences are calculated for each crystallographically distinct atom, whereas TD10 is the sum over all shells averaged among all distinct atoms, so TD10 is more "isotropic" than coordination sequences. Besides distinguishing different framework topologies, these topological attributes allow us to deduce important structure information. For instance, we can deduce the dimension of an aluminophosphate framework from its coordination sequences and TD10. Low-dimensional frameworks contain many more interrupted P or Al sites than high-dimensional frameworks; these sites break the connectivity of the framework and stop the rapid growth in the number of neighbouring atoms. Figure 3a shows the distribution of TD10 in structures of different framework dimensions. For three-dimensional aluminophosphate frameworks, TD10 ranges from 357 to 2096, and the median TD10 is 818 in our database. For two-dimensional frameworks, TD10 varies in a much narrower range, from 99 to 388, and the median TD10 is 194, much lower than that of three-dimensional frameworks. TD10 for one-dimensional frameworks varies in an even narrower and lower range than that of two-dimensional frameworks (Fig. 3a). The ranges of TD10 for different framework dimensions barely overlap, so TD10 can be used to estimate the framework dimension. Figure 3b shows the plot of TD10 versus the framework density for all structures in our database. The correlation coefficient R2 is 0.72393, indicating a significant correlation between TD10 and framework density. Therefore, TD10 reflects not only the topological density but also the general trend of framework density.
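The link between framework dimension and the growth of the coordination sequence can be illustrated with a simple heuristic. This is a swapped-in technique, not the TD10-range comparison described above: for an ideal periodic net of dimension d, the shell size grows roughly as k^(d − 1), so the ratio between the tenth and fifth shells gives an estimate of d. The function name and the idealized test sequences are assumptions; real frameworks with interrupted sites may give less clear-cut values.

```python
import math


def estimate_dimension(sequence):
    """Estimate the net dimension from the growth between shells 5 and 10.

    Assumes shell sizes grow roughly as k**(d - 1), so doubling k multiplies
    the shell size by about 2**(d - 1).
    """
    if len(sequence) < 10:
        raise ValueError("at least 10 shells are required")
    n5, n10 = sequence[4], sequence[9]
    if n5 <= 0 or n10 <= 0:
        raise ValueError("empty shell: the structure does not extend periodically")
    exponent = math.log(n10 / n5) / math.log(2.0)
    return round(exponent) + 1


# Idealized nets (assumed examples): chain, square lattice, primitive cubic lattice.
chain = [2] * 10
square = [4 * k for k in range(1, 11)]
cubic = [4 * k * k + 2 for k in range(1, 11)]

for name, seq in [("chain", chain), ("square", square), ("cubic", cubic)]:
    print(name, estimate_dimension(seq))
```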

Technical Validation
Most structure attributes in our database were calculated by Materials Studio and ToposPro, which are computer programs widely used by researchers from all over the world. We have also compared our data with those obtained from other sources, such as the online structure database of the International Zeolite Association 4 , to verify the accuracy of our data. Selected comparison results are listed in Tables 1 and 2.