OPTIMADE, an API for exchanging materials data

The Open Databases Integration for Materials Design (OPTIMADE) consortium has designed a universal application programming interface (API) to make materials databases accessible and interoperable. We outline the first stable release of the specification, v1.0, which is already supported by many leading databases and several software packages. We illustrate the advantages of the OPTIMADE API through worked examples on each of the public materials databases that support the full API specification.


Introduction
Data has become a crucial resource in many scientific fields, and materials science is no exception. Experimental data has long been meticulously curated in handbooks and databases, with the first edition of Landolt-Börnstein [1] being published in 1883. Nowadays, various commercial and non-commercial experimental databases, such as the Inorganic Crystal Structure Database (ICSD) [2], are widely used throughout the field.
High-throughput electronic structure calculations, themselves enabled by algorithmic improvements and growing computational resources, have significantly increased the availability of useful data from computational simulations of materials. Since the pioneering work of Ceder et al. [3], a large number of high-throughput first-principles studies have been reported in the literature (for a review, see Ref. 4), with results typically collated in databases. This explosion in the amount of available data has kick-started a new paradigm of data-driven materials science [5], creating opportunities for concurrent, automated materials design, boosted by databases that can be queried by humans and machines via an application programming interface (API) [6-9].
As materials databases differ in fidelity and focus across material classes and properties, it is extremely beneficial to be able to liberate and unify data from multiple sources. However, retrieving data from multiple databases is difficult as each database has its own specialized, and sometimes esoteric, API that governs data access patterns, querying and the representation of the underlying data. Moreover, as the APIs of individual databases inevitably evolve, existing clients must also evolve; a significant maintenance effort is required to translate the responses from the new API to the representation of the client.
Motivated by these considerations, providers of several materials databases united to design and implement an API specification that enables seamless access and interoperability across materials databases. The effort started at the workshop "Open Databases Integration for Materials Design", held at the Lorentz Center in Leiden, Netherlands in October 2016, and continued at follow-up workshops held at CECAM in Lausanne, Switzerland in June 2018, June 2019, and June 2020. The result is the OPTIMADE specification (v1.0) [10]; OPTIMADE defines a RESTful API that is queried with URLs, with responses adhering to the JSON:API specification [11]. Specification development adheres to Semantic Versioning [12] to avoid surprises and enable backwards-compatibility where possible, without impeding further development. By extracting the technical and scientific commonalities from existing APIs, the OPTIMADE API has been designed so that it can be implemented across a broad range of materials domains, database back-ends and sizes.
In this paper, we first review the query format of existing databases to motivate the design and construction of the OPTIMADE API specification. We then illustrate the use of the API with a set of worked examples; databases that already fully support the OPTIMADE API are enumerated alongside their results for representative queries in Table 1. We further highlight libraries that could accelerate uptake and assist materials data curators to support the OPTIMADE API format. Finally, we discuss future prospects and ongoing development of the OPTIMADE API.

Current generation of materials database APIs
Materials databases are a veritable treasure trove of information, but they only become useful once a human, or machine, can access them. In this section we review the current range of APIs used by various databases to enable access to an example compound, SiO2, which serves to highlight the variation of APIs that a user must navigate in order to make use of multiple materials databases. We then demonstrate the universal nature of the OPTIMADE API that permits seamless access to all materials databases that support it.
We first compare and contrast the APIs that must be used to request records on an exemplar system, SiO2, from three different databases: AFLOW, the Materials Project, and the Crystallography Open Database (COD). All three databases support requests through a representational state transfer (RESTful) web service, at the following URLs:
Note that the Materials Project requires the user to supply an API key (http://materialsproject.org/open), preferably specified in the X-API-KEY HTTP header.
The three APIs vary syntactically (in format), taxonomically (having different names for terms), and semantically (in the conflicting definitions of chemical formula as an intensive or extensive property). AFLOW returns all structures with both Si and O present, whereas both the Materials Project and COD deliver any structure with a formula unit of SiO2. The wide range of query formats that will deliver non-overlapping structures significantly complicates access to all available data for SiO2, without even considering the differing representations of the structures returned.
The inconsistent format of the query is further complicated by the difficulty of accessing other structures with the SiO2 formula. Focusing on just AFLOW, two possible queries that users more familiar with the other APIs might attempt are http://aflow.org/API/aflux/?compound(SiO2), which returns no response, and http://aflow.org/API/aflux/?compound(O2Si1), which lists the elements in alphabetical order as required by AFLOW and includes the "1" after element symbols, so that "SiO2" becomes "O2Si1". The latter returns entries where the unit cell is SiO2, but does not return Si2O4 or simulation cells containing more formula units.
The exemplar http://aflow.org/API/aflux/?species(Si,O),nspecies(2) returns all entries with at least one Si and an O, so while the response includes the SiO2 phases of interest, it may also contain other stoichiometries.
The distinctions between the request format for each database require the user to become an expert in many different APIs. This again emphasises the need for a single well-designed and standardized API to access all materials databases, which is the aim of the OPTIMADE API.
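The rewriting rule quoted for the AFLOW compound keyword (alphabetical element order, explicit "1" counts) can be sketched in a few lines of Python. This is only an illustration of the convention as described in the text: the function name aflow_compound_label is hypothetical, it is not part of any AFLOW client, and it handles only flat formulas without parentheses.

```python
import re


def aflow_compound_label(formula):
    """Rewrite a flat formula such as 'SiO2' as 'O2Si1'.

    Hypothetical helper: elements are sorted alphabetically and every
    count is written explicitly, as described for the compound keyword.
    """
    # Each token is an element symbol followed by an optional integer count.
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    counts = {}
    for symbol, count in tokens:
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    return "".join(f"{symbol}{counts[symbol]}" for symbol in sorted(counts))


print(aflow_compound_label("SiO2"))  # O2Si1
```

Because the counts are not reduced, a four-formula-unit cell written as "Si4O8" would map to "O8Si4", which is consistent with the observation that the compound query does not match larger simulation cells.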

The OPTIMADE API
The OPTIMADE API provides a holistic standard for serving and accessing the information in compatible materials databases. To retrieve information about materials from a particular database, the user submits a request via a URL. Each database provider will have published a base URL that serves the OPTIMADE API, for example https://example.com/optimade/. The same URL path, across different OPTIMADE API implementations, allows uniform access to the underlying databases. Both human-readable and machine-readable versions of the OPTIMADE API specification are available online with releases archived at Zenodo [10]. The specification is also registered as an API standard on FAIRsharing.org [13].

Design philosophy
The OPTIMADE specification strives to enable materials information to be filtered and retrieved in a straightforward and intuitive manner. The three queries from the previous section can each be performed on a standardised, versioned endpoint (/v1/structures) that enables access to a structures entry resource type that consists of many well-defined attributes. The specification then defines a grammar for filtering entries against these attributes, allowing the previous SiO2 filter example to be expressed in a common way (?filter=chemical_formula_reduced="O2Si"). Altogether, the universal OPTIMADE URL, where only the implementation URL changes, becomes:
<optimade_implementation_url>/v1/structures?filter=chemical_formula_reduced="O2Si"
The OPTIMADE specification also aims to be flexible to many different underlying data representations, and thus there are very few properties that are mandatory. Instead of enforcing an exhaustive set of property definitions, individual OPTIMADE implementations can describe the data they serve via /info endpoints for each entry type. These introspective endpoints allow clients to adapt to the particular implementation for an underlying database, and allow providers to disseminate properties beyond the simple structural and chemical information standardized by the specification. To avoid naming collisions, each provider-specific property name must be prefixed by a provider-specific token, itself bookended by underscores (_). The property custom_property from an example provider with assigned prefix exmpl would be expressed as _exmpl_custom_property: e.g., _tcod_a for the lattice constant, a, in TCOD, or _aflow_spacegroup_relax for the space group of the relaxed structure in AFLOW.
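As a minimal sketch of how a client might assemble such a URL, the Python below joins a base URL with the /v1/structures endpoint and percent-encodes the filter. The helper names structures_url and provider_property are hypothetical and the base URL is the placeholder used above; only the endpoint path, the filter string, and the prefix convention come from the text.

```python
from urllib.parse import quote


def structures_url(base_url, filter_expr):
    """Join an implementation base URL with a /v1/structures filter query."""
    # Percent-encode the filter so that quotes and spaces survive transport.
    encoded = quote(filter_expr, safe="")
    return f"{base_url.rstrip('/')}/v1/structures?filter={encoded}"


def provider_property(prefix, name):
    """Build a provider-specific property name: _<prefix>_<name>."""
    return f"_{prefix}_{name}"


url = structures_url(
    "https://example.com/optimade/",  # placeholder base URL, not a real server
    'chemical_formula_reduced="O2Si"',
)
print(url)
print(provider_property("exmpl", "custom_property"))  # _exmpl_custom_property
```

Only the base URL argument would need to change to send the same filter to a different implementation.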

Implementation discovery
The list of implementations confirmed and tested in this paper to support the OPTIMADE API is shown in Table 1. They are all publicly accessible, providing users with open access to large international repositories of computational and experimental materials science data.
The OPTIMADE consortium provides an open, federated list of implementations (https://providers.optimade.org). It is considered to be a catalogue of currently available and/or known public OPTIMADE implementations. New implementations are welcome and can register themselves via a pull request on GitHub (https://github.com/Materials-Consortia/providers).
The requirements for appearing in the above providers list are very loose. Some databases listed in the catalogue are signalling the intent of future implementations, while others only have partial implementations of the OPTIMADE API, including JARVIS (https://jarvis.nist.gov/optimade) [14] and MatCloud (https://matcloud.com.cn) [15]. Some software frameworks, such as AiiDA [16-18], also enable users to access their personal data through an OPTIMADE API, and therefore have a dedicated provider-specific ID, but no single official OPTIMADE implementation base URL.
The OPTIMADE API also specifies an endpoint for semi-automated cross-provider discovery. The /links endpoint serves links resources that may refer to either provider internal (child, root) or external (external, providers) resources based on the link_type attribute. To avoid being overly restrictive, it is at the provider's discretion whether they serve a list of known providers; however, this provides a mechanism for scalable and decentralised discovery of new implementations beyond the federated provider list.

Database                                           Refs.        N1           N2         N3
AFLOW                                              19,20        [...]0,192   62,293     382,554
Crystallography Open Database (COD)                21,22        416,314      3,896      32,420
Theoretical Crystallography Open Database (TCOD)   23           2,631        296        660
Materials Cloud                                    9,16,17      886,518      801,382    103,075
Materials Project                                  24-27        27,309       3,545      10,501
Novel Materials Discovery Laboratory (NOMAD)       28,29        3,359,594    532,123    1,611,302
Open Database of Xtals (odbx)                      30           55           54         0
Open Materials Database (omdb)                     31           58,718       690        7,428
Open Quantum Materials Database (OQMD)             32           153,113      11,011     70,252

Table 1. Materials databases with active OPTIMADE API implementations and the number of entries they return for the filters presented in this paper. The OPTIMADE website provides an up-to-date record of the implementation status of the databases (https://www.optimade.org/providers-dashboard/). AFLOW, Materials Project, odbx, omdb, and OQMD comprise computational materials data generated using database-specific workflows [20,24,30-32]. For the purposes of this table, Materials Cloud results were aggregated across all provided sub-databases. COD and TCOD comprise experimental and theoretical crystal structure data extracted from the literature. Materials Cloud comprises materials data from computational workflows; sub-databases group data by research project and can be contributed by users [9]. NOMAD aggregates computational data from multiple sources including from several of the repositories listed here.
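The link_type-based discovery described for the /links endpoint can be illustrated with a short Python sketch that sorts links resources into the four categories named in the text. The list below is a hand-written stand-in for the "data" field of a /links response: every id and URL in it is invented for illustration, and the base_url attribute name is an assumption about the response layout rather than something stated in this paper.

```python
from collections import defaultdict

# Hand-written stand-in for the "data" list of a /links response.
# All ids and URLs are invented placeholders.
links = [
    {"id": "index", "type": "links",
     "attributes": {"link_type": "root", "base_url": "https://example.com/optimade"}},
    {"id": "project-a", "type": "links",
     "attributes": {"link_type": "child", "base_url": "https://example.com/optimade/a"}},
    {"id": "project-b", "type": "links",
     "attributes": {"link_type": "child", "base_url": "https://example.com/optimade/b"}},
    {"id": "other-db", "type": "links",
     "attributes": {"link_type": "external", "base_url": "https://other.example.org/optimade"}},
    {"id": "catalogue", "type": "links",
     "attributes": {"link_type": "providers", "base_url": "https://providers.example.org"}},
]


def group_by_link_type(resources):
    """Map each link_type to the ids of the links resources that carry it."""
    grouped = defaultdict(list)
    for resource in resources:
        grouped[resource["attributes"]["link_type"]].append(resource["id"])
    return dict(grouped)


grouped = group_by_link_type(links)
print(grouped["child"])      # ['project-a', 'project-b']
print(grouped["providers"])  # ['catalogue']
```

A crawler could follow the child entries to reach sub-databases of the same provider and the external or providers entries to discover other implementations.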

Worked example
To illustrate the effective use of the OPTIMADE API we now provide a worked example of querying structures. We explore materials containing Group 14 elements (the carbon family), starting with a general search before drilling down to specific materials. The Group 14 elements are of particular interest as their atomic orbitals regularly hybridise, enabling a variety of bonding with differing geometries. The hybridised orbitals enable these elements to form the backbone of a wide range of compounds, both inorganic and organic, that underpin plastics, drugs, and semiconductors. Group 14 therefore forms a diverse and important family of compounds that heavily populates databases, making it an ideal case study to demonstrate the OPTIMADE API.

Common features of the response
Whilst our previous exploration of the Group 14 compounds considered only SiO2, the flexibility of the OPTIMADE API allows us to start with a search over all materials in Group 14, comprising carbon (C), silicon (Si), germanium (Ge), tin (Sn), and lead (Pb). We start with a simple API call that searches for all materials that contain at least one element in Group 14:
/v1/structures?filter=elements HAS ANY "C", "Si", "Ge", "Sn", "Pb"
This string can be appended to the base URL of any of the available implementations, to gather results in a standardised form.
The base URL can be found on the providers dashboard (https://www.optimade.org/providers-dashboard).
As an example, this query is run through the Theoretical Crystallography Open Database (TCOD) [23] with the following URL:
https://www.crystallography.net/tcod/optimade/v1/structures?filter=elements HAS ANY "C", "Si", "Ge", "Sn", "Pb"
The JSON response is summarized in Boxes 1 through 5, where some lines have been omitted for brevity; the full response is given in Supplementary File 1.
The first tranche of the JSON response comprises the "data" field that contains a list of entries returned for the query; a truncated version of this field is shown in Box 1, displaying a few salient properties of just one of the ten entries from the full response. The response for a particular material entry comprises multiple sections:
attributes: Box 1 shows the physical properties of the material comprising both mandatory information such as elements and lattice_vectors, as well as optional, additional database-specific information prefixed with the database name (e.g., _tcod_, here used to provide lattice parameters). This ensures that all databases return the most important and common information in a standardized format, as well as allowing them to include additional database-specific data. Importantly, the OPTIMADE specification provides a standardized way for database implementations to be self-documenting, via introspective /info endpoints. We see in the elements section that here we have returned a material comprising the element of interest, Sn, as well as O and Ta.
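The separation between standardized and database-specific attributes described above can be sketched in Python as follows. The entry is a hand-written stand-in for one item of the "data" list: the id and the numerical value are placeholders and are not taken from the TCOD response, and the helper name split_attributes is hypothetical.

```python
def split_attributes(entry, provider_prefix):
    """Separate standardized attributes from provider-prefixed ones."""
    marker = f"_{provider_prefix}_"
    standard, specific = {}, {}
    for key, value in entry["attributes"].items():
        # Keys such as _tcod_a belong to the provider; all others are standard.
        (specific if key.startswith(marker) else standard)[key] = value
    return standard, specific


# Placeholder entry shaped like one item of the "data" list; the id and the
# lattice parameter value are invented for illustration.
entry = {
    "id": "example-id",
    "type": "structures",
    "attributes": {
        "elements": ["O", "Sn", "Ta"],
        "nelements": 3,
        "_tcod_a": 1.0,
    },
}

standard, specific = split_attributes(entry, "tcod")
print(sorted(standard))  # ['elements', 'nelements']
print(specific)          # {'_tcod_a': 1.0}
```

A client written this way can process the standardized keys identically for every implementation and treat the prefixed keys as optional extras.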

This query returns 296 materials from the TCOD database, with the response summarized in Box 6, where some lines have been omitted for brevity and the full response is given in Supplementary File 2. The number of matching entries (N2) across all implementations for this filter is shown in Table 1. We can now see that the first structure, and indeed all structures, returned comprise at least one element in Group 14, here Ge, and a maximum of one other element (a binary material), here O. Additional filters can be chained to further refine the materials returned, or to construct more complex queries. For example, ternary structures that contain at least one of the elements C, Si, Ge, or Sn, but do not contain Pb (e.g., for applications where Pb toxicity would be a concern), can be retrieved using the filter
/v1/structures?filter=elements HAS ANY "C", "Si", "Ge", "Sn" AND NOT elements HAS "Pb" AND elements LENGTH 3
The number of entries matching this filter is denoted as N3 in Table 1.
These simple examples demonstrate how useful chemical queries are expressible with the OPTIMADE API, allowing users to refine their queries to suit their specific application. Further functionality of the OPTIMADE API can be found in the specification [10].
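To make the chaining of filter clauses concrete, the Python sketch below assembles the ternary, lead-free filter from its three parts and mirrors its meaning with a plain predicate on a list of element symbols. The helper names are hypothetical and the predicate is only a reading of the filter as described in the text, not an implementation of the OPTIMADE filter grammar.

```python
def has_any(prop, values):
    """Build a 'HAS ANY' clause with each value double-quoted."""
    return f"{prop} HAS ANY " + ", ".join(f'"{v}"' for v in values)


def ternary_lead_free_filter():
    """Chain the three clauses of the filter with AND."""
    clauses = [
        has_any("elements", ["C", "Si", "Ge", "Sn"]),
        'NOT elements HAS "Pb"',
        "elements LENGTH 3",
    ]
    return " AND ".join(clauses)


def matches(elements):
    """Plain-Python reading of the same filter for one structure."""
    return (
        bool(set(elements) & {"C", "Si", "Ge", "Sn"})
        and "Pb" not in elements
        and len(elements) == 3
    )


print(ternary_lead_free_filter())
print(matches(["O", "Sn", "Ta"]))  # True
print(matches(["O", "Pb", "Sn"]))  # False
```

Building the string from separate clauses makes it straightforward to swap one condition, for example changing the LENGTH clause to target binaries or quaternaries.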

Related libraries
The wider usage of the OPTIMADE API is a key goal for the consortium; to this end, several open source libraries have been developed to help users of the OPTIMADE API (either implementation developers, or client end-users), of which three are introduced below. The first two libraries offer tools that aid the implementation of the API for materials database developers, with the first also containing tools to construct and validate queries, while the third library is intended for end users of OPTIMADE-compliant APIs.

optimade-python-tools
optimade-python-tools is an open source Python package available on GitHub (https://github.com/Materials-Consortia/optimade-python-tools). The package contains a complete set of tools for implementing an OPTIMADE-compliant API, as well as several utilities that can be used by client code. The package is listed on the Python Package Index (PyPI) as optimade (https://pypi.org/project/optimade). Current (v0.14) functionality of the package includes:
• pydantic (https://github.com/samuelcolvin/pydantic) data and validation models for all objects defined in the OPTIMADE specification that can be used in server or client code;
• an extensible reference server implementation leveraging pydantic and FastAPI (https://github.com/tiangolo/fastapi). This reference server forms the basis of the OPTIMADE implementations for the Materials Project [24], NOMAD [29], Materials

OPTIMADE::Filter
OPTIMADE::Filter is a Perl library for the syntactical analysis of the OPTIMADE filter language. Apart from the construction of abstract syntax trees, the library can translate simple filter strings to SQL queries. The Git repository with the source code is publicly available on GitHub (https://github.com/Materials-Consortia/OPTIMADE-Filter).

pymatgen optimade module
pymatgen [26] (https://pymatgen.org) is a Python library for materials science. A user-friendly OptimadeRester client has been added to a new OPTIMADE module within pymatgen so that OPTIMADE structure resources can be queried in a manner familiar to existing users of pymatgen and the Materials Project API. The Git repository with the source code is publicly available on GitHub (https://github.com/materialsproject/pymatgen).

Summary
The latest OPTIMADE API specification v1.0 [10] offers holistic access to many leading crystal structure databases, namely: AFLOW, COD, TCOD, Materials Cloud, Materials Project, NOMAD, odbx, Open Materials Database (omdb), and OQMD. Open client implementations are also available (https://optimade.science, https://materialscloud.org/optimadeclient) that enable aggregated searches over many databases as well as user-friendly graphical widgets that can create an OPTIMADE filter to empower the user with even easier access to data. OPTIMADE provides researchers easy access to over 10,000,000 results for different materials, providing benchmarking opportunities and offering a huge opportunity for high-throughput screening and machine learning studies. The ability of the OPTIMADE API to search databases, expose links between databases, and deliver standardized results makes it well-positioned to significantly enhance the impact and permeability of pre-existing data silos. This should empower researchers to scan through new and unexpected material families, and train models from all available data that can understand deep correlations. The OPTIMADE API is flexible and will be extended to more use cases going forward. The development and adoption of the OPTIMADE API relies on the involvement of a large number of scientists, so contributions from the community are strongly encouraged, and questions on development, registration of a provider, or usage can be directed to the web forum (https://matsci.org/optimade) or mailing list (dev@optimade.org). Proposed developments include the standardization of more filterable materials properties, the integration of molecular dynamics simulations and of experimental results, and extensions beyond electronic-structure calculations. The future development of APIs, including OPTIMADE, should herald an era of effective use of big, open data in materials science.

Box 6. The truncated JSON response for the more focused search, showing the elements in the first material returned.