Recent advances and applications of deep learning methods in materials science

Deep learning (DL) is one of the fastest-growing topics in materials data science, with rapidly emerging applications spanning atomistic, image-based, spectral, and textual data modalities. DL allows analysis of unstructured data and automated identification of features. The recent development of large materials databases has fueled the application of DL methods in atomistic prediction in particular. In contrast, advances in image and spectral data have largely leveraged synthetic data enabled by high-quality forward models as well as by generative unsupervised DL methods. In this article, we present a high-level overview of deep learning methods followed by a detailed discussion of recent developments of deep learning in atomistic simulation, materials imaging, spectral analysis, and natural language processing. For each modality we discuss applications involving both theoretical and experimental data, typical modeling approaches with their strengths and limitations, and relevant publicly available software and datasets. We conclude the review with a discussion of recent cross-cutting work related to uncertainty quantification in this field and a brief perspective on limitations, challenges, and potential growth areas for DL methods in materials science.


Introduction
"Processing-structure-property-performance" is the key mantra in Materials Science and Engineering (MSE) [1]. The length and time scales of material structures and phenomena vary significantly among these four elements, adding further complexity [2]. For instance, structural information can range from detailed knowledge of atomic coordinates of elements to the microscale spatial distribution of phases (microstructure), to fragment connectivity (mesoscale), to images and spectra. Establishing linkages between the above components is a challenging task.
Deep learning (DL) [21,22] is a specialized branch of machine learning (ML). It was originally inspired by biological models of computation and cognition in the human brain [23,24], and one of its major strengths is its potential to extract higher-level features from the raw input data.
DL applications are rapidly replacing conventional systems in many aspects of our daily lives as, for example, in image and speech recognition, web search, fraud detection, email/spam filtering, financial risk modeling, and so on. DL techniques have been proven to provide exciting new capabilities in numerous fields (such as playing Go [25], self-driving cars [26], navigation, chip design, particle physics, protein science, drug discovery, astrophysics, object recognition [27], etc).
Recently, DL methods have been outperforming other machine learning techniques in numerous scientific fields, such as chemistry, physics, biology, and materials science [20,28-32]. DL applications in MSE are still relatively new, and the field has not fully explored its potential, implications, and limitations. DL provides new approaches for investigating material phenomena and has pushed materials scientists to expand their traditional toolset.
DL methods have been shown to act as a complementary approach to physics-based methods for materials design. While large datasets are often viewed as a prerequisite for successful DL applications, techniques such as transfer learning, multi-fidelity modelling, and active learning can often make DL feasible for small datasets as well [33][34][35][36].
Traditionally, materials have been designed experimentally using trial and error methods with a strong dose of chemical intuition. In addition to being a very costly and time consuming approach, the number of material combinations is so huge that it is intractable to study experimentally, leading to the need for empirical formulation and computational approaches. While computational approaches (such as density functional theory, molecular dynamics, Monte Carlo, phase-field, finite elements) are much faster and cheaper than experiments, they are still limited by length and time scale constraints, which in turn limits their respective domains of applicability. DL methods can offer substantial speedups compared to conventional scientific computing, and, for some applications, are reaching an accuracy level comparable to physics-based or computational models.
Moreover, entering a new domain of materials science and performing cutting-edge research requires years of education, training, and development of specialized skills and intuition. Fortunately, we now live in an era of increasingly open data and computational resources. Mature, well-documented DL libraries make DL research much more easily accessible to newcomers than almost any other research field. Testing and benchmarking methodologies such as underfitting/overfitting/cross-validation [15,16,37] are common knowledge, and standards for measuring model performance are well established in the community.
Despite their many advantages, DL methods have disadvantages too, the most significant one being their black-box nature [38], which may hinder physical insights into the phenomena under examination. Evaluating and increasing the interpretability and explainability of DL models remains an active field of research. Generally, a DL model has from a few thousand to millions of parameters, making model interpretation and direct generation of scientific insight difficult.
Although there are several good recent reviews of ML applications in MSE [15-17, 19, 39-49], DL for materials has been advancing rapidly, warranting a dedicated review to cover the explosion of research in this field. In this article, we discuss some of the basic principles in DL methods and then highlight major trends among the recent advances in DL applications for materials science. As the tools and datasets for DL applications in materials keep evolving, we provide a github repository (https://github.com/deepmaterials/dlmatreview) that can be updated as new resources are made publicly available.

Basics of deep learning

General machine learning concepts
Artificial intelligence (AI) [13] is the development of machines and algorithms that mimic human intelligence, for example, by optimizing actions to achieve certain goals. Machine learning (ML) is a subset of AI that provides the ability to learn from data without being explicitly programmed for a given task, such as playing chess or making social network recommendations. DL, in turn, is the subset of ML that takes inspiration from biological brains and uses multi-layer neural networks to solve ML tasks. A schematic of the AI-ML-DL context and some of the key application areas of DL in the materials science and engineering field are shown in Fig. 1.
Some of the commonly used ML techniques are linear regression, decision trees, and random forests, in which generalized models are trained to learn coefficients/weights/parameters for a given dataset (usually structured, i.e., on a grid or in a spreadsheet).
For unstructured data (such as pixels or features from an image, sounds, text, and graphs), applying traditional ML techniques becomes challenging because users first have to extract generalized, meaningful representations or features themselves (such as calculating the pair-distribution function for an atomic structure) and then train the ML models. Hence, the process becomes time-consuming, brittle, and not easily scalable. This is where deep learning (DL) techniques become important.
DL methods are based on artificial neural networks and allied techniques. According to the "universal approximation theorem" [50,51], neural networks with a sufficient number of hidden units can approximate any continuous function on a bounded domain to arbitrary accuracy.

Perceptron
A perceptron or a single artificial neuron [52] is the building block of artificial neural networks (ANNs) and performs forward propagation of information. For a set of inputs [x_1, x_2, ..., x_m] to the perceptron, we assign floating-point weights [w_1, w_2, ..., w_m] (and a bias to shift the weighted sum), multiply each input by its corresponding weight, and sum the products. The resulting weighted sum is then passed through an activation function to produce the output. Some of the common software packages [53] allowing NN training are: PyTorch [54], TensorFlow [55] and MXNet [56].
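The forward pass of a single perceptron can be illustrated with a minimal sketch in plain Python. The function names (`perceptron`, `step`) and the numerical values are hypothetical and chosen only for illustration; they are not taken from any of the packages cited above.

```python
def perceptron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def step(z):
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1.0 if z >= 0.0 else 0.0


# Illustrative (hypothetical) inputs, weights and bias.
x = [1.0, 0.5, -2.0]
w = [0.4, -0.6, 0.1]
b = 0.2

# Weighted sum: 0.4*1.0 - 0.6*0.5 + 0.1*(-2.0) + 0.2 = 0.1
output = perceptron(x, w, b, step)
```

In a practical network the step function would typically be replaced by a differentiable activation so that the weights can be trained by gradient descent.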

Activation function
Activation functions (such as sigmoid, hyperbolic tangent (tanh), rectified linear unit (ReLU), leaky ReLU, Swish) are the critical nonlinear components that enable neural networks to compose many small building blocks to learn complex nonlinear functions. For example, the sigmoid activation maps real numbers to the range (0, 1); this activation function is often used in the last layer of binary classifiers to model probabilities. The choice of activation function can affect training efficiency as well as final accuracy [57].
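For concreteness, the activation functions named above can be written in a few lines. This is a minimal scalar sketch using only the Python standard library; the default `negative_slope=0.01` for leaky ReLU is an assumed, commonly used value rather than one prescribed by the text.

```python
import math


def sigmoid(z):
    """Maps any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Maps any real number to the open interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def leaky_relu(z, negative_slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return z if z >= 0.0 else negative_slope * z


def swish(z):
    """Swish: the input scaled by its own sigmoid."""
    return z * sigmoid(z)
```

Deep learning frameworks provide vectorized, numerically robust versions of these functions; the scalar forms here are only meant to make the definitions explicit.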

Loss function, gradient descent and normalization
The weight matrices of a neural network are initialized randomly or obtained from a pre-trained model. These weight matrices are multiplied with the input matrix (or output from a previous layer) and subjected to a nonlinear activation function to yield updated representations, which are often referred to as activations or feature maps. The loss function (also known as objective function or empirical risk) is calculated by comparing the output of the neural network with the known target values. Typically, network weights are iteratively updated via stochastic gradient descent algorithms to minimize the loss function until the desired accuracy is achieved. Most modern deep learning frameworks facilitate this by using reverse-mode automatic differentiation [58] to obtain the partial derivatives of the loss function with respect to each network parameter through recursive application of the chain rule. Colloquially, this is also known as back-propagation. Some of the common gradient descent algorithms are Stochastic Gradient Descent (SGD), Adam, and Adagrad. The learning rate is an important hyperparameter in gradient descent. Unlike plain SGD, which uses a single global learning rate, Adam and Adagrad adapt the learning rate for each parameter during training. Depending on the objective, such as classification or regression, different loss functions such as Binary Cross Entropy (BCE), Negative Log Likelihood (NLL), or Mean Squared Error (MSE) are used.
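The interplay of loss function, gradients, and learning rate can be sketched on a toy problem. The example below fits a one-parameter linear model with a mean squared error loss; it is a simplified illustration under several assumptions: the gradients are derived by hand rather than by automatic differentiation, the whole dataset is used in every step (full-batch rather than stochastic gradient descent), and the data, learning rate, and number of steps are arbitrary illustrative choices.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions w*x + b and targets y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradients(w, b, xs, ys):
    """Partial derivatives of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.05     # step size of each update
initial_loss = mse_loss(w, b, xs, ys)

for _ in range(2000):
    dw, db = mse_gradients(w, b, xs, ys)
    # Move the parameters against the gradient to reduce the loss.
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = mse_loss(w, b, xs, ys)
```

With these settings the parameters approach w = 2 and b = 1. Too large a learning rate would make the updates diverge, which is one reason adaptive schemes are popular in practice.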
The inputs of a neural network are generally scaled i.e., normalized to have zero mean and unit standard deviation. Scaling is also applied to the input of hidden layers (using batch or layer normalization) to improve the stability of ANNs.
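Input scaling to zero mean and unit standard deviation can be sketched as follows. The helper name `standardize` is hypothetical, and the use of the population standard deviation is an assumption; in practice the mean and standard deviation are usually computed on the training set only and then reused for validation and test data.

```python
import statistics


def standardize(values):
    """Rescale a list of numbers to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = standardize([1.0, 2.0, 3.0, 4.0])
```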

Epoch and mini batches
A single pass over the entire training data is called an epoch, and multiple epochs are performed until the weights converge. In DL, datasets are usually large, and computing gradients for the entire dataset and network becomes challenging. Hence, the forward passes and the corresponding weight updates are done with small subsets of the training data called mini-batches.
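The relation between epochs and mini-batches can be made explicit with a short sketch. The generator name `iterate_minibatches`, the batch size, and the toy dataset are hypothetical; shuffling the data at the start of each epoch is a common convention assumed here, not a requirement stated in the text.

```python
import random


def iterate_minibatches(data, batch_size, rng):
    """Yield the dataset in shuffled chunks of at most batch_size samples."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


samples = list(range(10))   # stand-in for ten training samples
rng = random.Random(0)      # fixed seed for reproducibility

for epoch in range(2):      # each epoch is one full pass over the data
    batches = list(iterate_minibatches(samples, batch_size=4, rng=rng))
    # In a real training loop, one gradient update would be made per batch.
```

Ten samples with a batch size of four give three batches per epoch, the last one holding the two remaining samples.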

Underfitting, overfitting, regularization and early stopping
During ML training, the dataset is split into training, validation and test sets. The test set is never used during the training process. A model is said to be underfitting if it performs poorly on the training set and lacks the capacity to fully learn the training data. A model is said to overfit if it performs very well on the training data but does not perform well on the validation data.
Overfitting is controlled with regularization techniques such as dropout and early stopping. Regularization discourages the model from simply memorizing the training data so that the model can generalize. One of the most popular regularizations is dropout, in which we randomly set a fraction of the activations of an NN layer to zero during training.
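Dropout can be sketched in a few lines. The version below uses the "inverted" scaling convention, in which surviving activations are divided by the keep probability so that their expected value is unchanged; this scaling is an assumption of the sketch (it is the convention used by common frameworks) and is not described in the text above. The function name and values are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training; pass through at inference."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Survivors are rescaled so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
activations = [1.0] * 1000
dropped = dropout(activations, drop_prob=0.5, rng=rng)
unchanged = dropout(activations, drop_prob=0.5, rng=rng, training=False)
```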
In early stopping, training is halted before the model overfits, i.e., when the accuracy on the validation set flattens or decreases.
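One simple way to implement early stopping is to monitor the validation loss and stop once it has not improved for a fixed number of epochs. The sketch below assumes such a "patience" criterion; the function name, the patience value, and the validation losses are hypothetical and purely illustrative.

```python
def early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the weights from `best_epoch` would be kept.
    """
    best_loss = float("inf")
    best_epoch = None
    stop_epoch = None
    stale_epochs = 0
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_epoch, stop_epoch


# Validation loss first decreases, then rises as the model starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
best_epoch, stop_epoch = early_stopping(val_losses, patience=2)
```

Here the lowest validation loss occurs at epoch 3 (counting from zero), and training stops at epoch 5 after two epochs without improvement.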

Convolutional neural networks
Convolutional neural networks (CNN) [59] can be viewed as a regularized version of multilayer perceptrons with a strong inductive bias for learning translation-invariant image representations. There are four main components in CNNs: a) learnable convolution filter banks, b) nonlinear activations, c) spatial coarsening (via pooling or strided convolution), and d) a prediction module, often consisting of fully-connected layers that operate on a global instance representation.
In CNNs we use convolution functions with multiple kernels or filters with trainable and shared weights or parameters, instead of general matrix multiplication. These filters/kernels are matrices with a relatively small number of rows and columns that convolve over the input to automatically extract high-level local features in the form of feature maps. The filters slide/convolve (element-wise multiply and sum) across the input with a fixed stride to produce the feature map, and the information thus learnt is passed to the hidden/fully-connected layers. These filters can be one-, two-, or three-dimensional depending on the input data.
Similar to the fully connected NNs, nonlinearities such as ReLU are then applied, which allow us to deal with nonlinear and complex data. The pooling operation downsamples each feature map obtained after convolution, reducing its dimension while providing a degree of spatial invariance. These downsampling/pooling operations can be of different types such as maximum pooling, minimum pooling, average pooling, and sum pooling. After one or more convolutional and pooling layers, the outputs are usually reduced to a one-dimensional global representation. CNNs are especially popular for image data.
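The convolution, activation, and pooling steps can be traced on a small example. The sketch below is a plain-Python illustration with a single hand-chosen (not learned) filter and no padding; as in most DL frameworks, the "convolution" is implemented as a sliding element-wise multiply-and-sum without flipping the kernel. All names and values are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide a kernel over a 2D image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu_map(feature_map):
    """Apply ReLU to every entry of a feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# 5x5 image with a vertical edge between the second and third columns.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-chosen 3x3 filter that responds to left-to-right intensity increases.
kernel = [[-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0]]

feature_map = relu_map(conv2d(image, kernel))
pooled = max_pool2d(feature_map, size=2)
```

The feature map is large where the filter overlaps the edge and zero in the uniform region; in a trained CNN the filter weights would be learned rather than specified by hand.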

Graphs and their variants
Classical CNNs as described above are based on regular-grid Euclidean data (such as the 2D grid in images). However, real-life data structures such as social networks, segments of images, word vectors, recommender systems, and atomic/molecular structures are usually non-Euclidean. In such cases, graph-based non-Euclidean data structures become especially important.
Mathematically, a graph G is defined as a set of nodes/vertices V, a set of edges/links E, and node features X: G = (V, E, X) [60][61][62], and can be used to represent non-Euclidean data. An edge is formed between a pair of nodes and contains the relation information between them. Each node and edge can have attributes/features associated with it. An adjacency matrix A is a square matrix whose entries, 1 or 0, indicate whether or not a pair of nodes is connected. A graph can be of various types, such as undirected/directed, weighted/unweighted, homogeneous/heterogeneous, and static/dynamic. An undirected graph captures symmetric relations between nodes such that A_ij = A_ji, while a directed one captures asymmetric relations, for which A_ij and A_ji can differ. In a weighted graph, each edge is associated with a scalar weight rather than just 1s and 0s. In a homogeneous graph, all the nodes represent instances of the same type and all the edges capture relations of the same type, while in a heterogeneous graph, the nodes and edges can be of different types. Heterogeneous graphs provide an easy interface for managing nodes and edges of different types as well as their associated features. When input features or graph topology vary with time, the graph is called dynamic; otherwise it is considered static. If a pair of nodes is connected by more than one edge, the graph is termed a multi-graph.
Graph neural networks (GNNs) are DL methods that operate on the graph domain and can capture the dependence of graphs via message passing between the nodes and edges of graphs. There are two key steps in GNN training: a) we first aggregate information from neighbors and b) update the nodes and/or edges. Importantly, aggregation is permutation invariant. Similar to the fully-connected NNs, the input node features X (with embedding matrix) are multiplied with the adjacency matrix and the weight matrices and then passed through the nonlinear activation function to provide outputs for the next layer. This method is called the propagation rule.
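The aggregate-and-update steps can be sketched with one layer of a simple graph convolution. The specific form used below, ReLU(Â X W) with self-loops added to the adjacency matrix and each row normalized so that aggregation is a mean over a node and its neighbors, is an assumption made for illustration; it is one common choice among several, and the weights here are fixed by hand rather than learned.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gnn_layer(adjacency, features, weights):
    """One propagation step: aggregate neighbors, transform, apply ReLU."""
    n = len(adjacency)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Row-normalize: aggregation becomes a mean, which is permutation invariant.
    a_norm = [[v / sum(row) for v in row] for row in a_hat]
    aggregated = matmul(a_norm, features)    # step a) aggregate
    updated = matmul(aggregated, weights)    # step b) update
    return [[max(0.0, v) for v in row] for row in updated]


# Undirected path graph 0 - 1 - 2 (symmetric adjacency matrix).
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0], [2.0], [3.0]]   # one scalar feature per node
weights = [[1.0, -1.0]]            # maps 1 input feature to 2 hidden features

hidden = gnn_layer(adjacency, features, weights)
```

Stacking several such layers lets information propagate between nodes that are more than one edge apart.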

Sequence-to-sequence models
Traditionally, learning from sequential inputs such as text involves first generating a fixed-length input from the data. For example, the "bag-of-words" approach simply counts the number of instances of each word in a document and produces a fixed-length vector that is the size of the overall vocabulary.
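A bag-of-words vector can be built in a few lines. This sketch assumes simple lower-casing and whitespace tokenization, and the two example sentences are hypothetical; real text pipelines typically use more careful tokenization.

```python
from collections import Counter


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


documents = ["the band gap of the oxide", "the oxide is stable"]
# The vocabulary is the sorted set of all words seen in the corpus.
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
vectors = [bag_of_words(doc, vocabulary) for doc in documents]
```

Every document is mapped to a vector of the same length (the vocabulary size), but the order of the words is lost, which motivates the sequence models discussed next.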
In contrast, sequence-to-sequence models can take into account sequential / contextual information about each word and produce outputs of arbitrary length. For example, in named entity recognition (NER), an input sequence of words (e.g., a chemical abstract) is mapped to an output sequence of "entities" or categories where every word in the sequence is assigned a category.
An early form of sequence-to-sequence model is the recurrent neural network, or RNN. Unlike the fully connected NN architecture, where there are no connections between hidden nodes in the same layer but only between nodes in adjacent layers, RNNs have feedback connections; each hidden layer can be unfolded in time and processed similarly to traditional NNs that share the same weight matrices. There are multiple types of RNNs, of which the most common ones are: the gated recurrent unit recurrent neural network (GRU-RNN), the long short-term memory (LSTM) network, and the clockwork RNN (CW-RNN) [69].
However, all such RNNs suffer from some drawbacks, including: (i) difficulty of parallelization and therefore difficulty in training on large data sets and (ii) difficulty in preserving long-range contextual information due to the "vanishing gradient" problem. Nevertheless, as we will later describe, LSTMs have been successfully applied to various NER problems in the materials domain.
More recently, sequence-to-sequence models based on a "transformer" architecture, such as Google's Bidirectional Encoder Representations from Transformers (BERT) model [53,70], have helped address some of the issues of traditional RNNs. Rather than passing a state vector that is iterated word-by-word, such models use an attention mechanism to allow access to all previous words simultaneously without explicit time steps. This facilitates parallelization and also better preserves long-term context.
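The core of the attention mechanism can be sketched as scaled dot-product attention, in which each query is compared with all keys at once and the resulting weights are used to average the values. The sketch below is a simplification under stated assumptions: it omits the learned projection matrices, multiple heads, and any masking used in full transformer models, and the example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Convert a list of scores to positive weights that sum to one."""
    shift = max(scores)  # subtracting the maximum improves numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]        # identical keys give equal weights
values = [[0.0, 2.0], [4.0, 6.0]]
context = attention(queries, keys, values)
```

Because every query attends to all positions in a single step, the computation for different positions is independent and can be parallelized, in contrast to the step-by-step state updates of an RNN.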

Deep generative models (VAE and GAN)
While the above DL frameworks are based on supervised machine learning (i.e., we know the target or ground truth data, such as in classification and regression) and are discriminative (i.e., they learn differentiating features between various datasets), many AI tasks are unsupervised (such as clustering) and generative (i.e., they aim to learn underlying distributions).
Generative models are used to a) generate data samples similar to the training set with variations, i.e., augmentation, b) learn good generalized latent features, and c) guide mixed reality applications such as virtual try-on. There are various types of generative models, of which the most common are: a) variational autoencoders (VAE), which explicitly define and learn the likelihood of the data, and b) generative adversarial networks (GAN), which learn to directly generate samples from the model's distribution without defining any density function.

Deep reinforcement learning
Reinforcement learning (RL) deals with tasks in which a computational agent learns to make decisions by trial and error. Deep RL integrates DL into the RL framework, allowing agents to make decisions from unstructured input data. In traditional RL, a Markov decision process (MDP) is used, in which an agent at every timestep takes an action, receives a scalar reward, and transitions to the next state according to the system dynamics, with the aim of learning a policy that maximizes returns. In deep RL, however, the states are high-dimensional (such as continuous images or spectra) and act as an input to DL methods. DRL architectures can be either model-based or model-free.

Applications of DL methods
Some aspects of successful DL application that require materials-science-specific considerations are: 1) acquiring large, balanced and diverse datasets (often on the order of 10000 data points or more), 2) determining an appropriate DL approach and a suitable vector or graph representation of the input samples, and 3) selecting appropriate performance metrics relevant to scientific goals.
In the following sections we discuss some of the key areas of materials science in which DL has been applied with available links to repositories and datasets that help in reproducibility and extensibility of the work. In this review we categorize materials science applications at a high level by the type of input data considered: 3.1 atomistic, 3.2 stoichiometric, 3.3 spectral, 3.4 image, and 3.5 text. Within each broad materials data modality, we summarize prevailing machine learning tasks and their impact on materials research and development.

Atomistic and chemical representations
In this section we provide a few examples of solving materials science problems with DL methods trained on atomistic data. The atomic structure of a material is usually described by the atomic coordinates and the atomic composition of the material. The arbitrary number of atoms and types of elements in a system poses a challenge for applying traditional ML algorithms to atomistic predictions. DL-based methods are an obvious strategy to tackle this problem. There have been several previous attempts to represent crystals and molecules using fixed-size descriptors such as the Coulomb matrix [71][72][73], classical force field inspired descriptors (CFID) [74][75][76], the pair distribution function (PRDF), and Voronoi tessellation [77][78][79]. Recently, graph neural network methods have been shown to surpass previous hand-crafted feature sets [28].
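As an illustration of a fixed-size descriptor, the Coulomb matrix can be sketched as follows. The sketch assumes the form in which this descriptor is commonly defined in the literature, with diagonal entries 0.5 Z_i^2.4 and off-diagonal entries Z_i Z_j / |R_i - R_j|, and uses zero padding to reach a fixed size for systems with fewer atoms. The function name, the padding size, and the H2 geometry (a bond length of 1.4 in atomic units) are illustrative assumptions.

```python
import math


def coulomb_matrix(atomic_numbers, positions, size):
    """Zero-padded Coulomb matrix for a molecule with at most `size` atoms."""
    n = len(atomic_numbers)
    if n > size:
        raise ValueError("molecule has more atoms than the descriptor size")
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: a fit to the energy of the isolated atom.
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between the two nuclei.
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / distance
    return matrix


# Hydrogen molecule with an illustrative bond length of 1.4 (atomic units).
h2_numbers = [1, 1]
h2_positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
cm = coulomb_matrix(h2_numbers, h2_positions, size=3)
```

The padding makes the descriptor length independent of the number of atoms, but the result still depends on the ordering of the atoms, which is one of the limitations that graph-based models avoid.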
DL for atomistic materials applications includes: a) force-field development, b) direct property predictions, and c) materials screening. In addition to the above points, we also discuss some of the recent generative adversarial network methods and other methods complementary to atomistic approaches.

Databases and software libraries
In Table 1 we provide some of the commonly used datasets for atomistic DL models for molecules, solids and proteins. We note that the computational methods used for different datasets are different, and many of them are continuously evolving. Generally it takes years to generate such databases using conventional methods such as density functional theory, while DL methods can be used to make predictions with much reduced computational cost and reasonable accuracy. In Table 1 we provide DL software packages used for atomistic materials design. The types of models include general property (GP) predictors and interatomic force fields (FF). The models have been demonstrated on molecules (Mol), solid-state materials (Sol) or proteins (Prot). For some force fields, high-performance large-scale implementations (LSI) that leverage parallel computing exist. Some of these methods mainly use interatomic distances to build graphs, while others use distances as well as bond angle information. Recently, including bond angles within GNNs has been shown to drastically improve performance with comparable computational timings.

Applications
Force field development
The first application includes the development of DL-based force fields (FF) [80,110], also called interatomic potentials. Some of the major advantages of such force fields are that they are very fast at making predictions (with speedups on the order of hundreds to thousands of times [62]) and that they ease the tedious development of FFs; the disadvantage is that they still require a large training dataset generated with computationally expensive methods. Models such as the Behler-Parrinello neural network (BPNN) and its variants [111,112] are used for developing interatomic potentials that can be used beyond 0 K to study temperature- and time-dependent behavior with molecular dynamics simulations, such as for nanoparticles [113]. Such FF models have been developed for molecular systems such as water, methane and other organic molecules [102,112] as well as for solids such as silicon [111], sodium [114], graphite [115] and titania (TiO2) [116].
While the above works are mainly based on NNs, a graph neural network force field (GNNFF) framework [117,118] has also been developed that bypasses both computational bottlenecks. GNNFF can predict atomic forces directly using automatically extracted structural features that are not only translationally-invariant, but also rotationally-covariant to the coordinate space of the atomic positions. In addition to the development of pure NN-based FFs, there have also been recent developments that combine traditional FFs with NNs, such as bond-order potentials with NNs and ReaxFF with a message passing neural network (MPNN), which can help mitigate the poor extrapolation of NNs [119,120].

Direct property prediction from atomistic configurations
DL methods can be used to establish structure-property relationships between atomic structures and their properties with high accuracy [28,97]. Models such as SchNet, crystal graph convolutional neural network (CGCNN), improved crystal graph convolutional neural network (iCGCNN), directional message passing neural network (DimeNet), atomistic line graph neural network (ALIGNN) and materials graph neural network (MEGNet) shown in Table 1 have been used to predict up to 50 properties of crystalline and molecular materials. These property datasets are usually obtained from ab-initio calculations. A schematic of such models is shown in Fig. 2. While SchNet, CGCNN, and MEGNet are primarily based on atomic distances only, the iCGCNN, DimeNet, and ALIGNN models capture many-body interactions using GCNN.
For instance, the current state-of-the-art mean absolute errors for the formation energy of solids and the internal energy of molecules at 0 K are 0.022 eV/atom and 0.002 eV, respectively, as obtained by the ALIGNN model [93]. DL is also heavily used for predicting the catalytic behavior of materials, for example in the Open Catalyst Project [121], which is driven by DL methods for materials design. There is an ongoing effort to continuously improve the models. Usually, models for energy-based properties such as formation and total energies are more accurate than models for electronic properties such as bandgaps and power factors.
In addition to molecules and solids, property prediction models have also been used for bio-materials such as proteins, which can be viewed as large molecules. There have been several efforts to predict protein-based properties such as binding affinity [108] and docking predictions [109].
There have also been several applications of DL methods such as autoencoders [122] and reinforcement learning [123][124][125] to identifying reasonable regions of chemical space for inverse materials design. Inverse materials design with techniques such as GAN deals with finding chemical compounds with suitable properties and is complementary to forward prediction models. While such concepts have been widely applied to molecular systems [126], recently these methods have been applied to solids as well [127][128][129][130][131].

Fast materials screening
DFT-based high-throughput methods are usually limited to a few thousand compounds and take a long time for calculations; DL-based methods can aid this process and allow much faster predictions. The DL-based property prediction models mentioned above can be used for pre-screening chemical compounds. Hence, DL-based tools can be viewed as a pre-screening tool for traditional methods such as DFT. For example, Xie et al. used the CGCNN model to screen stable perovskite materials [95] as well as for hierarchical visualization of materials space [132]. Park et al. [133] used iCGCNN to screen ThCr2Si2-type materials. Lugier et al. used DL methods to predict thermoelectric properties [134]. Rosen et al. [90] used graph neural network models to predict the bandgaps of metal organic frameworks. DL for molecular materials has been used to predict technologically important properties such as aqueous solubility [135] and toxicity [136].
It should be noted that full atomistic representations and the associated DL models are only possible if the crystal structure and atom positions are available. In practice, the precise atom positions are only available from DFT structural relaxations or experiments, and are one of the goals of materials discovery rather than the starting point. Hence, alternative methods have been proposed to bypass the need for atom positions in building DL models. For example, Jain and Bligaard [137] proposed atomic-position-independent descriptors and used a CNN model to learn the energies of crystals. Such descriptors include only symmetry information (e.g., space group and Wyckoff position). In principle, the method can be applied universally to all crystals. Nevertheless, the model errors tend to be much higher than those of graph-based models. A similar coarse-grained representation using Wyckoff positions was also used by Goodall et al. [138]. Alternatively, Zuo et al. [139] started from hypothetical structures without precise atom positions, and used a Bayesian optimization method coupled with a MEGNet energy model as the energy evaluator to perform direct structural relaxation. The application of the resulting Bayesian optimization with symmetry relaxation (BOWSR) algorithm successfully discovered the hard materials ReWB (Pca2_1) and MoWC2 (P6_3/mmc), which were then experimentally synthesized.

Chemical formula and segment representations
One of the earliest applications for DL included SMILES for molecules, elemental fractions and chemical descriptors for solids and sequence of protein names as descriptors. Such descriptors lack explicit inclusion of atomic structure information but are still useful for various pre-screening applications for both theoretical and experimental data.

SMILES and fragment representation
The simplified molecular-input line-entry system (SMILES) is a method to represent the elements and bonding of molecular structures using short American Standard Code for Information Interchange (ASCII) strings. SMILES can express structural differences, including the chirality of compounds, making it more useful than a simple chemical formula. A SMILES string is a simple grid-like (1-D grid) structure that can also represent molecular sequences such as DNA, macromolecules/polymers, and protein sequences [140,141]. In addition to the chemical constituents, as in a chemical formula, bonds (such as double and triple bonds) are represented by special symbols (such as '=' and '#'). The presence of a branch point is indicated using a left-hand bracket "(", while the right-hand bracket ")" indicates that all the atoms in that branch have been taken into account. SMILES strings are represented as a distributed representation termed a SMILES feature matrix (as a sparse matrix), and then we can apply DL to the matrix similarly to image data. The length of the SMILES matrix is generally kept fixed (such as 400) during training and, in addition to the SMILES, multiple elemental attributes and bonding attributes (such as chirality and aromaticity) can be used. Key DL tasks for molecules include a) novel molecule design and b) molecule screening. Novel molecules with target properties can be designed using VAE, GAN and RNN based methods [142][143][144]. These DL-generated molecules might not be physically valid, but the goal is to train the model to learn the patterns in SMILES strings such that the output resembles valid molecules. Chemical intuition can then be used to further screen the molecules. DL for SMILES can also be used for molecular screening, such as to predict molecular toxicity. Some of the common SMILES datasets are: ZINC [145], Tox21 [146] and PubChem [147].
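A fixed-length SMILES feature matrix can be sketched as a one-hot encoding of the string, padded with zero rows. This is a simplified illustration: it tokenizes character by character (so two-letter element symbols such as Cl or Br are not handled), it uses a small hypothetical alphabet, and it omits the additional elemental and bonding attributes mentioned above. The maximum length of 10 is chosen only to keep the example small.

```python
def smiles_to_matrix(smiles, alphabet, max_length):
    """One-hot encode a SMILES string into a max_length x len(alphabet) matrix."""
    if len(smiles) > max_length:
        raise ValueError("SMILES string is longer than max_length")
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_length)]
    for position, symbol in enumerate(smiles):
        # Raises KeyError for symbols outside the (hypothetical) alphabet.
        matrix[position][index[symbol]] = 1
    return matrix  # rows beyond the string length remain all-zero padding


# Small illustrative alphabet of atoms, bond symbols and branch brackets.
alphabet = ["C", "O", "N", "=", "#", "(", ")", "1"]
acetic_acid = "CC(=O)O"
matrix = smiles_to_matrix(acetic_acid, alphabet, max_length=10)
```

Each row has at most one non-zero entry, which is why such matrices are sparse; the resulting two-dimensional array can be passed to convolutional or recurrent models in the same way as image or sequence data.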
Because it is difficult to enforce the generation of valid molecular structures from SMILES, fragment-based models such as DeepFrag and DeepFrag-K [148,149] have been developed. In fragment-based models, a fragment is removed from a ligand/receptor complex and a DL model is then trained to predict the most suitable fragment substituent. A set of useful tools for SMILES and fragment representations is provided in Table 2.

Chemical formula representation
There are several ways of using chemical-formula-based representations for building ML/DL models, beginning with a simple vector of raw elemental fractions [150,151] or of weight percentages of alloying compositions [152][153][154][155], as well as more sophisticated hand-crafted descriptors or physical attributes that add known chemistry knowledge (e.g., electronegativity, valency, etc. of constituent elements) to the feature representations [156][157][158][159][160][161]. Statistical and mathematical operations such as average, max, min, median, mode, and exponentiation can be carried out on elemental properties of the constituent elements to obtain a set of descriptors for a given compound. The number of such composition-based features can range from a few dozen to a few hundred. One commonly used representation that has been shown to work for a variety of use cases is the materials agnostic platform for informatics and exploration (MagPie) [160]. All these composition-based representations can be used with traditional ML methods such as Random Forest as well as with DL.
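The two simplest representations mentioned above, raw elemental fractions and statistics over an elemental property, can be sketched as follows. This is an illustrative example only, not the MagPie implementation: the formula parser handles only simple formulas, the property table is a small subset of Pauling electronegativities, and the function names are hypothetical.

```python
import re

# Pauling electronegativities for a handful of elements (illustrative subset).
ELECTRONEGATIVITY = {"H": 2.20, "O": 3.44, "Na": 0.93,
                     "Cl": 3.16, "Ti": 1.54, "Fe": 1.83}


def element_fractions(formula):
    """Parse a simple formula such as 'Fe2O3' into atomic fractions.

    Parentheses and fractional subscripts are not handled in this sketch.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


def composition_descriptors(formula, prop=ELECTRONEGATIVITY):
    """Fraction-weighted mean plus min, max and range of one elemental property."""
    fractions = element_fractions(formula)
    values = [prop[el] for el in fractions]
    weighted_mean = sum(fractions[el] * prop[el] for el in fractions)
    return {
        "weighted_mean": weighted_mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


fractions = element_fractions("Fe2O3")
descriptors = composition_descriptors("Fe2O3")
print(fractions, descriptors)
```

Repeating the same statistics over many elemental properties is what grows the feature vector from a handful of entries to the dozens or hundreds of descriptors mentioned above.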
It is relevant to note that ElemNet [151], which is a 17-layer neural network composed of fully-connected layers and uses only raw elemental fractions as input, was found to significantly outperform traditional ML methods such as Random Forest, even when they were allowed to use more sophisticated physical attributes based on MagPie as input. Although no periodic table information was provided to the model, it was found to self-learn some interesting chemistry, like groups (element similarity) and charge balance (element interaction), and was also able to predict phase diagrams on unseen materials systems, underscoring the power of DL for representation learning directly from raw inputs without explicit feature extraction. Further increasing the depth of the network was found to adversely affect the model accuracy due to the vanishing gradient problem. To address this issue, Jha et al. [162] developed IRNet, which uses individual residual learning to allow a smoother flow of gradients and enable deeper learning for cases where big data is available. IRNet models were tested on a variety of big and small materials datasets, such as OQMD, AFLOW, Materials Project, JARVIS, using different vector-based materials representations (element fractions, MagPie, structural) and were found to not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data [163]. Further, graph based methods such as Roost [164] have also been developed which can outperform many similar techniques.
Such methods have been used for the diverse DFT datasets mentioned above in Table 1 as well as experimental datasets such as SuperCon [165,166] for quick pre-screening applications. In terms of applications, they have been applied for predicting properties such as formation energy [151], band gap and magnetization [162], superconducting temperatures [166], and bulk and shear modulus [163]. They have also been used for transfer learning across datasets for enhanced predictive accuracy on small data [34], even for different source and target properties [167].
Libraries of such descriptors have been developed, such as MatMiner [161] and DScribe [168]. Some examples of such models are given in Table 2. Such representations are especially useful for experimental datasets, such as superconducting material datasets, where the actual atomic structure is not known. However, these representations cannot distinguish different polymorphs of a system with different point groups and space groups. It has recently been shown that although composition-based representations can help build ML/DL models that predict some properties, like formation energy, with remarkable accuracy, this does not necessarily translate to accurate predictions of other properties such as stability, when compared to DFT's own accuracy [169].

Spectral models
When electromagnetic radiation hits materials, the interaction between the radiation and matter, measured as a function of the wavelength or frequency of the radiation, produces a spectroscopic signal. By studying spectroscopy, researchers can gain insights into the materials' composition, structure, and dynamic properties. Spectroscopic techniques are foundational in materials characterization. For instance, X-ray diffraction (XRD) has been used to characterize the crystal structure of materials for more than a century. Spectroscopic analysis can involve fitting quantitative physical models (for example, Rietveld refinement) or more empirical approaches such as fitting linear combinations of reference spectra, as with X-ray absorption near edge spectroscopy (XANES). Both approaches require a high degree of researcher expertise through careful design of experiments; specification, revision, and iterative fitting of physical models; or the availability of template spectra of known materials. In recent years, with the advances in high-throughput experiments and computational data, spectroscopic data has multiplied, giving researchers opportunities to learn from the data and potentially displace the conventional methods of analyzing such data. This section covers emerging DL applications in various modes of spectroscopic data analysis, aiming to offer practical examples and insights. Some of the applications are shown in Fig. 3.

Databases and software libraries
Currently, large-scale and element-diverse spectral data mainly exist in computational databases. For example, in Ref. [181], the authors calculated the infrared spectra, piezoelectric tensor, Born effective charge tensor, and dielectric response as a part of the JARVIS-DFT DFPT database. The Materials Project has established the largest computational X-ray absorption database (XASDb), covering the K-edge X-ray absorption near-edge structure (XANES) [178,196] and the L-edge XANES [179] of a large number of material structures. The database currently hosts more than 400000 K-edge XANES site-wise spectra and 90000 L-edge XANES site-wise spectra of many compounds in the Materials Project. There are considerably fewer experimental XAS spectra, on the order of hundreds, as seen in the EELSDb and the XASLib. Collecting large experimental spectra databases that cover a wide range of elements is a challenging task. Collective efforts have focused on curating data extracted from different sources, as found in the RRUFF Raman, XRD and chemistry database [182], the open Raman database [187], and the SOP spectra library [192]. However, data consistency is not guaranteed. It is also now possible for contributors to share experimental data in a Materials Project curated database, MPContribs [186]. This database is supported by the US Department of Energy (DOE), providing some expectation of persistence. Entries can be kept private or published and are linked to the main Materials Project computational databases. There is an ongoing effort to capture data from DOE-funded synchrotron light sources (https://lightsources.materialsproject.org/) into MPContribs in the future. Recent advances in sources, detectors and experimental instrumentation have made high-throughput measurements of experimental spectra possible, giving rise to new possibilities for spectral data generation and modeling. Such examples include the HTEM database [12], which contains 50000 optical absorption spectra, and the UV-Vis database of 180000 samples from the Joint Center for Artificial Photosynthesis. Some of the common databases for spectra data are shown in Table 3. Cloud-based software-as-a-service platforms for high-throughput data analysis are beginning to appear, for example, pair-distribution function (PDF) in the cloud (https://pdfitc.org) [189]. These are backed by structured databases, where data can be kept private or made public. This transition to the cloud from data analysis software installed and run locally on a user's computer will facilitate the sharing and reuse of data by the community.

Fig. 3 Example applications of deep learning for spectral data. a) Predicting structure information from the X-ray diffraction [197]. Reprinted according to the terms of the CC-BY license [197]. Copyright 2020. b) Predicting catalysis properties from computational electronic density of states data. Reprinted according to the terms of the CC-BY license [198]. Copyright 2021.

Applications
Due to the widespread deployment of XRD across many materials technologies, XRD spectra became one of the first test grounds for DL models. Phase identification from XRD can be mapped into a classification task (assuming all phases are known) or an unsupervised clustering task. Unlike the traditional analysis of XRD data, where the spectra are treated as convolved, discrete peak positions and intensities, DL methods treat the data as a continuous pattern similar to an image. Unfortunately, large collections of experimental XRD datasets gathered in one place are not readily available at the moment. Nevertheless, extensive, high-quality crystal structure data make it straightforward to create simulated XRD patterns.
Park et al. [199] calculated 150000 XRD patterns from the Inorganic Crystal Structure Database (ICSD) [200] and then used CNN models to predict structural information from the simulated XRD patterns. The accuracies of the CNN models reached 81.14 %, 83.83 %, and 94.99 % for space-group, extinction-group, and crystal-system classifications, respectively.
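To make the idea of a simulated, image-like XRD pattern concrete, the sketch below places Gaussian peaks at the Bragg angles of a cubic lattice and samples them on a continuous 2θ grid. It is a deliberately simplified illustration and not the procedure of any cited work: the lattice parameter, the list of reflections, and the equal peak heights are assumptions, and real simulations compute intensities from structure factors.

```python
import math

WAVELENGTH = 1.5406  # Cu K-alpha1 wavelength in angstrom


def two_theta_cubic(a, hkl, wavelength=WAVELENGTH):
    """Bragg angle 2-theta (degrees) for a cubic lattice with parameter a (angstrom)."""
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)   # interplanar spacing
    s = wavelength / (2.0 * d)                 # sin(theta) from Bragg's law
    if s > 1.0:
        return None  # reflection not reachable at this wavelength
    return 2.0 * math.degrees(math.asin(s))


def simulate_pattern(a, reflections, grid, fwhm=0.2):
    """Sum of unit-height Gaussian peaks on a 2-theta grid.

    Equal peak heights are a placeholder: real intensities require
    structure factors, multiplicities and instrument corrections.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    pattern = [0.0] * len(grid)
    for hkl in reflections:
        center = two_theta_cubic(a, hkl)
        if center is None:
            continue
        for i, x in enumerate(grid):
            pattern[i] += math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return pattern


grid = [10.0 + 0.02 * i for i in range(4001)]  # 10 to 90 degrees
# Hypothetical primitive-cubic example; a = 4.0 angstrom is an arbitrary choice.
reflections = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0)]
pattern = simulate_pattern(4.0, reflections, grid)
print(round(two_theta_cubic(4.0, (1, 0, 0)), 2))
```

The list `pattern` is the kind of one-dimensional, fixed-length input that a 1-D CNN can consume directly, without a separate peak-picking step.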
Liu et al. [79] obtained similar accuracies by using a CNN for classifying atomic pair distribution function (PDF) data into space groups. The PDF is obtained by Fourier transforming XRD into real-space and is particularly useful for studying the local and nano-scale structure of materials. In the case of the PDF, models were trained, validated and tested on simulated data from the ICSD. However, the trained model showed excellent performance when it was given experimental data, something that can be a challenge in XRD data because of the different resolutions and line-shapes of the diffraction data depending on specifics of the sample and experimental conditions. The PDF seems to be more robust against these aspects.
Similarly, Zaloga et al. [201] also used the ICSD database for XRD pattern generation and CNN models to classify crystals. The models achieved 90.02 % and 79.82 % accuracy for crystal systems and space groups, respectively.
It should be noted that the ICSD database contains many duplicates, and such duplicates should be filtered out to avoid information leakage. There is also a large difference in the number of structures represented in each space group (the label) in the database resulting in data normalization challenges.
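The two data-preparation concerns raised above, duplicate entries and unequal class populations, can be illustrated with a short sketch. The records, the fingerprint field, and the inverse-frequency weighting are assumptions chosen for illustration; the cited studies may have handled duplicates and imbalance differently.

```python
from collections import Counter

# Hypothetical records: (composition, space-group number, structure fingerprint).
records = [
    ("NaCl", 225, "fp-a"),
    ("NaCl", 225, "fp-a"),   # exact duplicate entry
    ("MgO", 225, "fp-b"),
    ("SiO2", 154, "fp-c"),
    ("TiO2", 136, "fp-d"),
    ("KCl", 225, "fp-e"),
]


def deduplicate(entries):
    """Keep the first entry for each (composition, space group, fingerprint) key."""
    seen = set()
    unique = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


unique_records = deduplicate(records)
weights = inverse_frequency_weights([sg for _, sg, _ in unique_records])
print(len(unique_records), weights)
```

Deduplicating before the train/test split keeps identical structures from appearing on both sides, and the weights can be passed to a weighted loss so that rare space groups are not ignored during training.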
Lee et al. [202] developed a CNN model for phase identification from samples consisting of a mixture of several phases in a limited chemical space relevant for battery materials. The training data are mixed patterns consisting of 1785405 synthetic XRD patterns from the Sr-Li-Al-O phase space. The resulting CNN can not only identify the phases but also predict the compound fraction in the mixture. A similar CNN was utilized by Wang et al. [203] for fast identification of metal-organic frameworks (MOFs), where experimental spectral noise was extracted and then synthesized into the theoretical XRD for training data augmentation.
An alternative idea was proposed by Dong et al. [204], where instead of recognizing only phases from the CNN, a proposed "parameter quantification network" (PQ-Net) was able to extract physico-chemical information. The PQ-Net yields accurate predictions for scale factors, crystallite size, and lattice parameters for simulated and experimental XRD spectra. The work by Aguiar et al. [205] took a step further and proposed a modular neural network architecture that enables the combination of diffraction patterns and chemistry data and provides a ranked list of predictions. The ranked list of predictions provides user flexibility and overcomes some aspects of overconfidence in model predictions. In practical applications, AI-driven XRD identification can be beneficial for high-throughput materials discovery, as shown by Maffettone et al. [206]. In their work, an ensemble of fifty CNN models was trained on synthetic data reproducing experimental variations (missing peaks, broadening, peak shifting, noise). The model ensemble is capable of predicting the probability of each category label. A similar data augmentation idea was adopted by Oviedo et al. [195], where experimental XRD data for 115 thin-film metal-halides were measured, and CNN models trained on the augmented XRD data achieved accuracies of 93 % and 89 % for classifying dimensionality and space group, respectively.
Although not a DL method, an unsupervised machine learning approach, non-negative matrix factorization (NMF), is showing great promise for yielding chemically relevant XRD spectra from time- or spatially-dependent sets of diffraction patterns. NMF is closely related to principal component analysis in that it takes a set of patterns as a matrix and then compresses the data, reducing the dimensionality by finding the most important components. In NMF a constraint is applied that all the components and their weights must be non-negative. This often corresponds to a real physical situation (for example, spectra tend to be positive, as are the weights of chemical constituents). As a result, the mathematical decomposition often yields interpretable, physically meaningful components and weights, as shown by Liu et al. for PDF data [207]. An extension of this showed that in a spatially resolved study, NMF could be used to extract chemically resolved differential PDFs (similar to the information in EXAFS) from non-chemically resolved PDF measurements [208]. NMF is very quick and easy to apply and can be applied to just about any set of spectra. It is likely to become widely used and is being implemented in the PDFitc.org website to make it more accessible to potential users.
Besides XRD, XAS, Raman, and infrared spectra also contain rich structure-dependent spectroscopic information about the material. Unlike XRD, where relatively simple theories and equations exist to relate structures to the spectral patterns, the relationships between general spectra and structures are somewhat elusive. This difficulty has created a higher demand for machine learning models to learn structural information from other spectra.
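The ensemble and ranked-list ideas discussed above can be combined in a few lines: average the class probabilities of the ensemble members and report the classes in order of mean probability. The sketch below uses hypothetical softmax outputs from three members rather than fifty, and the function name `ensemble_ranking` is an assumption for illustration.

```python
def ensemble_ranking(member_probabilities, class_names, top_k=3):
    """Average class probabilities over ensemble members and rank the classes."""
    n_members = len(member_probabilities)
    n_classes = len(class_names)
    mean_probs = [
        sum(member[c] for member in member_probabilities) / n_members
        for c in range(n_classes)
    ]
    ranked = sorted(zip(class_names, mean_probs),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical softmax outputs from three ensemble members for one pattern.
class_names = ["phase_A", "phase_B", "phase_C"]
member_probabilities = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.30, 0.10],
]
ranking = ensemble_ranking(member_probabilities, class_names)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Here the second member disagrees with the other two, which lowers the mean probability of the top candidate; reporting the full ranked list lets a user see such disagreement instead of a single overconfident label.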
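A minimal sketch of NMF is given below, assuming the classical multiplicative update rules for the Frobenius-norm objective; this is one common way to compute the factorization and not necessarily the algorithm used in the cited work or on the PDFitc.org website. The two "components" and the mixing fractions are synthetic, hypothetical data.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    wh = matmul(w, h)
    return sum((xi - yi) ** 2 for rx, ry in zip(x, wh) for xi, yi in zip(rx, ry)) ** 0.5


def nmf(x, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix x (patterns as rows) into w @ h.

    Rows of h are the components; rows of w are their weights per pattern.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iter):
        # h <- h * (w^T x) / (w^T w h), elementwise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # w <- w * (x h^T) / (w h h^T), elementwise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Synthetic series: each pattern is a mixture of two hypothetical components.
comp_a = [0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
comp_b = [0.0, 0.0, 0.8, 0.0, 0.0, 1.0]
mix = [0.0, 0.25, 0.5, 0.75, 1.0]
x = [[(1 - f) * a + f * b for a, b in zip(comp_a, comp_b)] for f in mix]
w, h, errors = nmf(x, rank=2)
print(errors[0], errors[-1])
```

Because the updates only multiply non-negative numbers, the factors stay non-negative throughout, which is what makes the recovered components readable as spectra and the weights readable as phase fractions.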
For instance, X-ray absorption spectroscopy (XAS), including X-ray absorption near-edge spectroscopy (XANES) and extended X-ray absorption fine structure (EXAFS), is usually used to analyze structural information at the atomic level. However, the XANES region, despite its high signal-to-noise ratio, has no explicit equation for data fitting. DL modeling of XAS data is therefore attractive and can offer new insights. Timoshenko et al. used neural networks to predict the coordination numbers of Pt [209] and Cu [210] in nanoclusters from the XANES. Aside from the high accuracies, the neural network also offers high prediction speed and new opportunities for quantitative XANES analysis. Timoshenko et al. [211] further carried out a novel analysis of EXAFS using DL. Although EXAFS analysis has an explicit equation to fit, the study is limited to the first few coordination shells and to relatively ordered materials. Timoshenko et al. [211] first transformed the EXAFS data into 2D maps with a wavelet transform and then supplied the 2D data to a neural network model. The model can instantly predict relatively long-range radial distribution functions, offering in situ local structure analysis of materials. The advent of high-throughput XAS databases has recently unveiled more possibilities for machine learning models to be deployed using XAS data. For example, Zheng et al. [196] used an ensemble learning method to match and quickly search new spectra in the XASDb. Later, the same authors showed that random forest models outperform DL models such as MLPs or CNNs in predicting atomic environment labels directly from the XANES spectra [212]. Similar approaches were also adopted by Torrisi et al. [213]. In practical applications, Andrejevic et al. [214] used the XASDb data together with the topological materials database and constructed CNN models to classify the topology of materials from the XANES and symmetry group inputs. The model correctly predicted 81 % of topological and 80 % of trivial cases and achieved 90 % accuracy in material classes that contain certain elements.
Raman, infrared, and other vibrational spectroscopies provide structural fingerprints and are usually used to discriminate and estimate the concentration of components in a mixture. For example, Madden et al. [215] have used neural network models to predict the concentration of illicit materials in a mixture using the Raman spectra. Interestingly, several groups have independently found that DL models outperform chemometrics analysis in vibrational spectroscopies [216,217]. For learning vibrational spectra, the number of training spectra is usually less than or on the order of the number of features (intensity points), and the models can easily overfit. Hence, dimensional reduction strategies are commonly used to compress the information dimension using, for example, principal component analysis (PCA) [218,219]. DL approaches do not have such concerns and offer elegant and unified solutions. For example, Liu et al. [220] applied CNN models to the Raman spectra in the RRUFF spectral database and showed that CNN models outperform classical machine learning models such as SVM in classification tasks. More DL applications in vibrational spectral analysis can be found in a recent review by Yang et al. [221]. Although most current DL work focuses on the inverse problem, i.e., predicting structural information from the spectra, some innovative approaches also solve the forward problem by predicting the spectra from the structure. In this case, the spectroscopy data can be viewed simply as a high-dimensional material property of the structure. This is most common in molecular science, where predicting the infrared spectra [222] and molecular excitation spectra [223] is of particular interest. In the early 2000s, Selzer et al. [222] and Kostka et al. [224] attempted to predict the infrared spectra directly from the molecular structural descriptors using neural networks. Non-DL models can also be used to perform such tasks to a reasonable accuracy [225].
For DL models, Chen et al. [226] used a Euclidean neural network (E(3)NN) to predict the phonon density of states (DOS) spectra from atom positions and element types. The E(3)NN model captures symmetries of the crystal structures, with no need to perform data augmentation to achieve target invariances. Hence the E(3)NN model is extremely data-efficient and can give reliable DOS spectra and heat capacity predictions using relatively sparse data of 1200 calculation results on 65 elements. A similar idea was also used to predict the XAS spectra. Carbone et al. [227] used a message passing neural network (MPNN) to predict the O and N K-edge XANES spectra from the molecular structures in the QM9 database [9]. The training XANES data were generated using the FEFF package [228]. The trained MPNN model reproduced all prominent peaks in the predicted XANES, and 90 % of the predicted peaks are within 1 eV of the FEFF calculations. Similarly, Rankine et al. [229] started from the two-body radial distribution function (RDC) and used a deep neural network model to predict the Fe K-edge XANES spectra for arbitrary local environments.
In addition to learning the structure-spectra or spectra-structure relationships, a few works have also explored the possibility of relating spectra to other material properties in a non-trivial way. The DOSnet proposed by Fung et al. [198] (Fig. 3b) uses the electronic DOS spectra calculated from DFT as inputs to a CNN model to predict the adsorption energies of H, C, N, O, S and their hydrogenated counterparts, CH, CH2, CH3, NH, OH, and SH, on bimetallic alloy surfaces. This approach extends the previous d-band theory [230], where only the d-band center, a scalar, was used to correlate with the adsorption energy on transition metals. Stein et al. [231] tried to learn the mapping between the image and the UV-vis spectrum of the material using a conditional variational autoencoder (cVAE) with neural network models as the backbone. Such models can generate the UV-vis spectrum directly from a simple material image, offering much faster material characterization.
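For context on the scalar descriptor that DOSnet generalizes, the d-band center is commonly computed as the first moment of the d-projected DOS, that is, the energy-weighted integral of the DOS divided by its plain integral. The sketch below applies this definition to a hypothetical rectangular band; the energy grid and DOS values are invented for illustration and do not come from any cited calculation.

```python
def trapezoid(y, x):
    """Trapezoidal integration of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def band_center(energies, dos):
    """First moment of a projected DOS: integral(E * rho) / integral(rho)."""
    weighted = [e * rho for e, rho in zip(energies, dos)]
    return trapezoid(weighted, energies) / trapezoid(dos, energies)


# Hypothetical d-projected DOS: a rectangular band from -4 eV to 0 eV
# relative to the Fermi level.
energies = [-6.0 + 0.01 * i for i in range(801)]        # -6 eV to +2 eV
dos = [1.0 if -4.0 <= e <= 0.0 else 0.0 for e in energies]
center = band_center(energies, dos)
print(round(center, 2))
```

Reducing the whole curve to this one number discards its shape; feeding the full `dos` array to a CNN, as in the approach described above, lets the model use that additional information.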

Image based models
Computer vision is often credited with precipitating the current wave of mainstream DL applications a decade ago [232]. Naturally, materials researchers have developed a broad portfolio of applications of computer vision for accelerating and improving image-based material characterization techniques. High-level microscopy vision tasks can be organized as follows:
• image classification (and material property regression)
• auto-tuning experimental imaging hyperparameters
• pixel-wise learning (e.g., semantic segmentation)
• super-resolution imaging
• object/entity recognition, localization, and tracking
• microstructure representation learning
Often these tasks generalize across many different imaging modalities, spanning optical microscopy (OM), scanning electron microscopy (SEM) techniques, scanned probe microscopy (SPM, as in scanning tunneling microscopy (STM) or atomic force microscopy (AFM)), and transmission electron microscopy (TEM) variants, including scanning transmission electron microscopy (STEM).
The images obtained with these techniques capture structures ranging from the local atomic scale to the mesoscale (microstructure), as well as the distribution and type of defects and their dynamics, which are critically linked to the functionality and performance of the materials. Atomic-scale imaging has become widespread and near-routine over the past few decades due to aberration-corrected STEM [233]. Increasingly, the collection of large image datasets is creating an analysis bottleneck in the materials characterization pipeline, and automated image analysis is urgently needed. Non-DL image analysis methods have driven tremendous progress in quantitative microscopy, but image processing pipelines are often brittle and require too much manual identification of image features to be broadly applicable. Thus, DL is currently the most promising solution for high-performance, high-throughput automated analysis of image datasets. For a good overview of applications in microstructure characterization specifically, see [234].

Databases and software libraries
Image datasets for materials can come from either experiments or simulations. Software libraries mentioned above can be used to generate images such as STM/STEM. Images can also be obtained from the literature. A few common examples of image datasets are shown in Table 4. Recently, there has been rapid development in the field of image learning tasks for materials, leading to several useful packages, some of which are also listed in Table 4.

Applications
DL for images can be used to automatically extract information from images or transform images into a more useful state. The benefits of automated image analysis include higher throughput, better consistency of measurements compared to manual analysis, and even the ability to measure signals in images that humans cannot detect. The benefits of altering images include image super-resolution, denoising, inferring 3D structure from 2D images, and more. Examples of the applications of each task are summarized below.

Image classification and regression
Classification and regression are the processes of predicting one or more values associated with an image. In the context of DL the only difference between the two methods is that the outputs of classification are discrete while the outputs of regression models are continuous. The same network architecture may be used for both classification and regression by choosing the appropriate activation function (i.e., linear for regression or Softmax for classification) for the output of the network. Due to its simplicity image classification is one of the most established DL techniques available in the materials science literature. Nonetheless, this technique remains an area of active research.
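The point that one architecture can serve both tasks by switching the output activation can be sketched with a single dense output layer. The feature vector, weights, and biases below are hypothetical numbers; in a real model they would come from the trained network.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_head(features, weights, biases, task):
    """Final dense layer followed by a task-specific activation."""
    raw = [sum(w * f for w, f in zip(row, features)) + b
           for row, b in zip(weights, biases)]
    if task == "classification":
        return softmax(raw)   # discrete classes: probabilities summing to 1
    if task == "regression":
        return raw            # continuous targets: linear (identity) activation
    raise ValueError("task must be 'classification' or 'regression'")


# Hypothetical penultimate-layer features and head parameters.
features = [0.5, -1.0, 2.0]
weights = [[0.2, 0.1, 0.4], [-0.3, 0.8, 0.1], [0.5, 0.5, -0.2]]
biases = [0.0, 0.1, -0.1]
probs = output_head(features, weights, biases, "classification")
values = output_head(features, weights, biases, "regression")
print(probs, values)
```

In practice the loss function changes along with the activation (for example, cross-entropy for classification and a squared or absolute error for regression), while the convolutional layers that feed the head can stay the same.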
Modarres et al. applied DL with transfer learning to automatically classify SEM images of different material systems [265]. They demonstrated how a single approach can be used to identify a wide variety of features and material systems such as particles, fibers, microelectromechanical systems (MEMS) devices, and more. The model achieved 90 % accuracy on a test set. Misclassifications resulted from images that contained objects from multiple different classes, which is an inherent limitation of single-class classification. More advanced techniques like the ones described in subsequent sections can be applied to avoid these limitations. Additionally, they developed a system to deploy the trained model at scale to process thousands of images in parallel. This approach is essential for large-scale, high-throughput experiments or industrial applications of classification. ImageNet-based deep transfer learning has also been successfully applied for crack detection in macroscale materials images [266,267], as well as for property prediction on small, noisy, and heterogeneous industrial datasets [268,269]. DL has also been applied to characterize the symmetries of simulated measurements of samples. In ref. [270], Ziletti et al. obtained a large database of perfect crystal structures, introduced defects into the perfect lattices, and simulated diffraction patterns for each structure. DL models were trained to identify the space group of each diffraction pattern. The model achieved high classification performance, even on crystals with significant numbers of defects, surpassing the performance of conventional algorithms for detecting symmetries from diffraction patterns.
DL has also been applied to classify symmetries in simulated STM measurements of 2D material systems [235]. DFT was used to generate simulated STM images for a variety of material systems. A convolutional neural network was trained to identify which of the five 2D Bravais lattices each material belonged to using the simulated STM image as input. The model achieved an average F1 score of around 0.9 for each lattice type.
DL has also been used to improve the analysis of electron backscatter diffraction (EBSD) data, with Liu et al. [271] presenting one of the first DL-based solutions for EBSD indexing, capable of taking an EBSD image as input and predicting the three Euler angles representing the orientation that would have led to the given EBSD pattern. However, they considered the three Euler angles to be independent of each other, creating separate CNNs for each angle, although the three angles should really be considered together. Jha et al. [257] built upon that work to train a single DL model to predict the three Euler angles in simulated EBSD patterns of polycrystalline Ni while directly minimizing the misorientation angle between the true and predicted orientations. When tested on experimental EBSD patterns, the model achieved 16 % lower disorientation error than dictionary-based indexing. Similarly, Kaufman et al. trained a CNN to predict the corresponding space group for a given diffraction pattern [272]. This enables EBSD to be used for phase identification in samples where the existing phases are unknown, providing a faster or more cost-effective method of characterization than X-ray or neutron diffraction. The results from these studies demonstrate the promise of applying DL to improve the performance and utility of EBSD experiments.
Recently, DL has also been used to learn crystal plasticity using images of strain profiles as input [259,260]. The work in [259] used domain knowledge integration in the form of two-point auto-correlation to enhance the predictive accuracy, while [260] applied residual learning to learn crystal plasticity at the nanoscale. It used strain profiles of materials of varying sample widths, ranging from 2 µm down to 62.5 nm, obtained from discrete dislocation dynamics to build a deep residual network capable of identifying the prior deformation history of the sample as low, medium, or high. Compared to the correlation-function-based method (68.24 % accuracy), the DL model was found to be significantly more accurate (92.48 %), and also capable of predicting stress-strain curves of test samples. This work also used saliency maps to try to interpret the developed DL model.
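To show why the three Euler angles are better treated jointly, the sketch below computes the rotation angle between two orientations from their rotation matrices. It is a simplified illustration and not the loss function of the cited work: a Z-X-Z angle convention is assumed, and crystal symmetry is ignored, whereas a true disorientation takes the minimum angle over all symmetry-equivalent orientations.

```python
import math


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(phi1, Phi, phi2):
    """Rotation matrix for Z-X-Z Euler angles in radians (assumed convention)."""
    return matmul3(rot_z(phi1), matmul3(rot_x(Phi), rot_z(phi2)))


def misorientation_deg(euler_a, euler_b):
    """Rotation angle between two orientations, ignoring crystal symmetry."""
    ra, rb = euler_to_matrix(*euler_a), euler_to_matrix(*euler_b)
    # trace(ra @ rb^T) equals the elementwise sum of ra * rb
    trace = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


reference = (0.0, 0.0, 0.0)
rotated = (math.pi / 4, 0.0, math.pi / 4)   # two 45-degree turns about z
angle = misorientation_deg(reference, rotated)
print(round(angle, 3))
```

In the example, each angle differs from the reference by at most 45 degrees, yet the physical rotation is 90 degrees because the first and third angles act about the same axis when the second angle is zero. Per-angle errors therefore do not measure orientation error, which motivates optimizing a misorientation-based objective directly.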

Pixelwise learning
DL can also be applied to generate one or more predictions for every pixel in an image. This can provide more detailed information about the size, position, orientation, and morphology of features of interest in images. Thus, pixelwise learning has been a significant area of focus with many recent studies appearing in materials science literature.
Azimi et al. applied an ensemble of fully convolutional neural networks to segment martensite, tempered martensite, bainite, and pearlite in SEM images of carbon steels. Their model achieved 94 % accuracy, demonstrating a significant improvement over previous efforts to automate the segmentation of different phases in SEM images. DeCost, Francis, and Holm applied PixelNet to segment microstructural constituents in the UltraHigh Carbon Steel Database [239,240]. In contrast to fully convolutional neural networks, which encode and decode visual signals using a series of convolution layers, PixelNet constructs "hypercolumns," or concatenations of feature representations corresponding to each pixel at different layers in a neural network. The hypercolumns are treated as individual feature vectors, which can then be classified using any typical classification approach, like a multi-layer perceptron. This approach achieved phase segmentation precision and recall scores of 86.5 % and 86.5 %, respectively. Additionally, this approach was used to segment spheroidite particles in the matrix, achieving precision and recall scores of 91.1 % and 91.1 %, respectively.
Pixelwise DL has also been applied to automatically segment dislocations in Ni superalloys [234]. Dislocations are visually similar to γ-γ′ interfaces in Ni superalloys. With limited training data, a single segmentation model was unable to distinguish between these features. To overcome this, a second model was trained to generate a coarse mask corresponding to the deformed region in the material. Overlaying this mask with predictions from the first model selects the dislocations, enabling them to be distinguished from γ-γ′ interfaces.
Stan, Thompson, and Voorhees applied pixelwise DL to characterize dendritic growth from serial sectioning and synchrotron computed tomography data [273]. Both of these techniques generate large amounts of data, making manual analysis impractical. Conventional image processing approaches, utilizing thresholding, edge detectors, or other hand-crafted filters, are not able to deal with noise, contrast gradients, and other artifacts that are present in the data. Despite having a small training set of labeled images, SegNet was able to automatically segment these images with much higher performance.
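The precision and recall scores quoted above are computed per pixel by comparing a predicted label mask with a ground-truth mask. A minimal sketch follows; the 4x4 masks are hypothetical and stand in for a full micrograph.

```python
def precision_recall(predicted, truth, positive=1):
    """Pixelwise precision and recall for one class in two equally sized label masks."""
    tp = fp = fn = 0
    for row_p, row_t in zip(predicted, truth):
        for p, t in zip(row_p, row_t):
            if p == positive and t == positive:
                tp += 1          # correctly labeled pixel
            elif p == positive:
                fp += 1          # predicted as the phase, but is not
            elif t == positive:
                fn += 1          # missed pixel of the phase
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical 4x4 masks: 1 marks the phase of interest, 0 the matrix.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
predicted = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
precision, recall = precision_recall(predicted, truth)
print(precision, recall)
```

Reporting both numbers matters for minority phases: a model that rarely predicts the phase can have high precision and low recall, which a single pixel accuracy figure would hide.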

Object/entity recognition, localization, and tracking
Object detection or localization is needed when individual instances of recognized objects in a given image need to be distinguished from each other. In cases where instances do not overlap each other by a significant amount, individual instances can be resolved through post processing of semantic segmentation outputs. This technique has been applied extensively to the detection of individual atoms and defects in microstructural images.
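When instances do not overlap, the post-processing step mentioned above can be as simple as grouping connected foreground pixels and reporting one centroid per group. The sketch below uses 4-connected component labeling by breadth-first search; this is a simpler stand-in chosen for illustration, not the watershed procedure used in some of the studies discussed next, and the input mask is hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Return centroids (row, col) of 4-connected foreground regions in a binary mask."""
    n_rows, n_cols = len(mask), len(mask[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    centroids = []
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] != 1 or visited[r][c]:
                continue
            # Breadth-first search collects every pixel of this region.
            queue = deque([(r, c)])
            visited[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] == 1 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            centroids.append((
                sum(p[0] for p in pixels) / len(pixels),
                sum(p[1] for p in pixels) / len(pixels),
            ))
    return centroids


# Hypothetical segmentation output with two separated blobs.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
centroids = label_instances(mask)
print(centroids)
```

Touching or overlapping instances would be merged into one region by this approach, which is the situation where seeded methods such as watershed segmentation or dedicated instance-segmentation networks become necessary.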
Madsen et al. applied pixelwise DL to detect atoms in simulated atomic-resolution TEM images of graphene [274]. A neural network was trained to detect the presence of each atom as well as predict its column height. Pixel-wise results are used as seeds for watershed segmentation to achieve instance-level detection. Analysis of the arrangement of the atoms led to autonomous characterization of defects in the lattice structure of the material. Interestingly, despite being trained only on simulations, the model successfully detected atomic positions in experimental images.
Maksov et al. demonstrated atomistic defect recognition and tracking across sequences of atomic-resolution STEM images of WS2 [275]. The lattice structure and defects existing in the first frame were characterized through a physics-based approach utilizing Fourier transforms. The positions of atoms and defects in the first frame were used to train a segmentation model. Despite only using the first frame for training, the model successfully identified and tracked defects in the subsequent frames for each sequence, even when the lattice underwent significant deformation. Similarly, Yang et al. [276] used a U-net architecture (as shown in Fig. 4) to detect vacancies and dopants in STEM images of WSe2, with model accuracy up to 98 %. They classified the possible atomic sites based on experimental observations into five different types: tungsten, vanadium substituting for tungsten, selenium with no vacancy, mono-vacancy of selenium, and di-vacancy of selenium.
Roberts et al. developed DefectSegNet to automatically identify defects in transmission and STEM images of steel including dislocations, precipitates, and voids [253]. They provide detailed information on the design, training, and evaluation of the model. They also compare measurements generated from the model to manual measurements performed by several different human experts, demonstrating that the measurements generated by DL are quantitatively more accurate and consistent.
Kusche et al. applied DL to localize defects in panoramic SEM images of dual-phase steel [237]. Manual thresholding was applied to identify dark defects against the brighter matrix. Regions containing defects were classified via two neural networks. The first neural network distinguished between inclusions and ductile damage in the material. The second classified the type of ductile damage (e.g., notching or martensite cracking). Each defect was also segmented via a watershed algorithm to obtain detailed information on its size, position, and morphology.
Applying DL to localize defects and atomic structures is a popular area in materials science research. Thus, several other recent studies on these applications can be found in the literature [277][278][279][280].
In the above examples, pixelwise DL or classification models are combined with image analysis to distinguish individual instances of detected objects. However, when several adjacent objects of the same class touch or overlap each other in the image, this approach will falsely detect them as a single, larger object. In this case, DL models designed for detection or instance segmentation can be used to resolve overlapping instances. In one such study, Cohn and Holm applied DL for instance-level segmentation of individual particles and satellites in dense powder images [254]. Segmenting each particle allows computer vision to generate detailed size and morphology information, which can be used to supplement experimental powder characterization for additive manufacturing. Additionally, overlaying the powder and satellite masks yielded the first method for quantifying the satellite content of powder samples, which cannot be measured experimentally.

Superresolution imaging and auto-tuning experimental parameters
The studies listed so far focus on automating the analysis of existing data after it has been collected experimentally. However, DL can also be applied during experiments to improve the quality of the data itself. This can reduce time for data collection or improve the amount of information captured in each image. Super-resolution and other DL techniques can also be applied in-situ to autonomously adjust experimental parameters.
Recording high-resolution electron microscope images often requires large dwell times, limiting the throughput of microscopy experiments. Additionally, during imaging, interactions between the electron beam and a microscopy sample can result in undesirable effects including charging of non-conductive samples and damaging of sensitive samples. Thus, there is interest in using DL to artificially increase the resolution of images without introducing these artifacts. One method of interest is applying generative adversarial networks (GANs) for this application.
De Haan et al. recorded SEM images of the same regions of interest in carbon samples containing gold nanoparticles at two different resolutions [281]. The low-resolution images were used as inputs to a GAN, and the corresponding images with twice the resolution were used as the ground truth. After training, the GAN reduced the number of undetected gaps between nanoparticles from 13.9 % to 3.7 %, indicating that super-resolution was successful. Thus, applying DL led to a four-fold reduction of the interaction time between the electron beam and the sample.
Ede and Beanland collected a dataset of STEM images of different samples [255]. Images were subsampled with spiral and 'jittered' grid masks to obtain partial images with resolutions reduced by a factor up to 100. A GAN was trained to reconstruct full images from their corresponding partial images. The results indicated that despite a significant reduction in the sampling area, this approach successfully reconstructed high resolution images with relatively small errors.
DL has also been applied to automated tip conditioning for SPM experiments. Rashidi and Wolkow trained a model to detect artifacts in SPM measurements resulting from a degradation in tip quality [282]. Using an ensemble of convolutional neural networks resulted in 99 % accuracy. After detecting that a tip had degraded, the SPM was configured to automatically recondition the tip in-situ until the network indicated that the atomic sharpness of the tip had been restored. Monitoring and reconditioning the tip is the most time- and labor-intensive part of conducting SPM experiments. Thus, automating this process through DL can increase the throughput and decrease the cost of collecting data through SPM.
In addition to materials characterization, DL can be applied to autonomously adjust parameters during manufacturing. Scime et al. mounted a camera to multiple 3D printers [283]. Images of the build plate were recorded throughout the printing process. A dynamic segmentation convolutional neural network was trained to recognize defects such as recoater streaking, incomplete spreading, spatter, porosity, and others. The trained model achieved high performance and was transferable to multiple printers from three different methods of additive manufacturing. This work is the first step to enabling smart additive manufacturing machines that can correct defects and adjust parameters during printing.
There is also growing interest in establishing instruments and laboratories for autonomous experimentation. Eppel et al. trained multiple models to detect chemicals, materials, and transparent vessels in a chemistry lab setting [284]. This study provides a rigorous analysis of several different approaches for scene understanding. Models were trained to characterize laboratory scenes with different methods including semantic segmentation and instance segmentation, both with and without overlapping instances. The models successfully detected individual vessels and materials in a variety of settings. Finer-grained understanding of the contents of vessels, such as segmentation of individual phases in multi-phase systems, was limited, outlining the path for future work in this area. These results are an important step towards the development of automated experimentation for laboratory-scale experiments.

Microstructure representation learning
Materials microstructure is often represented in the form of multi-phase, high-dimensional 2D/3D images and can therefore readily leverage image-based DL methods to learn robust, low-dimensional microstructure representations. These representations can subsequently be used to build predictive and generative models that learn forward and inverse structure-property linkages, which are typically studied across different length scales (multi-scale modeling). In this context, homogenization and localization refer to the transfer of information from lower length scales to higher length scales and vice versa. DL using customized CNNs has been used both for homogenization, i.e., predicting the macroscale property of a material given its microstructure information [259,261,285], and for localization, i.e., predicting the strain distribution across a given microstructure for a loading condition [262].
Transfer learning has also been widely used for analyzing materials microstructure images, and methods for improving the use of transfer learning in materials science applications are still an area of active research. Goetz et al. investigated the use of unsupervised domain adaptation as an alternative to simply fine-tuning a pre-trained model [286]. In this technique, a model is first trained on a labeled dataset in the source domain. Next, a discriminator model is used to train the model to generate features that are domain-agnostic. Compared to simple fine-tuning, unsupervised domain adaptation improved the performance of classification and segmentation neural networks on materials science datasets. However, the highest performance was achieved when the source domain was more visually similar to the target (for example, using a different set of microstructural images instead of ImageNet). This highlights the utility of establishing large, publicly available datasets of annotated images in materials science.
Kitahara and Holm used the output of an intermediate layer of a pre-trained convolutional neural network as a feature representation for images of steel surface defects and Inconel fracture surfaces [287]. Images were classified by defect type or fracture surface orientation, respectively, using unsupervised DL. Even though no labeled data was used for training the neural network or the unsupervised classifier, the model found natural decision boundaries that achieved classification performance of 98 % and 88 % for the defect classes and fracture surface orientations, respectively. Visualization of the representations through principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) provided qualitative insights into the representations. Though detailed physical interpretation of the representations is still a distant goal, this study provides tools for investigating patterns in visual signals contained in image-based datasets in materials science.
Larmuseau et al. investigated the use of triplet networks to obtain consistent representations for visually similar images of materials [288]. Triplet networks are trained with three images at a time. The first image, the reference, is classified by the network. The second image, called the positive, is another image with the same class label. The last image, called the negative, is an image from a separate class. During training, the loss function includes errors in prediction of the class of the reference image, the difference in representations of the reference and positive images, and the similarity in representations of the reference and negative images. This process allows the network to learn representations that are consistent for images in the same class while distinguishing images from different classes. The triplet network outperformed an ordinary convolutional neural network trained for image classification on the same dataset.
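The three loss terms described above can be sketched as follows. This is a minimal illustration and not the implementation used in ref. [288]: it assumes a standard margin-based ranking term on squared Euclidean distances plus a cross-entropy term for the reference image, and the names `triplet_objective`, `margin`, and `weight`, together with the toy vectors, are hypothetical.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two representation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cross_entropy(probabilities, true_class):
    """Classification error for the reference image."""
    return -math.log(probabilities[true_class])


def triplet_objective(ref, pos, neg, ref_probs, ref_class, margin=1.0, weight=1.0):
    """Classification loss plus a margin-based triplet ranking term.

    The ranking term is zero once the negative is farther from the reference
    than the positive by at least `margin` (in squared distance).
    """
    ranking = max(0.0, squared_distance(ref, pos) - squared_distance(ref, neg) + margin)
    return cross_entropy(ref_probs, ref_class) + weight * ranking


# Toy 2D representations (illustrative values only).
reference = [0.0, 0.0]
same_class = [0.1, 0.0]
other_class = [2.0, 0.0]
class_probs = [0.9, 0.1]  # predicted class probabilities for the reference image

good = triplet_objective(reference, same_class, other_class, class_probs, 0)
bad = triplet_objective(reference, other_class, same_class, class_probs, 0)
print(good, bad)
```

A well-organized representation space (`good`) leaves only the classification term, whereas swapping the roles of the positive and negative images (`bad`) is penalized by the ranking term.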
In addition to investigating representations used to analyze existing images, DL can be applied to generate synthetic images of materials systems. Generative Adversarial Networks (GANs) are currently the predominant method for synthetic microstructure generation. GANs consist of a generator, which creates a synthetic microstructure image, and a discriminator, which attempts to predict if a given input image is real or synthetic. With careful application, GANs can be used as a powerful tool for microstructure representation learning and design.
Yang and Li et al. [263,289] developed a GAN-based model for learning a low-dimensional embedding of microstructures, which could then be easily sampled and used with the generator of the GAN model to generate realistic, statistically similar microstructure images, thus enabling microstructural materials design. The model was able to capture complex, non-linear microstructure characteristics and learn the mapping between the latent design variables and microstructures. In order to close the loop, the method was combined with a Bayesian optimization approach to design microstructures with optimal optical absorption performance. The discovered microstructures showed up to 17 % better performance than randomly sampled microstructures. The unique architecture of their GAN model also facilitated generator scalability to generate arbitrary-sized microstructure images and discriminator transferability to build structure-property prediction models. Yang et al. [264] recently combined GANs with MDNs (mixture density networks) to enable inverse modeling in microstructural materials design, i.e., generating the microstructure for a given desired property.
Hsu et al. constructed a GAN to generate 3D synthetic solid oxide fuel cell microstructures [290]. These microstructures were compared to other synthetic microstructures generated by DREAM.3D as well as experimentally observed microstructures measured via sectioning and imaging with PFIB-SEM. Synthetic microstructures generated from the GAN were observed to qualitatively show better agreement to the experimental microstructures than the DREAM.3D microstructures, as evidenced by the more realistic phase connectivity and lower amount of agglomeration of solid phases. Additionally, a statistical analysis of various features such as volume fraction, particle size, and several other quantities demonstrated that the GAN microstructures were quantitatively more similar to the real microstructures than the DREAM.3D microstructures.
In a similar study, Chun et al. generated synthetic microstructures of high energy materials using a GAN [291]. Once again, a synthetic microstructure generated via GAN showed better qualitative visual similarity to an experimentally observed microstructure compared to a synthetic microstructure generated via a transfer learning approach, with sharper phase boundaries and fewer computational artifacts. Additionally, a statistical analysis of the void size, aspect ratio, and orientation distributions indicated that the GAN produced microstructures that were quantitatively more similar to real materials.
Applications of DL to microstructure representation learning can help researchers improve the performance of predictive models used for the applications listed above. Additionally, using generative models can generate more realistic simulated microstructures. This can help researchers develop more accurate models for predicting material properties and performance without needing to actually synthesize and process these materials, significantly increasing the throughput of materials selection and screening experiments.

Mesoscale modeling applications
In addition to image-based characterization, deep learning methods are increasingly used in mesoscale modeling. Dai et al. [292] trained a GNN to predict magnetostriction in a wide range of synthetic polycrystalline systems with around 10 % prediction error. The microstructure is represented by a graph where each node corresponds to a single grain, and the edges between nodes indicate an interface between neighboring grains. Five node features (3 Euler angles, volume, and number of neighbors) were associated with each grain. The GNN was able to outperform other machine learning approaches for property prediction of polycrystalline materials by accounting for interactions between neighboring grains.
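The graph representation described above can be sketched in a few lines. This is an illustrative data structure only, not the model of ref. [292]: the helper names, the toy grain values, and the single mean-aggregation step are assumptions, and naively averaging Euler angles is not physically meaningful; it is used here only to show how information flows between neighboring grains.

```python
def build_grain_graph(grains, interfaces):
    """Build node features and adjacency for a polycrystal.

    grains: dict mapping grain id -> (euler1, euler2, euler3, volume)
    interfaces: list of (grain id, grain id) pairs that share a boundary
    """
    neighbors = {g: set() for g in grains}
    for a, b in interfaces:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Five features per node: three Euler angles, volume, number of neighbors.
    features = {g: list(vals) + [float(len(neighbors[g]))] for g, vals in grains.items()}
    return features, neighbors


def message_passing_step(features, neighbors):
    """Mix each node's features with the mean of its neighbors' features."""
    updated = {}
    for g, own in features.items():
        if neighbors[g]:
            n = len(neighbors[g])
            mean = [sum(features[m][i] for m in neighbors[g]) / n for i in range(len(own))]
        else:
            mean = own  # isolated grain: nothing to aggregate
        updated[g] = [0.5 * (o + m) for o, m in zip(own, mean)]
    return updated


# Three grains in a row: 0 - 1 - 2 (toy values, angles in degrees).
grains = {
    0: (0.0, 0.0, 0.0, 1.0),
    1: (90.0, 0.0, 0.0, 2.0),
    2: (0.0, 45.0, 0.0, 3.0),
}
features, neighbors = build_grain_graph(grains, [(0, 1), (1, 2)])
updated = message_passing_step(features, neighbors)
print(features[1], updated[0])
```

A trained GNN would replace the fixed averaging with learned transformations and stack several such steps, so that each grain's representation reflects progressively larger neighborhoods.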
Similarly, Cohn and Holm present preliminary work applying GNNs to predict the occurrence of abnormal grain growth (AGG) in Monte Carlo simulations of microstructure evolution [293]. AGG appears to be stochastic, making it notoriously difficult to predict, control, and even observe experimentally in some materials. AGG has been reproduced in Monte Carlo simulations of material systems, but a model that can predict which initial microstructures will undergo AGG has not previously been established. A dataset of Monte Carlo simulations was created using SPPARKS [294,295]. A microstructure GNN was trained to predict AGG in individual simulations, with 75 % classification accuracy. In comparison, an image-based model achieved only 60 % accuracy. The GNN also provided physical insight into AGG and indicated that only 2 neighborhood shells are needed to reach the maximum performance achieved in the study. These early results motivate additional work on applying GNNs to predict the occurrence of AGG in both simulated and real materials during processing.

Natural language processing
Most existing knowledge in the materials domain is currently unavailable as structured information and exists only as unstructured text, tables, or images in various publications. There is a great opportunity to use natural language processing (NLP) techniques to convert text to structured data or to directly learn and make inferences from text information. However, as NLP is a relatively new field within materials science, many challenges remain unsolved, such as how to resolve dependencies between words and phrases across multiple sentences and paragraphs.

Data sets for NLP
Data sets relevant to natural language processing include peer-reviewed journal articles, articles published on preprint servers such as arXiv or ChemRxiv, patents, and online material such as Wikipedia. Unfortunately, accessing or using most such data sets remains difficult. Peer-reviewed journal articles are typically subject to copyright restrictions and thus difficult to obtain, especially in the large numbers required for machine learning. Many publishers now offer text and data mining (TDM) agreements that can be signed online and that allow at least a limited, restricted amount of work to be performed. However, gaining access to the full text of a large number of publications still typically requires strict and dedicated agreements with each publisher. The major advantage of working with publishers is that they have often already converted the articles from a document format such as PDF into an easy-to-parse format such as HyperText Markup Language (HTML). In contrast, articles on preprint servers and patents are typically available with fewer restrictions, but only as PDF files. Currently, it remains difficult to parse text from PDF files reliably, even when the text is embedded in the PDF. Therefore, new tools that can easily and automatically convert such content into well-structured HTML format with few residual errors would likely have a major impact on the field. Finally, online sources of information such as Wikipedia can serve as another type of data source; however, such sources are often more difficult to verify in terms of accuracy and do not contain as much domain-specific information as the research literature.

Software libraries for NLP
Applying NLP to a raw data set involves multiple steps, including retrieving the data, various forms of "pre-processing" (sentence and word tokenization, word stemming and lemmatization, featurization such as word vectors or part of speech tagging), and finally machine learning for information extraction (e.g., named entity recognition, entity relationship modeling, question and answer, or others). There exist multiple software libraries to aid in materials NLP, as described in Table 5. We note that although many of these steps can in theory be performed by general-purpose NLP libraries such as NLTK [296], SpaCy [297], or AllenNLP [298], the specialized nature of chemistry and materials science text (including the presence of complex chemical formulas) often leads to errors. For example, researchers have developed specialized codes to perform pre-processing that better detect chemical formulas (and do not split them into separate tokens or apply stemming/lemmatization to them) as well as scientific phrases and notation such as oxidation states or symbols for physical units. Similarly, chemistry-specific codes for extracting entities are better at extracting the names of chemical elements (e.g., recognizing that "He" likely represents helium and not a male pronoun) and abbreviations for chemical formulas. Finally, word embeddings that convert words such as "manganese" into numerical vectors for further data mining are more informative when trained specifically on materials science text versus more generic texts, even when the latter data sets are larger [299]. Thus, domain-specific tools for NLP are required in nearly all aspects of the pipeline. The main exception is that the architectures of the specific neural network models used for information extraction (e.g., LSTM, BERT, or architectures used to generate word embeddings such as word2vec or GloVe) are typically not modified specifically for the materials domain.
Thus, much of the materials- and chemistry-centric work currently concerns data retrieval and appropriate pre-processing. A longer discussion of this topic, with specific examples, can be found in refs. [300,301].
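As a rough illustration of why chemistry-aware pre-processing matters, the sketch below tokenizes a sentence while leaving chemical formulas intact and case-sensitive, and lowercases everything else. It is not taken from any of the libraries listed above: the element set `ELEMENTS` is deliberately partial, the rule for what counts as a formula is a simplifying assumption, and formulas with decimal stoichiometries would still be split at the decimal point.

```python
import re

# Deliberately partial element list, for illustration only.
ELEMENTS = {"H", "Li", "O", "Mn", "Fe", "Co", "Ni", "Ti", "Pb", "Te", "Ba"}

TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")   # alphanumeric runs or single symbols
ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")    # element symbol plus optional count


def is_formula(token):
    """Return True if the token parses completely as element symbols with counts."""
    parts = ELEMENT_COUNT.findall(token)
    if "".join(symbol + count for symbol, count in parts) != token:
        return False  # some characters are not part of an element/count pair
    symbols = [symbol for symbol, _ in parts]
    has_count = any(count for _, count in parts)
    # Require known elements and either several elements or an explicit count,
    # so that ordinary capitalized words are not mistaken for formulas.
    return all(s in ELEMENTS for s in symbols) and (len(symbols) > 1 or has_count)


def tokenize(text):
    """Lowercase ordinary words but keep chemical formulas unchanged."""
    return [t if is_formula(t) else t.lower() for t in TOKEN.findall(text)]


tokens = tokenize("The LiCoO2 cathode was coated with TiO2.")
print(tokens)
```

A generic pipeline that lowercases or stems every token would turn `LiCoO2` into a string that no longer identifies the compound, which is the kind of error the specialized tools are designed to avoid.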

Applications
NLP methods for materials have been applied for information extraction and search (particularly as applied to synthesis prediction) as well as materials discovery. As the domain is rapidly growing, we suggest dedicated reviews on this topic by Olivetti et al. [301] and Kononova et al. [300] for more information.
One of the major uses of NLP methods is to extract data sets from text in published studies. Conventionally, such data sets required manual entry by researchers combing the literature, a laborious and time-consuming process. Recently, software tools such as ChemDataExtractor [303] and other methods [312] based on more conventional machine learning and rule-based approaches have enabled automated or semi-automated extraction of data sets such as Curie and Néel magnetic phase transition temperatures [313], battery properties [314], UV-vis spectra [315], and surface and pore characteristics of metal organic frameworks [316]. In the past few years, DL approaches such as LSTMs and transformer-based models have been employed to extract various categories of information [307], and in particular materials synthesis information [302,308,317], from text sources. Such data has been used to predict synthesis maps for titania nanotubes [310], various binary and ternary oxides [318], and perovskites [319].
Fig. 5 Left: network for training word embeddings for natural language processing applications. A one-hot encoded vector at left represents each distinct word in the corpus; the role of the hidden layer is to predict the probability of neighboring words in the corpus. This network structure trains a relatively small hidden layer of 100 to 200 neurons to contain information on the context of words in the entire corpus, with the result that similar words end up with similar hidden layer weights (word embeddings). Such word embeddings can be used to transform textual words into numerical vectors useful for a variety of applications. Right: projection of word embeddings for various materials science words, as trained on a corpus of scientific abstracts, into two dimensions using principal component analysis. Without any explicit training, the word embeddings naturally preserve relationships between chemical formulas, their common oxides, and their ground state structures. [Reprinted according to the terms of the CC-BY license from ref. [299]]
Databases based on natural language processing have also been used to train machine learning models to identify materials with useful functional properties, such as the recent discovery of the large magnetocaloric properties of HoBe2 [320]. Similarly, Cooper et al. [321] demonstrated a "design to device approach" for designing dye-sensitized solar cells that are co-sensitized with two dyes. This study used automated text mining to compile a list of candidate dyes for the application along with measured properties such as maximum absorption wavelengths and extinction coefficients. The resulting list of 9431 dyes extracted from the literature was downselected to 309 candidates using a variety of criteria such as molecular structure and ability to absorb in the solar spectrum. These candidates were evaluated for suitable combinations for co-sensitization, yielding 33 dyes that were further downselected using density functional theory calculations and experimental constraints. The resulting 5 dyes were evaluated experimentally, both individually and in combinations, resulting in a combination of dyes that not only outperformed any of the individual dyes but also demonstrated performance comparable to an existing standard material. This study demonstrates the possibility of using literature-based extraction to identify materials candidates for new applications from the vast body of published work, which may never have tested those materials for the desired application.
It is even possible that natural language processing can directly make materials predictions without the use of intermediary models. In a study reported by Tshitoyan et al. [299] (as shown in Fig. 5), word embeddings (i.e., numerical vectors representing distinct words) trained on materials science literature could directly predict materials applications through a simple dot product between the trained embedding for a composition word (such as PbTe) and an application word (such as thermoelectrics). The researchers demonstrated that such an approach, if applied in the past using historical data, could have predicted many recently reported thermoelectric materials; they also presented a list of potentially interesting thermoelectric compositions using the known literature at the time. Since then, several of these predictions have been tested either computationally [322][323][324][325][326][327] or experimentally [328] as potential thermoelectrics. Recently, such approaches have also been applied to search for understudied areas of metallocene catalysis [329], although challenges still remain in such direct approaches to materials prediction.
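The embedding-based ranking described above can be sketched as follows. The three-dimensional vectors in `toy_embeddings` are invented purely for illustration (real embeddings have hundreds of dimensions and are learned from text), and normalizing the vectors before taking the dot product, which makes the score a cosine similarity, is an assumption of this sketch.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def rank_by_similarity(embeddings, application, candidates):
    """Rank candidate composition words by similarity to an application word."""
    target = normalize(embeddings[application])
    scores = {}
    for name in candidates:
        candidate = normalize(embeddings[name])
        scores[name] = sum(a * b for a, b in zip(candidate, target))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical embeddings, not taken from any trained model.
toy_embeddings = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "PbTe": [0.8, 0.2, 0.1],
    "Bi2Te3": [0.7, 0.1, 0.2],
    "SiO2": [0.0, 0.3, 0.9],
}

ranking = rank_by_similarity(toy_embeddings, "thermoelectric", ["SiO2", "PbTe", "Bi2Te3"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Compositions that rank highly but have not yet been studied for the application are the candidates such an approach would flag for further investigation.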

Uncertainty quantification
Uncertainty quantification (UQ) is an essential step in the evaluation of the robustness of DL. Specifically, DL models have been criticized for lack of robustness, interpretability, and reliability, and the addition of carefully quantified uncertainties would go a long way towards addressing such shortcomings. While most of the focus in the DL field currently goes into developing new algorithms or training networks to high accuracy, there is increasing attention to UQ, as exemplified by the detailed review of Abdar et al. [330]. However, determining the uncertainty associated with DL predictions remains a challenging problem that is far from completely solved.
The main obstacle to performing UQ in DL is that most currently available UQ implementations do not work for arbitrary, off-the-shelf models without retraining or redesign. Bayesian NNs are the exception; however, they require significant modifications to the training procedure, are computationally expensive compared to non-Bayesian NNs, and become increasingly inefficient as the data size grows. A significant fraction of the current research in DL UQ focuses on exactly this issue: how to evaluate uncertainty without requiring computationally expensive retraining or DL code modifications. An example of such an effort is the work of Mi et al. [331], where three scalable methods are explored to evaluate the variance of the output from a trained NN without requiring any re-training. Another example is Teye, Azizpour, and Smith's exploration of the use of batch normalization as a way to approximate inference in Bayesian models [332].
Before reviewing the most common methods used to evaluate uncertainty in DL, let us briefly point out key reasons to add UQ to DL modeling. Reaching high accuracy when training DL models implicitly assumes the availability of a sufficiently large and diverse training dataset. Unfortunately, this rarely occurs in material discovery applications [333]. ML/DL models are prone to perform poorly on extrapolation [334]. They also find it extremely difficult to recognize ambiguous samples [335]. In general, determining the amount of data necessary to train a DL model to the required accuracy is a challenging problem. Careful evaluation of the uncertainty associated with DL predictions would not only increase reliability in predicted results but would also provide guidance on estimating the needed training data set size as well as suggesting what new data should be added to reach the target accuracy (uncertainty-guided decision). Zhang, Kailkhura, and Han's work emphasizes how including a UQ-motivated reject option in the DL model results in substantial improvements in performance on the remaining material data [333]. Such a reject option is associated with the detection of out-of-distribution samples, which is only possible through UQ analysis of the predicted results.
Two different uncertainty types are associated with each ML prediction: epistemic uncertainty and aleatory uncertainty. Epistemic uncertainty is related to insufficient training data in part of the input domain. As mentioned above, while DL models are very effective at interpolation tasks, they extrapolate poorly, and it is therefore vital to quantify the lack of accuracy due to localized, insufficient training data. Aleatory uncertainty, instead, is related to parameters not included in the model. It relates to the possibility of training on data that the DL model perceives as very similar but that are associated with different outputs because of missing features in the model. Ideally, UQ methodologies should be able to distinguish and separately quantify both types of uncertainty.
The most common approaches to evaluate uncertainty using DL are Dropout methods, Deep Ensemble methods, Quantile regression and Gaussian Processes. Dropout methods are commonly used to avoid over-fitting. In this type of approach, network nodes are disabled randomly during training, resulting in evaluation of a different subset of the network at each training step. When a similar randomization procedure is applied to the prediction procedure as well, the methodology becomes Monte-Carlo dropout [336]. Repeating such randomization multiple times produces a distribution over the outputs, from which mean and variance are determined for each prediction. Another example of using a dropout approach to approximate Bayesian inference in deep Gaussian processes is the work of Gal and Ghahramani [337].
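The Monte-Carlo dropout procedure described above can be sketched with a tiny, fixed-weight network. The weights, the dropout probability, and the number of samples are arbitrary illustrative choices, and the "inverted dropout" rescaling of the kept units is an assumption of this sketch rather than a detail given in the text. The point is only that keeping dropout active at prediction time turns a single prediction into a distribution.

```python
import random
import statistics


def forward(x, hidden_weights, output_weights, drop_prob, rng):
    """One stochastic forward pass of a one-hidden-layer ReLU network."""
    hidden = []
    for weight, bias in hidden_weights:
        activation = max(0.0, weight * x + bias)
        keep = rng.random() >= drop_prob
        # Inverted dropout: rescale kept units so the expected activation is unchanged.
        hidden.append(activation / (1.0 - drop_prob) if keep else 0.0)
    return sum(h * w for h, w in zip(hidden, output_weights))


def mc_dropout_predict(x, hidden_weights, output_weights,
                       drop_prob=0.2, n_samples=200, seed=0):
    """Repeat stochastic forward passes and summarize them by mean and variance."""
    rng = random.Random(seed)
    samples = [forward(x, hidden_weights, output_weights, drop_prob, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.variance(samples)


# Hypothetical "trained" weights: (weight, bias) per hidden unit, then output weights.
W_HIDDEN = [(1.0, 0.0), (-0.5, 1.0), (2.0, -1.0), (0.5, 0.5)]
W_OUT = [0.3, -0.2, 0.5, 0.1]

mean_mc, var_mc = mc_dropout_predict(1.0, W_HIDDEN, W_OUT)
mean_det, var_det = mc_dropout_predict(1.0, W_HIDDEN, W_OUT, drop_prob=0.0)
print(mean_mc, var_mc)
print(mean_det, var_det)
```

With dropout disabled the repeated passes are identical and the variance vanishes, which is why an ordinary deterministic forward pass carries no uncertainty information on its own.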
Deep ensemble methodologies [338][339][340][341] combine deep learning modelling with ensemble learning. Ensemble methods utilize multiple models and different random initializations to improve predictive performance. Because multiple predictions are available, statistical distributions of the outputs can be generated. By combining such results into a Gaussian distribution, confidence intervals are obtained through variance evaluation. Such a multi-model strategy allows the evaluation of aleatory uncertainty when sufficient training data are provided. For areas without sufficient data, the predicted mean and variance will not be accurate, but the expectation is that a very large variance should be estimated, clearly indicating untrustworthy predictions. Monte-Carlo dropout and deep ensemble approaches can be combined to further improve confidence in the predicted outputs.
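One common way to combine ensemble outputs into a single Gaussian is sketched below. It assumes that each ensemble member returns both a predicted mean and a predicted variance and that the members are weighted equally; these are assumptions of the sketch, and the function names and numbers are illustrative rather than taken from the cited works.

```python
import math


def combine_ensemble(means, variances):
    """Approximate an equally weighted mixture of Gaussians by one Gaussian.

    The combined variance contains the average predicted variance plus the
    spread of the member means, so disagreement between members widens it.
    """
    count = len(means)
    mean = sum(means) / count
    second_moment = sum(v + m ** 2 for m, v in zip(means, variances)) / count
    return mean, second_moment - mean ** 2


def confidence_interval(mean, variance, z=1.96):
    """Symmetric interval under a Gaussian assumption (z = 1.96 for about 95 %)."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


member_variances = [0.04, 0.05, 0.03]

# Members agree: typical of inputs well covered by the training data.
mean_agree, var_agree = combine_ensemble([1.0, 1.2, 0.8], member_variances)

# Members disagree: typical of inputs far from the training data.
mean_disagree, var_disagree = combine_ensemble([0.0, 2.0, 1.0], member_variances)

print(confidence_interval(mean_agree, var_agree))
print(confidence_interval(mean_disagree, var_disagree))
```

Both cases have the same combined mean, but the second yields a much wider interval, which is the behavior expected in regions without sufficient data.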
Quantile regression can also be used with DL [342]. In this approach, the loss function is chosen so that the model predicts a selected quantile a (between 0 and 1) of the output distribution. A choice of a = 0.5 corresponds to minimizing the Mean Absolute Error (MAE) and predicting the median of the distribution. Predicting two additional quantiles (a_min and a_max) yields prediction intervals with a nominal coverage of a_max - a_min. For instance, predicting for a_min = 0.1 and a_max = 0.8 produces intervals covering 70 % of the population. The largest drawback of using quantiles to estimate prediction intervals is the need to run the model three times, once for each quantile needed. However, a recent implementation in TensorFlow allows multiple quantiles to be obtained simultaneously in one run.
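The loss used in quantile regression is commonly called the pinball (or quantile) loss. The sketch below is a minimal NumPy illustration, not the implementation of [342] or the TensorFlow one mentioned above: it fits a constant predictor, rather than a network, to a hypothetical dataset of the integers 0 to 100, and the helper names `pinball_loss` and `best_constant` are assumptions. It shows that minimizing the loss for a given a recovers the corresponding quantile, and that quantiles 0.1 and 0.8 bracket roughly 70 % of the data.

```python
import numpy as np

def pinball_loss(y_true, y_pred, a):
    """Pinball (quantile) loss for quantile level a, with 0 < a < 1."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction is weighted by a, over-prediction by (1 - a).
    return float(np.mean(np.maximum(a * diff, (a - 1.0) * diff)))

def best_constant(y, a):
    """Constant prediction (searched over the data values) minimizing the loss."""
    losses = [pinball_loss(y, c, a) for c in y]
    return float(y[int(np.argmin(losses))])

y = np.arange(101, dtype=float)          # hypothetical targets 0, 1, ..., 100

lower = best_constant(y, 0.1)            # estimate of the 0.1 quantile
upper = best_constant(y, 0.8)            # estimate of the 0.8 quantile
coverage = float(np.mean((y >= lower) & (y <= upper)))
print(lower, upper, coverage)
```

With a = 0.5 the pinball loss equals one half of the MAE, which is why that choice leads to the median. In a DL setting the same loss would be applied to the network output, with one output (or one model) per quantile.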
Lastly, Gaussian processes (GPs) can be used within a DL approach as well and have the side benefit of providing UQ information at no extra cost. Gaussian processes are a family of infinite-dimensional multivariate Gaussian distributions completely specified by a mean function and a flexible kernel function (prior distribution). By optimizing these functions to fit the training data, the posterior distribution is determined, which is later used to predict outputs for inputs not included in the training set. Because the prior is a Gaussian process, the posterior distribution is Gaussian as well [343], thus providing a mean and a variance for each prediction. In practice, however, standard kernels under-perform [344]. In 2016, Wilson et al. [345] suggested processing the inputs through a neural network before passing them to a Gaussian process model. This made it possible to extract high-level patterns and features, but required careful design and optimization. More generally, deep Gaussian processes improve on the performance of Gaussian processes by mapping the inputs through multiple Gaussian process 'layers'. Several groups have followed this avenue and further refined the approach ([344] and references therein). A common drawback of Bayesian methods is their prohibitive computational cost when dealing with large datasets [337].
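To make the statement that a GP returns a mean and a variance for each prediction concrete, the following sketch implements textbook GP regression with a zero mean function and a squared-exponential kernel in NumPy. It is an illustration under assumptions, not the deep-kernel or deep-GP models of [344,345]: the kernel choice, the fixed length scale, the noise level, and the three training points are all hypothetical, and no hyperparameter optimization is performed.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0):
    """Squared-exponential kernel with unit prior variance for 1D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6, length=1.0):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length)
    L = np.linalg.cholesky(K)                        # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)                 # prior variance is 1
    return mean, np.maximum(var, 0.0)                # clip tiny negative round-off

# Hypothetical training data (assumption): three samples of sin(x).
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 10.0])                       # one seen input, one far away

mean, var = gp_posterior(x_train, y_train, x_test)
print(mean, var)
```

At the input that coincides with a training point the posterior variance is close to zero, while far from the data it returns to the prior variance. The Cholesky factorization of the full kernel matrix scales cubically with the number of training points, which is one source of the computational cost noted above.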

Limitations and challenges
Although DL methods offer many fascinating opportunities for materials design, they have several limitations and there is much room for improvement. Assessing the reliability and quality of the datasets used in DL tasks is challenging because ground-truth data may be lacking, because there are not enough metrics for a global comparison, or because datasets obtained with similar or identical set-ups may not be reproducible [346]. This poses an important challenge to relying on DL-based predictions.
Material representations based on chemical formula alone do not, by definition, consider structure. On the one hand, this makes them more readily applicable to new compounds for which structural information may not be available; on the other hand, it makes it impossible for them to capture phenomena such as phase transitions. Material properties depend so sensitively on structure that they can be nearly opposite for different atomic arrangements of the same element, as in diamond (hard, wide-band-gap insulator) and graphite (soft, semi-metal). It is thus not surprising that chemical-formula-based methods may not be adequate in some cases [169].
Atomistic graph-based predictions, though considered a full atomistic description, have been tested on bulk materials only, and not on defective systems or in multi-dimensional phase-space exploration such as that performed with genetic algorithms. In general, this underscores that the input features must be predictive of the output labels and must not be missing key information. Although atomistic graph neural network models such as the atomistic line graph neural network (ALIGNN) have achieved remarkable accuracy compared to previous atomistic models, the model errors still need to be reduced further to reach something resembling a deep-learning 'chemical accuracy'. For images and spectra, experimental data are in most cases too noisy and require considerable processing before DL can be applied, whereas theory-based simulated data are usable but, being noise-free, do not capture realistic scenarios [235].
Uncertainty quantification for deep learning in materials science is important, yet only a few studies have addressed it. To alleviate the black-box [38] nature of DL methods, packages such as GNNExplainer [347] have been tried in the materials context. Such attempts at greater interpretability will be important moving forward to gain the trust of the materials community.
Training-validation-test split strategies were primarily designed in DL for image classification tasks with a fixed number of classes, and applying them unchanged to regression models in materials science may not be the best approach. During training, the model may see materials very similar to those in the test set, so that the test error overstates how well the model actually generalizes. Best practices need to be developed for data splitting, normalization, and augmentation to avoid such issues [334].
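One possible way to reduce this kind of leakage is to split by groups of related materials rather than by individual samples, so that all members of a group fall on the same side of the split. The sketch below is only an illustration of that idea, not an established best practice from [334]: grouping by chemical system, the example labels, and the function name `group_split` are assumptions.

```python
import numpy as np

def group_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test."""
    rng = np.random.default_rng(seed)
    unique_groups = rng.permutation(np.unique(groups))
    n_test = max(1, int(round(test_fraction * len(unique_groups))))
    test_groups = set(unique_groups[:n_test].tolist())
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Hypothetical chemical-system label for each sample (assumption).
groups = ["Cu-O", "Cu-O", "Fe-O", "Si", "Si", "Ga-N", "Ga-N", "Ti-O", "Ti-O", "C"]

train_idx, test_idx = group_split(groups)
train_groups = {groups[i] for i in train_idx}
held_out_groups = {groups[i] for i in test_idx}
print(sorted(held_out_groups), len(train_idx), len(test_idx))
```

How the groups are defined (chemical system, structure prototype, or a similarity threshold in some descriptor space) is itself a modeling decision and strongly affects the measured generalization error.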
Finally, we note that an important technological challenge is to build a "closed-loop" autonomous materials design and synthesis process [348,349] that includes both machine learning and experimental components in a "self-driving laboratory" [350]. For an overview of early proof-of-principle attempts see [351]. For example, in an autonomous synthesis experiment the oxidation state of copper (and therefore the oxide phase) was varied in a sample of copper oxide by automatically flowing a more oxidizing or more reducing gas over the sample and monitoring the charge state of the copper using XANES. An algorithmic decision policy was then used to automatically change the gas composition for each subsequent experiment based on the prior experiments, with no human in the loop, so as to move autonomously towards a target copper oxidation state [352]. This is a simple proof-of-principle experiment that gives just a glimpse of what is possible moving forward.