Artificial intelligence for search and discovery of quantum materials

Artificial intelligence and machine learning are becoming indispensable tools in many areas of physics, including astrophysics, particle physics, and climate science. In the arena of quantum materials, the rise of new experimental and computational techniques has increased the volume and the speed with which data are collected, and artificial intelligence is poised to impact the exploration of new materials such as superconductors, spin liquids, and topological insulators. This review outlines how data-driven approaches are changing the landscape of quantum materials research. From rapid construction and analysis of computational and experimental databases to implementing physical models as pathfinding guidelines for autonomous experiments, we show that artificial intelligence is already well on its way to becoming the linchpin in the search and discovery of quantum materials. Quantum materials host many exotic properties that might be exploited in new electronic devices. Here, artificial intelligence for the discovery of quantum materials is discussed, covering both materials and property prediction and high-throughput synthesis.

which render them so difficult to decipher. The notorious case in point is superconductivity in the cuprates, whose fundamental mechanism still eludes us after three decades of concerted research efforts 5,6,19. For some strongly correlated quantum materials, such as spin liquids, even the theoretical language necessary to describe them adequately has yet to be fully developed. Without guidance from rigorous theory, we are often left to rely on serendipity when exploring new compounds. There is, however, a glimmer of hope in an unusual form: research tools using Artificial Intelligence (AI) are coming to the rescue.
The explosive increase in computational power and in the volume of accessible data is fueling the growth of AI, which is transforming many aspects of our everyday lives through emerging technologies such as self-driving vehicles and Internet of Things devices. AI, especially its subfield of Machine Learning (ML), is also becoming a bread-and-butter analysis tool, with important applications in virtually all sciences, from cancer research 20 to astrophysics 21. ML has already made its mark on condensed matter and materials physics by providing a robust platform for extracting a parsimonious description of a materials system from experimental and computational data [22][23][24][25][26][27][28]. When the task at hand is the discovery of new quantum materials, one obvious strategy is to unleash ML on information obtained from existing compounds to make predictions about compositions and the collective behaviors of their atomic constituents.
In this review, we discuss ongoing research spearheading the effort to turn AI into a ubiquitous and indispensable tool for the study of quantum materials. The complexity and rich physics of these materials make them simultaneously one of the most promising targets and a particularly challenging subject for emerging AI-enabled methods. Only exhaustive high-dimensional data encompassing a diverse range of physical properties can help scientists develop a comprehensive understanding of materials such as unconventional superconductors, whose complex physics is often rooted in strong electron-electron interactions, or graphene superlattices, which host a multitude of competing phases 29. Thus, data need to be fused from disparate sources with vastly different modalities and veracities: theoretical models and first-principles computational tools, various experimental measurements, and results from previous studies. This review provides a vision of a future in which ML algorithms are routinely used for the creation, analysis, and visualization of such high-dimensional, heterogeneous data collections. The focus of the article is superconductivity, one of the most intensely pursued topics in all of condensed matter physics: the continued research effort, combined with its relatively long history, has led to a significant accumulation of computational and experimental data, a prerequisite for the use of ML tools. However, the tools being developed to study superconductors have the potential to dramatically increase the efficiency of experimental and computational studies of all quantum materials, representing a veritable breakthrough in the exploration of these complex physical systems. (For some common AI terminology and recent examples of ML applied to quantum materials problems, see Fig. 1 and Box 1.) Still, this vision is not without its share of potential pitfalls.
In the sections below, we discuss the benefits and challenges associated with AI-based approaches to novel quantum materials.

AI for computational quantum materials
In application after application, it has been demonstrated that ML can turn complex multidimensional data into actionable knowledge (see, for example, refs. [30][31][32] ). The success of ML methods, however, depends crucially on access to prodigious amounts of high-quality data. Not surprisingly, in physical sciences, these algorithms were first adopted in fields with abundant and highly standardized measurements. In fact, the vast amounts of data generated in particle physics have for decades required the use of automated algorithms for data analysis 21,33 . The importance of ML in this area is exemplified by the central role it played in the discovery of the Higgs boson at the Large Hadron Collider at CERN 33 .
When it comes to materials data, sophisticated computational first-principles tools are the order of the day, and enormous databases with millions of computed property entries are now readily available [34][35][36][37][38]. In the realm of quantum materials, however, the situation is less auspicious. The most widely used computational method for simulating materials properties is Density Functional Theory (DFT), which has served as the backbone of materials by design in areas such as energy and magnetic materials (see, for example, refs. [39][40][41]). Yet DFT is built on several commonly used but essentially uncontrolled approximations. While the trends of physical properties (e.g., as a function of composition) calculated by conventional DFT are generally correct, the accuracy of the calculated parameters depends significantly on computational details. For quantum materials, calculated properties can be even less reliable, owing, for example, to the self-interaction error originating in electron-electron interactions, or to the treatment of spin-orbit coupling and magnetism. This severely limits the utility of conventional DFT for entire classes of quantum materials. A very promising emerging research area addresses this challenge by using ML methods to systematically improve the approximations made in DFT calculations [42][43][44][45]. AI methods are also being used to accelerate high-level methods such as Quantum Monte Carlo (QMC) 46,47 and Dynamical Mean Field Theory (DMFT) 48, which are more accurate than DFT but can be orders of magnitude more computationally expensive.
These exciting developments notwithstanding, conventional DFT remains, for now, the workhorse of computational materials science. DFT calculations have been combined with ML algorithms to efficiently and accurately predict materials properties such as melting temperature, bandgap, shear modulus, and heat capacity [49][50][51]. DFT methods were also at the center of some of the early data-driven approaches to quantum materials, especially in the search for new superconductors. The discovery of superconducting materials has long been driven by serendipity and individual researchers' intuition. However, the complex physics of cuprates and other strongly correlated materials (for example, organic charge-transfer salts, which exhibit a variety of exotic phases: unconventional superconductivity, Mott insulator, spin liquid, etc. 52) has pushed scientists to explore more advanced data-centric approaches. In ref. 53, DFT band-structure calculations for a series of hole-doped cuprates were compared in order to identify the physical parameters governing superconductivity in these materials. The authors of ref. 54 went further and proposed several distinct characteristics of the cuprate band structure as key precursors of their superconductivity. These characteristics were then used as criteria to filter the calculated electronic structures of all materials in the Inorganic Crystal Structure Database (ICSD), which contains hundreds of thousands of existing crystal structures, and over 100 materials were identified as potential high-temperature superconductors. A similar approach was used recently in ref. 55, where p-terphenyl, an organic material that shows hints of a possible superconducting transition at ≈123 K when doped with potassium, was used as a prototype.
A database containing the electronic structure of more than 10,000 organic crystals was surveyed for compounds with a similar density of states (DOS) 56 , and as a result, 15 compounds were proposed as candidate superconductors.
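The prototype-based screening of refs. 55,56 boils down to ranking database entries by how closely their DOS resembles that of a reference compound. The sketch below is only an illustration under stated assumptions: every DOS is taken to be sampled on a common energy grid aligned at the Fermi level, cosine similarity is our choice of metric (not necessarily the one used in those works), and the threshold, compound names, and DOS values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two DOS curves sampled on the same energy grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_dos_similarity(prototype, candidates, threshold=0.9):
    """Return (name, score) pairs at or above the (assumed) threshold, most similar first."""
    scored = [(name, cosine_similarity(prototype, dos)) for name, dos in candidates.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical DOS values on a 5-point grid around the Fermi level.
prototype = [0.1, 0.5, 1.0, 0.5, 0.1]
candidates = {
    "crystal_A": [0.1, 0.4, 0.9, 0.6, 0.1],
    "crystal_B": [1.0, 0.2, 0.0, 0.2, 1.0],
    "crystal_C": [0.2, 1.0, 2.0, 1.0, 0.2],
}
print(rank_by_dos_similarity(prototype, candidates))
```

Cosine similarity ignores overall scale, so crystal_C (exactly twice the prototype) scores 1.0; if the absolute DOS near the Fermi level matters, a distance that keeps the magnitude would be the better choice.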
The studies mentioned above combine DFT calculations with heuristic criteria to search for novel superconductors. Such criteria unavoidably suffer from various biases and can be difficult to generalize. A more rigorous approach is offered by ML methods, which provide a systematic way to extract important predictors of materials properties from complex high-dimensional data. As an example, a 2015 investigation used structural and electronic property data from the AFLOW Online Repositories to define and calculate various fingerprints encoding materials' electronic band structures as well as crystallographic and constitutional information 57. These fingerprints were then used to create both regression and classification ML models for the critical temperature of several hundred superconductors. The classification model, which separates the materials into two groups based on their critical temperature, showed good predictive performance. Such models can be used to screen the hundreds of thousands of existing and potential materials stored in computational databases.
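To make the classification task concrete, here is a minimal pure-Python sketch of a binary classifier that labels a material as above or below a chosen Tc threshold from a fingerprint vector. Logistic regression trained by gradient descent is our stand-in; it is not claimed to be the model of ref. 57, and the two-component fingerprints and labels are invented for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that a material belongs to the high-Tc class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical standardized fingerprints (two band-structure features per material);
# label 1 = Tc above the chosen threshold, 0 = below.
X = [[-1.0, -0.5], [-0.8, -1.2], [-1.5, -0.3], [0.9, 1.1], [1.2, 0.4], [0.7, 1.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print([round(predict_proba(w, b, x)) for x in X])  # [0, 0, 0, 1, 1, 1]
```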
In addition to directly predicting a property of interest (such as Tc) from existing DFT data, ML can be used to accelerate the intermediate computational steps. For example, in conventional superconductors, the coupling between electrons and lattice vibrations creates an effective attractive electron-electron interaction, leading to Cooper pair formation. However, an ab initio calculation of the electron-phonon coupling strength requires knowledge of properties such as the phonon DOS, which are computationally expensive to obtain, especially for complex materials. A recent publication showed that a neural network model can be trained to predict the phonon DOS with high fidelity, using phonon DOS calculations of ~1500 crystalline solids as a training dataset 58. The model's predictions captured the main features of the phonon DOS, even for crystalline solids containing elements not present in the training set. Thus, by reducing the computational cost of ab initio methods, ML can significantly enhance the throughput of computational materials screening.

Box 1 | Key terms
Artificial intelligence: "Intelligence" demonstrated by machines able to perceive their environment and take autonomous actions to achieve a goal.
Machine learning: Branch of AI; a method for automated model building. The design of systems that can learn from data and make decisions with minimal human intervention.
Supervised learning: Subfield of ML aimed at creating predictive models based on both input and output (target) data. It subdivides into:
  Classification: Modeling target data with discrete categories.
  Regression: Modeling continuous target data.
  Commonly used algorithms include linear regression, support vector machines, decision trees, Gaussian processes, and neural networks.
Unsupervised learning: Subfield of ML aimed at discovering internal representations from input data only. Its two most widely used subfields are:
  Dimensionality reduction: Creating low-dimensional projections of high-dimensional data.
  Clustering: Finding groups of related data points.
  Commonly used algorithms include k-means clustering, hierarchical clustering, and hidden Markov models.
Active learning (optimal experimental design): Subfield of ML concerned with algorithms that interactively query an information source to obtain new outputs. Query strategies are typically some mix of exploration (aiming to maximize new information) and exploitation (aiming to optimize a property).

In yet another example of incorporating AI into DFT for superconductivity, ML feature selection methods have been used to derive a simple analytical expression improving on the Allen-Dynes approximation 59. The Allen-Dynes analytical formula is based on Eliashberg-Migdal theory and is commonly used to predict the critical temperature of electron-phonon paired superconductors, reducing the number of required DFT calculations. The accuracy gained with the new formula can improve the computational pipeline and, in turn, speed up the discovery of novel superconductors.
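For reference, the widely used Allen-Dynes form of the McMillan equation estimates Tc = (ω_log/1.2) exp[-1.04(1 + λ)/(λ - μ*(1 + 0.62λ))], where λ is the electron-phonon coupling constant, ω_log the logarithmically averaged phonon frequency (in kelvin here), and μ* the Coulomb pseudopotential. The sketch evaluates only this baseline: it omits the strong-coupling correction factors of the full Allen-Dynes expression and does not reproduce the ML-derived formula of ref. 59. The default μ* = 0.1 and the example inputs are assumptions.

```python
import math

def allen_dynes_tc(lam, omega_log_K, mu_star=0.1):
    """Estimate Tc (in K) from the Allen-Dynes-modified McMillan formula.

    lam: electron-phonon coupling constant (lambda)
    omega_log_K: logarithmic average phonon frequency, in kelvin
    mu_star: Coulomb pseudopotential (0.1 is a common default, assumed here)
    The strong-coupling correction factors f1 and f2 are omitted.
    """
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0.0:
        return 0.0  # the formula predicts no superconductivity in this regime
    return (omega_log_K / 1.2) * math.exp(-1.04 * (1.0 + lam) / denom)

print(allen_dynes_tc(lam=1.0, omega_log_K=300.0))  # about 20.9 K
```

In a screening pipeline, λ and ω_log would come from DFT electron-phonon calculations, which is exactly the expensive step that ML surrogates aim to shortcut.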
In contrast to superconductors, the concept of materials with intrinsic topologically nontrivial states is relatively new 2,3. The current flurry of activity was initially spearheaded by a theoretical prediction 60 and the subsequent experimental confirmation of topological insulators 61,62, materials with an insulating bulk but protected metallic surface states. Despite significant efforts over the last decade, however, only a handful of distinct topological insulators, as well as other topological materials such as Dirac and Weyl semimetals 4,63-65, are known. Because many of them are difficult to synthesize and/or suffer from deleterious properties, such as the presence of trivial bulk states undermining the surface states, the community is always on the lookout for new topological materials.
To address this issue, several groups have developed algorithms, such as symmetry indicators and spin-orbit spillage, that combine materials chemistry and symmetry with electronic structure calculations to determine a material's topological properties, opening the door to automated screening for new topological insulators and semimetals [66][67][68]. Based on these ideas, several searches through the ICSD generated extensive lists of candidates corresponding to various types of topological materials [69][70][71][72][73]. Such developments make the field fertile ground for the application of ML methods (for an example, see Fig. 2). A very recent work constructed a gradient boosting model (a powerful type of supervised learning algorithm) that can predict the topology of a given known material based only on "coarse-grained" chemical composition and crystal symmetry predictors, with an accuracy of 90% 74. As with superconductivity, such models can accelerate the search for novel materials by providing a fast and efficient means of predicting the possible topological nature of a given candidate.
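As an illustration of one of these criteria, the spin-orbit spillage at a k-point compares the occupied wavefunctions computed without and with spin-orbit coupling (SOC): η(k) = n_occ(k) - Tr[P(k) P̃(k)], where P and P̃ project onto the occupied states of the two calculations. A large spillage signals an SOC-driven band inversion. A cutoff of 0.5 is commonly quoted in spillage-based screening; we treat it here as an adjustable assumption. The sketch assumes orthonormal occupied eigenvectors expressed in a common basis, and the two-orbital toy example is hypothetical.

```python
import math

def inner(u, v):
    """<u|v> for vectors stored as lists (complex entries allowed)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def spillage(occ_without_soc, occ_with_soc):
    """eta(k) = n_occ - sum_ij |<psi_i|psi~_j>|^2 at a single k-point.

    Both arguments are lists of orthonormal occupied eigenvectors in the same basis.
    """
    n_occ = len(occ_without_soc)
    overlap = sum(abs(inner(u, v)) ** 2 for u in occ_without_soc for v in occ_with_soc)
    return n_occ - overlap

def is_candidate(eta, cutoff=0.5):
    """Flag a possible band inversion; the 0.5 cutoff is an adjustable assumption."""
    return eta >= cutoff

# Toy two-orbital model with one occupied band: SOC rotates the occupied state by theta.
theta = 1.2
eta = spillage([[1.0, 0.0]], [[math.cos(theta), math.sin(theta)]])
print(eta, is_candidate(eta))
```

For this toy rotation the spillage equals sin²θ, so it grows from 0 (no change in the occupied state) to 1 (complete exchange of orbital character).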
Another class of quantum materials is 2D materials 1. Since the breakthrough discovery of graphene in 2004, many other 2D materials families (such as monolayers of hexagonal boron nitride, silicene, germanene, stanene, phosphorene, and transition metal dichalcogenides) have been synthesized 1. They have been the focus of concerted investigation, both because of their rich potential in technologies such as electronics, sensing, and energy storage, and because they offer an entirely new avenue to explore the interplay between particle-particle interactions, band structure, and reduced dimensionality (leading to effects such as the long-sought Wigner crystal 75). The search for new 2D materials and a systematic comparison of their properties are still in their infancy. High-throughput DFT has been used in the last several years to compile publicly available databases of potential 2D materials [76][77][78][79]. In ref. 80, a JARVIS-DFT dataset containing results of 2D and 3D DFT calculations was used to develop handcrafted structural descriptors as inputs to gradient boosting models predicting properties including exfoliation energies, formation energies, and bandgaps. From the synthesis point of view, the exfoliation energy is particularly crucial for 2D materials design, yet computing it with DFT is prohibitively expensive at scale. These models were then used to discover exfoliable materials satisfying specific property requirements. In ref. 81, the Computational 2D Materials Database (C2DB) was used to create an ML model (again based on gradient boosting) that uses composition and structural predictors to classify potential 2D materials as having low, medium, or high stability. The model was used to discover potential 2D materials suitable for photoelectrocatalytic water splitting.
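The exfoliation energy mentioned above is commonly defined per atom as the energy difference between an isolated monolayer and the bulk layered crystal: E_exf = E_1L/N_1L - E_bulk/N_bulk. A minimal sketch, assuming total energies (in eV) from two DFT runs; the energies below are invented, and the 200 meV/atom cutoff used to label a material as exfoliable is an illustrative assumption rather than a value taken from ref. 80.

```python
def exfoliation_energy_mev(e_mono_eV, n_mono, e_bulk_eV, n_bulk):
    """E_exf = E_1L/N_1L - E_bulk/N_bulk, converted from eV/atom to meV/atom."""
    return 1000.0 * (e_mono_eV / n_mono - e_bulk_eV / n_bulk)

def is_exfoliable(e_exf_mev, cutoff_mev=200.0):
    """Label as exfoliable below an assumed energy cutoff (meV/atom)."""
    return e_exf_mev < cutoff_mev

# Invented total energies: a 3-atom monolayer cell and a 6-atom bulk cell.
e_exf = exfoliation_energy_mev(e_mono_eV=-20.55, n_mono=3, e_bulk_eV=-41.40, n_bulk=6)
print(e_exf, is_exfoliable(e_exf))
```

An ML model like the one in ref. 80 would predict E_exf directly from structural descriptors, skipping the two DFT runs this function needs as input.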
Beyond monolayers, there is a vast configuration space of possible van der Waals heterostructures, which can be formed by stacking different 2D layers. Such hybrid structures add degrees of freedom to their functionalities, creating 2D and 3D materials with untold tunability. Because DFT is too time-consuming to explore more than a small fraction of all possible combinations, incorporating ML in the process provides a practical alternative. Recent work trained several ML models to predict the interlayer distance and bandgap of bilayer heterostructures 82. The models showed good accuracy on both tasks and were used to predict the properties of nearly 1500 hypothetical bilayer structures based on fewer than 300 DFT calculations. This underscores the promise of combining DFT and ML for rapid computational screening to identify new hybrid heterostructures with interesting and desirable properties.
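The size of the heterostructure space is easy to underestimate: N candidate monolayers give N(N + 1)/2 distinct bilayers (counting homobilayers) before stacking order, registry, or twist angle are even considered. The sketch below, with hypothetical monolayer labels and a hypothetical DFT budget, enumerates the pairs and splits them into a small subset to be computed with DFT (the training data) and a remainder whose properties an ML surrogate would predict. The surrogate itself is left out.

```python
import itertools
import random

def enumerate_bilayers(monolayers):
    """All unordered monolayer pairs, including homobilayers (A/A)."""
    return list(itertools.combinations_with_replacement(monolayers, 2))

def split_for_surrogate(bilayers, n_dft, seed=0):
    """Randomly choose n_dft bilayers to compute with DFT; the rest are left for an ML model."""
    rng = random.Random(seed)
    shuffled = list(bilayers)
    rng.shuffle(shuffled)
    return shuffled[:n_dft], shuffled[n_dft:]

monolayers = [f"ML{i}" for i in range(20)]  # 20 hypothetical monolayers
bilayers = enumerate_bilayers(monolayers)   # 20 * 21 / 2 = 210 pairs
dft_set, ml_set = split_for_surrogate(bilayers, n_dft=40)
print(len(bilayers), len(dft_set), len(ml_set))  # 210 40 170
```

Random selection is the simplest choice; an active-learning strategy (see Box 1) would instead pick the next DFT calculations where the surrogate is most uncertain.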
Fig. 2 Periodic table trend showing the percentage chance (color coded) that a compound containing a given element is topologically nontrivial according to the spin-orbit spillage approach, computed from spillage data for 4835 nonmagnetic materials (reproduced from ref. 68 under the terms of the Creative Commons CC BY license). The spillage is a machine-learnable quantity, and an accurate classification model predicting it (using gradient boosting decision trees) was developed in ref. 73.

Applying ML to experimental databases: making predictions from known materials
With the ability to quickly recognize patterns in large collections of quantitative information, AI approaches are becoming just as prevalent in the analysis of experimental materials data. One natural task is to apply them to databases of experimentally known compounds and to build models for making predictions. In the study of superconductors, a field with more than a century of extensive research history, there are large volumes of accumulated information. Sometimes even a small fraction of these data can be used to discover important trends. In an early pioneering work, Villars and Phillips demonstrated that, using just three stoichiometric descriptors (associated with elemental makeup), the 60 superconductors with Tc > 10 K known at the time (the 1980s) could all be clustered into three islands 83. Based on this observation, the authors made predictions of potential high-temperature superconductors 83,84. In another early work, Hirsch used statistical methods to look for correlations between normal-state properties and Tc of the metallic elements in the first six rows of the periodic table 85.
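The "islands" found by Villars and Phillips are clusters in a low-dimensional descriptor space. A plain k-means sketch (Lloyd's algorithm with deterministic farthest-point initialization) shows how such groups can be recovered automatically; k-means is our stand-in for illustration, not the procedure used in ref. 83, and the three-dimensional descriptor values are invented.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly take the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Three invented descriptor "islands" in a 3D space.
points = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, 0.1),
          (5.0, 5.1, 0.0), (5.2, 4.9, 0.1), (4.9, 5.0, 0.2),
          (0.0, 5.0, 5.1), (0.1, 4.8, 5.0), (0.2, 5.1, 4.9)]
centroids, labels = kmeans(points, k=3)
print(labels)
```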
With time, more and more of the experimental data collected by generations of scientists are becoming easily accessible to researchers. Thanks to painstaking curation efforts over decades by researchers at several institutes around the world, there are large databases of compiled experimental reports, such as the Phase Equilibria Diagrams database and the ICSD, arguably one of the largest compendiums of experimental materials data 86. Because of their comprehensive nature, such databases have become de facto standard-bearers for materials exploration and development, and they often serve as a starting point for building ML models (as already discussed above in conjunction with purely computational approaches). While these databases rarely contain functional properties, the information they hold on phase formation, stability, and phase diagrams can serve as a blueprint for navigating materials exploration. For example, a recent work used ICSD data to train a neural network to predict crystal structure information 87. The network's activation maps were then used to group materials according to their compositional and structural similarity, providing lists of materials potentially sharing properties with known superconductors or topological insulators (see Fig. 3a).
Unfortunately, when it comes to experimental databases of known quantum materials, the datasets usually contain very few entries. In fact, one is often hard-pressed to find extensive databases of functional materials in general. A rare exception is the MatNavi database, managed by the National Institute for Materials Science (NIMS) in Japan. It is an experimental database encompassing well-curated information on materials with a variety of functional properties. Transcribing published journal results into formatted databases can be a massive undertaking, which NIMS materials data scientists have managed to do for decades.
Some of the collected information is already being used for data-driven materials exploration 88,89 . The Pauling File also contains up to tens of thousands of individual composition entries (and corresponding specific physical property quantities) diligently entered over decades 90 .
This increase in available experimental data, together with the surge in popularity of AI and the appearance of general-purpose ML libraries, has led to a recent groundswell of activity introducing sophisticated ML methods into the study of superconductivity [91][92][93][94][95][96]. In one notable example 91, Stanev et al. considered more than 16,000 different compositions extracted from the MatNavi SuperCon database, which contains an exhaustive list of known superconductors as well as many "closely related" materials varying only by small changes in stoichiometry. Compared with the early exercise by Villars and Phillips in the 1980s, the orders-of-magnitude increase in the number of data points made it possible to create a robust ML pipeline. The regression models developed to predict Tc for different superconducting families used over one hundred stoichiometric descriptors, demonstrated strong predictive power and high accuracy, and offered valuable insights into the superconducting mechanisms of different materials groups (see Fig. 3b). The models also exposed an important limitation of ML: they failed to extrapolate to materials families not included in the training set. A pipeline was then created to search for potential new superconductors among the roughly 110,000 different compositions contained in the ICSD, which resulted in predictions of possible Tc values above 20 K for 35 known compounds that had not previously been tested for superconductivity.
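Stoichiometric descriptors of the kind used in ref. 91 are statistics of elemental properties weighted by the composition. The sketch below computes three such statistics (fraction-weighted mean, range, and mean absolute deviation) for a single property, the atomic number; the descriptor set in ref. 91 is far larger, the small element table covers only a few elements, and the formula parser handles only simple formulas without parentheses.

```python
import re

# Atomic numbers for a few elements (the only elemental property used here).
ATOMIC_NUMBER = {"H": 1, "B": 5, "C": 6, "O": 8, "Mg": 12, "Fe": 26, "Cu": 29,
                 "As": 33, "Sr": 38, "Y": 39, "Ba": 56, "La": 57}

def parse_formula(formula):
    """'La1.85Sr0.15CuO4' -> {'La': 1.85, 'Sr': 0.15, 'Cu': 1.0, 'O': 4.0}; no parentheses."""
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts

def stoichiometric_descriptors(formula, table=ATOMIC_NUMBER):
    """Fraction-weighted mean, range, and mean absolute deviation of one elemental property."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    values = [table[el] for el in counts]
    mean = sum(fractions[el] * table[el] for el in counts)
    mad = sum(fractions[el] * abs(table[el] - mean) for el in counts)
    return {"mean": mean, "range": max(values) - min(values), "mad": mad}

print(stoichiometric_descriptors("MgB2"))
```

Repeating this for many elemental properties (and more statistics) yields the kind of feature vector a random forest or other regressor can map to Tc.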
Fig. 3 Examples of machine-learning methods applied to experimental data. a Three clusters (shown in blue, green, and red), representing groups of related materials that contain topologically nontrivial materials. The materials representations are extracted from a neural network model predicting the Bravais lattice from composition; for details, see ref. 87. The clusters can be used to search for new topological materials. b The measured vs predicted ln(Tc) of various superconductors based on a random forest model presented in ref. 91. The same model can predict Tc of several distinct superconducting classes (blue markers: low-Tc materials; green markers: iron-based superconductors; red markers: cuprate superconductors).
One interesting finding from this pipeline is that most of the newly identified possible superconductors possess a flat or nearly flat band just below the Fermi energy, leading to an increase in the electronic DOS. Such an enhanced DOS has long been considered a promising way to boost Tc. However, a very recent study using experimental Tc values and DFT-based DOS calculations failed to find a strong and consistent correlation between superconductivity and peaks in the electronic DOS 97. Notably, the strongest conclusion of the latter study concerned "…the restrictions that the current availability and organization of materials data place on reliable machine-learning and data-based experimentation," underscoring the need for a more systematic approach to collecting and organizing quantum materials data.
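Why a flat band boosts the DOS can be seen in a one-dimensional tight-binding toy model, E(k) = -2t cos k, whose DOS at the band center is 1/(2πt) per site: reducing the hopping t (flattening the band) raises the DOS proportionally. The sketch below estimates the DOS from sampled band energies with Gaussian smearing; the model, smearing width, and k-grid are illustrative assumptions, not taken from refs. 91 or 97.

```python
import math

def tight_binding_band(t, n_k=2000):
    """Energies E(k) = -2t cos(k) of a 1D nearest-neighbor band on a uniform k-grid."""
    return [-2.0 * t * math.cos(2.0 * math.pi * i / n_k) for i in range(n_k)]

def dos_at(energy, band_energies, sigma=0.02):
    """Gaussian-smeared DOS per site at one energy (integrates to 1 over all energies)."""
    prefactor = 1.0 / (math.sqrt(2.0 * math.pi) * sigma * len(band_energies))
    return prefactor * sum(math.exp(-0.5 * ((energy - e) / sigma) ** 2) for e in band_energies)

dispersive = tight_binding_band(t=1.0)
flat = tight_binding_band(t=0.1)  # ten times smaller hopping: a much flatter band
print(dos_at(0.0, dispersive), dos_at(0.0, flat))  # close to 1/(2*pi) and 10/(2*pi)
```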
This, however, is far easier said than done: the experimental exploration of quantum materials is a remarkably diverse and distributed endeavor, and it is not uncommon for tens or even hundreds of research groups to study the same material, each applying its own toolbox and interrogation techniques. Moreover, many experiments are extremely resource-intensive and can only produce limited datasets. Combining numerous small datasets covering the same material can lead to issues with data consistency and incomplete metadata. Furthermore, the quantum materials community has yet to fully embrace the open data paradigm, and a significant fraction of the collected data are only published in unstructured form (i.e., text and images) and are not made easily available to other researchers. (Much of the collected information, especially "negative" results, is never published at all. For a rare example of work reporting both negative and positive results, see ref. 98.)
Thus, to construct an experimental database of quantum materials, researchers may have to resort to laborious manual extraction of data points from published articles. The significant effort this process requires at least partially explains the scarcity of databases in the field. However, emerging AI-driven automatic generation of databases can provide an alternative to this slow and tedious process. One example is the recently created database of almost 40,000 Curie and Néel phase-transition temperatures of magnetic materials 99. It is fully auto-generated and completely open source. The database was produced using Natural Language Processing (NLP) and related ML methods applied to the texts of published chemistry and physics articles. The method is quite precise, with the accuracy of the extracted transition temperatures reaching 82%. In a very recent extension of this work, the same authors were able to reconstruct the phase diagrams of well-known magnetic and superconducting compounds, again using text data 100. They also demonstrated that it is possible to predict the phase-transition temperatures of compounds not present in the database. (Yet another recent effort to organize various experimental results into a curated database focuses on the digitization of plots extracted from published papers 101.)
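The core task of such text-mining pipelines is to turn sentences into (property, value) records. The toy regular expression below illustrates only that idea; the pipeline of ref. 99 relies on much more robust NLP tools, and the pattern, function name, and sample sentences are our own hypothetical examples.

```python
import re

# Hypothetical pattern: "<Curie|Néel> temperature ... <number> K" within one sentence.
TRANSITION_PATTERN = re.compile(
    r"(?P<kind>Curie|N[ée]el)\s+temperature[^.]*?(?P<value>\d+(?:\.\d+)?)\s*K\b"
)

def extract_transition_temperatures(text):
    """Return (kind, temperature_in_K) tuples found in free text."""
    return [(m.group("kind"), float(m.group("value")))
            for m in TRANSITION_PATTERN.finditer(text)]

sample = ("The Curie temperature of the film was found to be 1043 K. "
          "Below its Néel temperature of 525 K, the compound orders antiferromagnetically.")
print(extract_transition_temperatures(sample))
# [('Curie', 1043.0), ('Néel', 525.0)]
```

A single pattern like this misses units other than kelvin, ranges, tables, and coreference ("this compound" in a later sentence), which is why production pipelines combine parsers, named-entity recognition, and validation steps.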

Extracting latent knowledge from materials characterization data
In an ironic twist, in some areas of experimental materials physics, the issue is not the scarcity but the overabundance of data. In fact, handling the large quantities of data churned out in real time by modern experimental labs is a significant emerging challenge. Continual improvements in materials characterization instrumentation, coupled with the ever-increasing power of readily available computers for data acquisition and storage, have created the capacity to collect data on an unprecedented scale and at high speed. Through advances in scanning transmission electron microscopy, it is now possible to determine atomic column positions with picometer-level precision (see, for example, refs. 102,103). Spectroscopy and scanning probe measurements can provide detailed information about materials properties, including electronic structure and the symmetry of an order parameter 104,105. Pump-probe techniques permit the study of highly excited and out-of-equilibrium states and phases, sometimes revealing hidden tendencies 106.
With these techniques, even a single measurement of one material can generate large volumes of high-dimensional data, necessitating the use of sophisticated statistical methods. In these instances, ML algorithms are used to uncover the physics at work behind the scenes that governs the collective behavior of materials at the atomic scale. In one recent example, a neural network model was used to analyze Spectroscopic Imaging Scanning Tunneling Microscopy (SISTM) images of the CuO2 planes, where transport takes place in the high-temperature superconducting cuprates 107. The model was designed to recognize different symmetry-breaking ordered electronic states. It was trained on a set of artificial images, each generated to represent one of four distinct states differing in their fundamental wavevector. Various forms of heterogeneity, intrinsic disorder, and topological defects were added to these images to mimic the experimental data. The model was then applied to experimental images of carrier-doped cuprates. Analyzing these noisy and complex data, the model discovered the existence of a translational-symmetry-breaking ordered state. The presence of this particular ordered state has important implications for theories aiming to explain the mysterious pseudogap phase of these materials.
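The training strategy of ref. 107, generating labeled artificial images of competing ordered states and adding noise to mimic experiment, can be illustrated in a few lines. In this sketch the neural network is replaced by a much simpler Fourier-amplitude template classifier that picks the candidate wavevector with the largest amplitude; the image size, noise level, and four candidate wavevectors are hypothetical, and real training data would also include the disorder and topological defects described above.

```python
import cmath
import math
import random

def synthetic_image(q, size=16, noise=0.3, seed=0):
    """cos(q . r) modulation plus Gaussian noise, mimicking a density-wave image."""
    rng = random.Random(seed)
    return [[math.cos(2 * math.pi * (q[0] * x + q[1] * y) / size) + rng.gauss(0.0, noise)
             for x in range(size)] for y in range(size)]

def fourier_amplitude(img, q):
    """|sum_r img(r) exp(-i 2 pi q.r / N)| for an integer wavevector q (units of 2 pi / N)."""
    n = len(img)
    total = sum(img[y][x] * cmath.exp(-2j * math.pi * (q[0] * x + q[1] * y) / n)
                for y in range(n) for x in range(n))
    return abs(total)

def classify(img, candidate_qs):
    """Pick the candidate ordered state whose wavevector has the largest Fourier amplitude."""
    return max(candidate_qs, key=lambda q: fourier_amplitude(img, q))

CANDIDATES = [(4, 0), (0, 4), (4, 4), (2, 0)]  # four hypothetical ordered states
img = synthetic_image((0, 4), seed=1)
print(classify(img, CANDIDATES))  # (0, 4)
```

A learned model earns its keep when disorder, domain walls, and overlapping orders make a single Fourier peak an unreliable signature.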
In another notable example, researchers applied an ML method to Angle-Resolved Photoemission Spectroscopy (ARPES) data of optimally doped and underdoped cuprates 108. The use of a restricted Boltzmann machine model allowed the recovery of a "hidden" feature in the spectra: prominent peak structures present in both the normal and anomalous self-energies of the single-particle spectral function. These peak structures cancel each other in the total self-energy and were only discovered by a model respecting physical constraints such as causality (encoded in the Kramers-Kronig relations). The use of ML allowed the researchers to solve a nonlinear, underdetermined problem and to obtain important information directly from experimental data, without relying on any theoretical model. The results clarified the role of energy dissipation and quantum entanglement in the superconducting phase and offer a possible route to identifying the boson responsible for pairing in the cuprates.
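The causality constraint mentioned above can be stated, for a response function f(ω) analytic in the upper half-plane, as Re f(ω) = (1/π) P∫ Im f(ω′)/(ω′ - ω) dω′. The sketch below evaluates this principal-value integral numerically on a grid placed symmetrically around ω, so that the singular point is never sampled, and checks it on a Lorentzian test function. It is not the restricted-Boltzmann-machine analysis of ref. 108; the integration window and step are assumptions.

```python
import math

def kk_real_part(im_func, omega, half_width=50.0, step=0.01):
    """Re f(omega) = (1/pi) PV int Im f(w') / (w' - omega) dw'.

    The grid sits at half-step offsets symmetric around omega, so the singular
    point is never sampled and the principal value is taken implicitly.
    """
    n_half = int(half_width / step)
    total = 0.0
    for m in range(-n_half, n_half):
        w = omega + (m + 0.5) * step
        total += im_func(w) / (w - omega)
    return total * step / math.pi

# Test function f(w) = 1 / (w - W0 + i*GAMMA), analytic in the upper half-plane.
W0, GAMMA = 0.0, 0.5

def im_f(w):
    return -GAMMA / ((w - W0) ** 2 + GAMMA ** 2)

def re_f_exact(w):
    return (w - W0) / ((w - W0) ** 2 + GAMMA ** 2)

print(kk_real_part(im_f, 0.5), re_f_exact(0.5))  # both close to 1.0
```

Constraints of this kind tie the real and imaginary parts of a self-energy together, which is what lets a physically constrained model reject solutions that would otherwise fit the spectra equally well.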
Beyond superconductivity, ML was recently applied to neutron scattering data of a frustrated magnet with a complex phase diagram, and used to extract model Hamiltonians and to identify different magnetic regimes 109. An autoencoder (a neural-network-based dimensionality reduction architecture) was trained to create a compressed representation of three-dimensional diffuse scattering over a wide range of spin Hamiltonians. The autoencoder was able to find optimal Hamiltonian parameters matching the observed scattering and heat capacity signatures. It was also able to categorize different magnetic behaviors and to eliminate background noise and artifacts in the raw data. Thus, ML can augment many traditional diffraction and inelastic neutron scattering analysis tools, which are often time-consuming and error-prone.
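A linear autoencoder with a single latent unit, trained with squared-error loss, learns the same direction as the leading principal component, so PCA is the simplest stand-in for the compression step. The sketch below finds that direction by power iteration and uses it to encode and decode data; the neural-network autoencoder of ref. 109 is nonlinear and was applied to three-dimensional diffuse scattering, whereas the two-dimensional data here are synthetic.

```python
import math
import random

def fit_linear_autoencoder(data, n_iter=200, seed=0):
    """Return (means, component): PCA's leading direction via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def encode(row, means, component):
    """Compress a d-dimensional row to a single latent coordinate."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

def decode(code, means, component):
    """Map the latent coordinate back to d dimensions."""
    return [m + code * c for m, c in zip(means, component)]

# Synthetic data lying close to the (1, 1) direction.
rng = random.Random(42)
data = [[t + rng.gauss(0, 0.05), t + rng.gauss(0, 0.05)]
        for t in (i / 10 for i in range(-20, 21))]
means, pc1 = fit_linear_autoencoder(data)
recon = decode(encode(data[0], means, pc1), means, pc1)
print(pc1, data[0], recon)
```

The reconstruction discards the small component perpendicular to pc1, which is the linear analogue of the denoising behavior described for the autoencoder.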
Another very recent work reported a classifier based on a convolutional neural network, designed to distinguish topological from trivial materials and trained on X-ray Absorption Spectroscopy (XAS) data 110. The model showed high accuracy in distinguishing the different classes of materials, demonstrating the potential of ML methods to recognize topological character embedded in complex spectral features. XAS is a widely used materials characterization technique, and the ability to decipher the topological character of a material from its XAS signatures could substantially simplify and expedite the experimental identification of topologically nontrivial materials.

Autonomous materials laboratories
Armed with predictions of potential quantum materials generated by theory or by the ML techniques discussed above, experiments can be designed for the systematic synthesis and characterization of novel compounds. However, even with predictions providing blueprints, one often has to navigate a potentially large multidimensional parameter space, as in any materials exploration and optimization task. One proven platform for such experiments is the high-throughput approach, which permits the interrogation of libraries containing hundreds or even thousands of different compounds 111. Over the years, combinatorial approaches have been used effectively to explore new superconductors 112 and to map their compositional phase diagrams (for an example demonstrating the ability of this method to discover superconducting compositions, see Fig. 4a). They have also been used to optimize the synthesis of the topological Kondo insulator SmB6 in thin-film form 113.
A significant hurdle to the more widespread use of high-throughput methods is the need for specialized characterization tools, which are often required to explore many classes of quantum materials. For instance, to detect topological states one typically turns to angle-resolved photoemission spectroscopy (ARPES) to look for their signatures in the electronic structure 114. Antiferromagnetism, whose role in the superconductivity of several classes of materials is hotly debated, is best probed using neutron scattering 115. X-ray magnetic circular dichroism has been used to investigate the electronic structure of the quantum spin liquid α-RuCl3 116. These advanced techniques require unique synchrotron or neutron facilities, and their use in screening combinatorial libraries has been limited to date 117,118.
The high-throughput approach generates large, high-dimensional datasets, requiring analysis techniques that can rapidly turn raw data into knowledge with limited or no human supervision 119. The high-throughput community adopted dimensionality reduction and data mining techniques early on 120-123, gradually creating a diverse ML toolbox for the rapid digestion of combinatorial data 124-130. One very common task is to quickly group (cluster) measurements from different points of a combinatorial library, and ML algorithms have been routinely applied to large numbers of X-ray diffraction patterns to delineate structural phases and rapidly construct composition-structure relationships 120,124,129,130. The authors of ref. 124 took this idea a step further, using a comprehensive ML algorithm for on-the-fly analysis of synchrotron diffraction data from combinatorial libraries to facilitate a search for rare-earth-free permanent magnets (see Fig. 4b).
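A synthetic composition spread can illustrate the clustering task. In this sketch, a hypothetical binary spread with composition x contains one structural phase for x < 0.5 and another above it, with peak positions drifting slightly with x. Patterns are normalized so that the clustering compares peak patterns rather than overall intensity, and are then grouped with a plain k-means (Lloyd's algorithm) implementation. The peak positions, phase boundary, and noise are invented, and k-means is only one simple choice, not necessarily the algorithm used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
two_theta = np.linspace(20.0, 60.0, 400)   # hypothetical 2-theta grid (degrees)
x = np.linspace(0.0, 1.0, 41)              # compositions across a toy binary spread

def peak(center, width=0.3):
    return np.exp(-((two_theta - center) ** 2) / (2 * width**2))

def toy_xrd(xi):
    """Hypothetical pattern: phase alpha for x < 0.5, phase beta otherwise.
    Peaks drift linearly with x, mimicking lattice-parameter changes."""
    shift = 0.8 * xi
    if xi < 0.5:
        pattern = peak(30.0 + shift) + 0.6 * peak(43.0 + shift)
    else:
        pattern = peak(35.0 + shift) + 0.8 * peak(51.0 + shift)
    return pattern + rng.normal(scale=0.02, size=two_theta.size)

X = np.array([toy_xrd(xi) for xi in x])
X = X / np.linalg.norm(X, axis=1, keepdims=True)   # compare shapes, not intensities

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns the lowest-inertia labels."""
    r = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centroids = X[r.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

labels = kmeans(X, k=2)
# The composition at which the cluster label changes marks the phase boundary
boundary = x[np.flatnonzero(np.diff(labels))]
print("cluster labels along the spread:", labels)
print("label changes after x =", boundary)
```

Real libraries also contain multiphase regions and larger peak shifts, which motivates the more specialized methods used in practice.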
Unsupervised ML for rapid data reduction is now commonly applied to high-throughput characterization data from a variety of spectroscopic techniques, including Raman spectroscopy 131, Time-of-Flight Secondary Ion Mass Spectrometry 132, and X-ray photoelectron spectroscopy 133.
This close integration of experimental tools and ML can be considered a first step towards an emerging AI-driven paradigm for materials exploration. It relies on active learning, a branch of AI dedicated to providing a systematic and rigorous approach for identifying the best experiment to perform to achieve an objective. The objective can be either finding the shortest path toward a material that optimizes some desired properties or identifying a series of experiments that maximizes knowledge of the explored space. Recently, materials physicists have begun to capitalize on active learning to accelerate experimental research 134. For example, active learning has been used to advise experimentalists on the next best experiment to perform in the search for various functional materials 135-137. Another study has already demonstrated the potential of these methods for quantum materials: an active-learning framework designed to discover the material with the highest Tc was evaluated on about 600 known superconductors 138. The framework performed significantly better than random guessing, highlighting its potential impact.
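The "next best experiment" idea can be sketched as Bayesian optimization over a finite list of candidates. A Gaussian-process surrogate is fit to the measurements made so far, and the next candidate is the one with the highest upper confidence bound (UCB), that is, the predicted mean plus β times the predicted standard deviation. The one-dimensional "Tc" landscape, kernel length scale, β, and experiment budget below are hypothetical, and this generic loop is not the specific framework of ref. 138.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """GP posterior mean and std, with zero prior mean on standardized targets."""
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - y_mean) / y_std
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu * y_std + y_mean, np.sqrt(var) * y_std

# Hypothetical candidate library: 200 compositions with a hidden "Tc" landscape
x_cand = np.linspace(0.0, 1.0, 200)
tc_true = (30 * np.exp(-((x_cand - 0.72) / 0.08) ** 2)
           + 12 * np.exp(-((x_cand - 0.25) / 0.1) ** 2))

rng = np.random.default_rng(0)
measured = [int(i) for i in rng.choice(len(x_cand), size=3, replace=False)]  # random start
beta = 2.0      # exploration weight in the UCB score
budget = 25     # total number of experiments allowed

while len(measured) < budget:
    idx = np.array(measured)
    mu, sigma = gp_posterior(x_cand[idx], tc_true[idx], x_cand)
    ucb = mu + beta * sigma
    ucb[idx] = -np.inf          # never repeat an experiment
    measured.append(int(np.argmax(ucb)))

best = measured[int(np.argmax(tc_true[measured]))]
print(f"best Tc found: {tc_true[best]:.1f} at x = {x_cand[best]:.3f}")
print(f"true maximum:  {tc_true.max():.1f} (used {budget} of {len(x_cand)} experiments)")
```

The β parameter sets the balance between exploiting compositions predicted to be good and exploring uncertain regions. Larger values favor mapping the space, smaller values favor converging quickly on an optimum.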
The active-learning algorithms guiding the exploration can decisively outperform exhaustive experimental approaches, yielding a similar amount of actionable knowledge for only a fraction of the time and resources. This has profound implications: with a significant reduction in the number of iterative experimental runs, it may now be possible to incorporate active-learning-guided tools at synchrotron and neutron facilities for quick screening of quantum materials libraries in closed-loop cycles.
Fig. 4 Accelerating experimental discovery processes. a High-throughput methods can be used to validate predicted materials. Visualization of resistivity vs temperature curves measured at different parts of an Fe-B composition spread. The horizontal range of each curve covers 2-300 K. The resistivity range for each curve is color coded (reproduced from ref. 141 under the terms of the Creative Commons CC BY license). b On-the-fly analysis of synchrotron diffraction data 124. A snapshot photo from the experiment. The upper screen shows the scanning stage with a thin-film library inside a beamline hutch. The bottom screen is a laptop, where unsupervised ML of diffraction patterns is carried out after each measurement. The same setup has also been used to carry out active-learning-based autonomous experiments.
However, the examples of active learning discussed above only provided support for an important but limited set of questions, leaving the majority of the experimental design, execution, and analysis to human experts. This situation is rapidly changing, and truly autonomous experimental systems are beginning to arrive on the scene. In one study, an autonomous system was used to control synthesis parameters to optimize carbon nanotube growth rate 139 . The Autonomous Research System (ARES) platform utilized an ML-based system linked to an automated growth reactor with in situ characterization to learn the optimum growth conditions.
Autonomous materials exploration can be particularly effective when coupled with high-throughput experimentation. In a recent study, a real-time, closed-loop autonomous system for materials exploration and optimization (CAMEO) was demonstrated on a combinatorial library 140. CAMEO first maps the structural phase diagram across the compositional landscape and then uses the phase map as a blueprint to optimize a physical property of interest. It was used to quickly find the optimal composition of a phase-change memory material, the one with the largest bandgap difference between amorphous and crystalline states, within a Ge-Sb-Te thin-film composition spread. To achieve this, ellipsometry measurements from the library were analyzed simultaneously with remotely controlled synchrotron diffraction measurements. Combining these two sources of information, CAMEO discovered a novel composition with phase-change performance superior to that of known materials in the field, in only about one-tenth of the time required to measure every point in the library.
Even more revolutionary AI-enabled systems, designed to help human scientists optimize every step of the research process, are already being considered and developed. These systems will coordinate a host of computational and experimental probes to efficiently search a vast and mostly unexplored compositional space. The AI "brain" of the system will have access to all relevant information: results of past experiments, computations and theory, knowledge of a wide range of materials synthesis techniques, and measurement instrumentation. It will use this information to advise researchers on which experiments to perform to maximize knowledge gain and drastically accelerate progress. In its ultimate form, a fully autonomous laboratory controlled by AI and in command of various synthesis and characterization tools will orchestrate entire experimental campaigns, update its knowledge, and continue to explore until the desired goal is reached (see Fig. 5 for a schematic). These AI "scientists", dedicated to the exploration of quantum materials, can help us finally solve some of the most enduring mysteries of physics.

Outlook
As physicists focus more and more on materials with unusual properties grounded in many-body quantum mechanics, their research methods have to evolve. The ongoing revolution in AI is a great opportunity for the condensed matter physics community, and ML tools are starting to play an important role in the study of quantum materials. From accelerating the first-principles computational tools to helping analyze high-dimensional experimental data, these tools have already left their mark on the field. But even more exciting developments are gradually turning into reality. Researchers are working on automated research systems controlled by AI-guided robots. These systems will allow scientists to build penetrating multidimensional pictures of these complex materials, and at the same time accelerate the search and discovery process.
Apart from reshaping the experimental process, such autonomous systems will remove many superfluous hurdles and dramatically lower the effort and time needed to run an experiment. This will allow scientists to focus on the most challenging and important parts of the research process, while also making science more "democratic" and equitable.