Aperiodic materials do not surrender details of their structure as readily as do their crystalline counterparts. The latest computational solution to this problem brings aspects of ‘the beautiful game’ into play.
Investigations of crystalline materials through X-ray and neutron diffraction have been a triumph of experimental science, allowing structures ranging from complex minerals to proteins and DNA to be unravelled [1]. But how can the structure of a material that is aperiodic — one that is non-crystalline, or cannot be crystallized — be determined? On page 655 of this issue [2], Juhás et al. present an intriguing solution to this question with a novel algorithm for reconstructing three-dimensional structures from ‘pair distribution function’ (PDF) data. Aperiodic materials are among the most technologically interesting nanoscale materials currently under study, and the approach could be widely applicable.
Several techniques exist for determining the local, atomic-scale structure of materials. These range from scanning tunnelling microscopy (STM) to spectroscopic methods that use X-rays, such as extended X-ray absorption fine structure (EXAFS) analysis. Each has its advantages and drawbacks. STM can give beautiful images, although not in three dimensions. For structural information to be inferred from spectroscopic techniques such as EXAFS, an accurate theoretical model relating spectra to structure is required [3].
PDF analysis avoids some of these problems because it solely involves data on the distribution of distances between atoms in a structure — information that is readily obtained from X-ray or neutron-scattering experiments [3]. Why, then, is PDF not the method of choice for structure determination? The first factor is data quality: although the PDF technique has been known for decades, the lack of high-resolution data has limited its applicability, as well as that of many other techniques. That situation is now changing with the latest generation of experiments using modern neutron and synchrotron X-ray sources.
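At heart, the PDF is a distribution of interatomic distances. A toy sketch of that idea is given below: it builds a Gaussian-broadened distance histogram for a regular tetrahedron. This is an illustrative caricature, not the experimentally measured G(r) (which also involves normalization and termination effects); all function names and parameter values are assumptions.

```python
import numpy as np

def pair_dists(coords):
    """All interatomic distances for an N x 3 coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    return d[np.triu_indices(len(coords), k=1)]

def toy_pdf(coords, r, width=0.05):
    """Gaussian-broadened histogram of pair distances: a toy stand-in
    for the experimentally measured G(r)."""
    d = pair_dists(coords)
    return np.exp(-0.5 * ((r[:, None] - d[None, :]) / width) ** 2).sum(1)

# A regular tetrahedron has a single pair distance, so its PDF has one peak.
tet = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
r = np.linspace(2.0, 3.5, 300)
g = toy_pdf(tet, r)
print(round(r[g.argmax()], 2))  # peak at the edge length, 2 * sqrt(2) ~ 2.83
```

Note that the distances alone carry no directional information, which is precisely why recovering a three-dimensional structure from them is non-trivial.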
The second crucial factor is that an algorithm must be found that solves the ‘inverse problem’; that is, given a set of experimental data, how to extract the three-dimensional structure that must have created it. Determining the structure corresponding to PDF data, the question tackled by Juhás and colleagues [2], is just such an inverse problem. The inverse problem is usually not trivial, as it involves various assumptions about a material and, potentially, many material-dependent parameters. Solutions typically involve minimizing the mean squared deviation between the experimental data and the data predicted from a theoretical model of the structure. This process often needs significant computational resources, as it requires the ‘direct problem’ — that is, a theoretical model for the experimental signal resulting from a trial structure — to be solved many times in the process of finding the minimum.
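The direct problem and the least-squares objective can be sketched in a few lines. The following is a minimal illustration under assumed simplifications (the signal model is a crude stand-in for a real PDF calculation), not the authors' code:

```python
import numpy as np

def model_signal(coords, r, width=0.05):
    """Direct problem: predicted PDF-like signal for a trial structure."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(coords), 1)]
    return np.exp(-0.5 * ((r[:, None] - d[None, :]) / width) ** 2).sum(1)

def misfit(coords, r, g_exp):
    """Inverse-problem objective: mean squared deviation between the
    experimental data and the model prediction."""
    return np.mean((model_signal(coords, r) - g_exp) ** 2)

# The true structure reproduces its own data exactly ...
square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
r = np.linspace(0.5, 2.0, 200)
g_exp = model_signal(square, r)
print(misfit(square, r, g_exp))  # -> 0.0

# ... while a perturbed trial structure scores worse.
rng = np.random.default_rng(0)
print(misfit(square + rng.normal(0, 0.1, square.shape), r, g_exp) > 0)  # -> True
```

An optimizer must call `misfit` (and hence solve the direct problem) at every trial structure it visits, which is where the computational cost arises.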
Obtaining a solution to the inverse problem is equivalent to an optimization strategy for finding the global minimum of a quantity involving many variables among a forest of possible minima. Numerous advances have been made in such strategies, which are crucial in fields from economics to protein folding [4]. These include the development of ‘genetic’ algorithms inspired by the rules of evolutionary biology, and ‘simulated annealing’ techniques that mimic the way metals freeze into a state of minimum energy. In the case of X-ray and neutron crystallography, the iterative ‘shake-and-bake’ algorithm [1] has been revolutionary. This method involves the random perturbation of the positions of atoms in a crystal until the lowest-energy state is found, and it has reduced the time required for determining crystal structures from months to just hours.
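To make the random-perturbation idea concrete, here is a generic simulated-annealing sketch (not the actual shake-and-bake algorithm, which is specific to crystallographic phasing) that recovers a small structure from its pair distances. All names, the cooling schedule and the parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_cost(coords, target_dists):
    """Mean squared mismatch between a trial structure's sorted pair
    distances and a target distance list (a stand-in for PDF data)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    dists = np.sort(d[np.triu_indices(len(coords), 1)])
    return np.mean((dists - target_dists) ** 2)

def anneal(coords, target_dists, steps=20000, t0=0.5, step=0.05):
    """Randomly 'shake' the atom positions; accept moves by the Metropolis
    rule while the temperature t is gradually lowered."""
    cur = distance_cost(coords, target_dists)
    best, best_coords = cur, coords.copy()
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9          # linear cooling schedule
        trial = coords + rng.normal(0, step, coords.shape)
        c = distance_cost(trial, target_dists)
        if c < cur or rng.random() < np.exp((cur - c) / t):
            coords, cur = trial, c               # accept the shaken structure
            if c < best:
                best, best_coords = c, trial.copy()
    return best_coords, best

# Recover an equilateral triangle (all three pair distances equal to 1).
start = rng.normal(0, 1, (3, 3))
final, err = anneal(start, np.ones(3))
print(err < 0.05)
```

Early on, the high temperature lets the search accept uphill moves and escape local minima; as it cools, the search settles into the deepest basin it has found.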
Juhás and colleagues call their approach [2] for inverting PDF data the ‘Liga algorithm’, because the method is modelled on the rules of promotion and relegation that determine the position of participating teams in most of the world's soccer leagues. Teams correspond to trial clusters of atoms; ‘winning’ clusters (those with the smallest errors between the model and the experiment) are iteratively promoted, whereas losing ones (those with the largest errors) are relegated, so that an optimal global structure is more quickly found. The authors show that their algorithm can determine a number of nanoscale structures, such as that of the ‘buckyball’ C60 molecule, with a perfect success rate. Genetic algorithms, in contrast, take considerably longer and have far lower rates of success.
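A toy sketch of how such a league might be organized is shown below, recovering a unit square in two dimensions from its pair distances. The specifics (one division per cluster size, promotion by gaining an atom, relegation by shedding the last atom, and the crude misfit measure) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
# Target data: the six pair distances of a unit square (hypothetical 'experiment').
TARGET_DISTS = np.array([1.0, 1.0, 1.0, 1.0, np.sqrt(2), np.sqrt(2)])

def cluster_cost(pts):
    """Score a partial cluster: each pair distance is compared with the
    nearest target distance (a crude stand-in for a PDF misfit)."""
    if len(pts) < 2:
        return 0.0
    errs = [np.min((np.linalg.norm(a - b) - TARGET_DISTS) ** 2)
            for a, b in combinations(pts, 2)]
    return float(np.mean(errs))

def grow(pts, tries=20):
    """Promotion: a winning cluster gains an atom, placed at a target-like
    distance from an existing atom; the best of a few trials is kept."""
    base = pts[rng.integers(len(pts))]
    best, best_c = None, np.inf
    for _ in range(tries):
        ang = rng.uniform(0, 2 * np.pi)
        d = rng.choice(TARGET_DISTS)
        cand = np.vstack([pts, base + d * np.array([np.cos(ang), np.sin(ang)])])
        c = cluster_cost(cand)
        if c < best_c:
            best, best_c = cand, c
    return best

def liga_search(n_atoms=4, per_division=6, seasons=60):
    """One 'division' per cluster size; winners are promoted (gain an atom),
    losers in an overfull division are relegated (lose their last atom)."""
    divisions = {k: [] for k in range(1, n_atoms + 1)}
    divisions[1] = [rng.uniform(0, 2, (1, 2)) for _ in range(per_division)]
    for _ in range(seasons):
        for k in range(n_atoms - 1, 0, -1):
            if divisions[k]:
                divisions[k].sort(key=cluster_cost)
                divisions[k + 1].append(grow(divisions[k].pop(0)))
            if len(divisions[k + 1]) > per_division:
                divisions[k + 1].sort(key=cluster_cost)
                divisions[k].append(divisions[k + 1].pop()[:-1])
    return min(divisions[n_atoms], key=cluster_cost)

best = liga_search()
print(best.shape, round(cluster_cost(best), 3))
```

The appeal of the scheme is that a cluster whose most recent atom was badly placed is not discarded wholesale: relegation removes only the offending atom, and the surviving core gets another chance at promotion.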
So what are the limits of this approach, and can it be extended to other global-optimization problems? The limits are typically reached when the theoretical model has more parameters than the data can constrain, so that the inverse problem becomes ‘ill-conditioned’ — that is, it has unstable solutions. Optimization strategies must therefore include some way of stabilizing the solutions. Some of these approaches, such as choosing model parameters by guesswork, can involve more than a whiff of the black art, and potentially produce results that vary widely from one investigator to the next.
Alternative approaches using powerful statistical tools such as Bayesian analysis have been developed, which can avoid the arbitrariness of choosing model parameters [5]. They achieve stability by taking into account a priori information in order to constrain the overall probability distribution for a particular structure. Strategies such as the Liga algorithm could be extended significantly by including known structural information based on a system's physical and chemical properties or knowledge derived from theory and computational materials science. It may well then be possible to resolve heterogeneous nanostructures containing many hundreds of atoms.
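To see in miniature how a priori information stabilizes a solution, consider a one-parameter caricature: a maximum a posteriori (MAP) estimate of a bond length that combines a few noisy measurements with a Gaussian prior encoding prior chemical knowledge. All numerical values here are hypothetical:

```python
import numpy as np

# Hypothetical noisy distance measurements (angstroms) and their noise level.
meas = np.array([1.48, 1.55, 1.51])
sigma_d = 0.05

# Gaussian prior: a priori knowledge of the expected bond length.
r0, sigma_p = 1.54, 0.02

# Posterior is proportional to likelihood x prior; for Gaussians the MAP
# estimate is the precision-weighted mean of the data and the prior.
precision = len(meas) / sigma_d**2 + 1 / sigma_p**2
r_map = (meas.sum() / sigma_d**2 + r0 / sigma_p**2) / precision
print(round(r_map, 3))  # -> 1.531
```

The estimate is pulled from the raw data mean (about 1.513) towards the prior value (1.54), with the prior's weight set by how confidently it is held; the same principle, applied to full structural models, is what tames an otherwise ill-conditioned inversion.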