The sheer number of variables and logical conditions makes some computing problems seem intractable. Statistical physics, normally used to study huge groups of interacting particles, can supply powerful tools to crack them.
As computer hardware and software become ever more sophisticated, we are shifting from a setting in which computers merely assist us in processing information with the aid of well-understood algorithms, to a landscape in which computers themselves make decisions and are in full control of a given situation. An example is the Pentagon-sponsored 'Darpa Urban Challenge'^{1}, in which standard consumer vehicles are equipped with sensors such as laser rangefinders, cameras and global positioning systems. Taking millions of measurements per second, and using up to a dozen PCs to process the data, these vehicles can make driving decisions on their own in real time.
Such computational tasks are fiendishly complex. Many of them in fact fall into a general class of notoriously hard computational problems known as NP-complete problems. (NP stands for 'nondeterministic polynomial time'; the general assumption is that, in the worst case, the time needed to solve NP-complete problems explodes exponentially with the number of variables.) Since the class was first characterized^{2} in the early 1970s, thousands of these problems have been identified, in areas as diverse as hardware and software verification, planning and scheduling, automated reasoning, and computational biology. Writing in Physical Review E, Mézard and Tarzia^{3} demonstrate an innovative approach to solving one well-known NP-complete problem, the hitting-set problem. Their approach borrows techniques from the statistical physics generally used to characterize the interactions of atomic ensembles, and highlights a trend of recent years: the application of fundamental physics to bring new perspectives to the study of hard computational problems.
The hitting-set problem starts with a number of sets, each containing a certain number of items. Each individual item can occur in more than one set. A hitting set 'hits' the original sets in that it contains at least one item from each; the challenge is to find the smallest such set. Take a class of students, for example, each of whom plays one or more sports (Fig. 1). For each sport, there is a group (set) of students playing the sport. Here, the question would be: what is the smallest group of students I can choose so that all sports are represented in my sample?
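A brute-force search makes the definition concrete. The sketch below (in Python; the student and sport names are invented for illustration) tries candidate groups in order of increasing size and returns the first one that hits every set:

```python
from itertools import combinations

def smallest_hitting_set(sets):
    """Return a smallest set of items intersecting every input set,
    found by exhaustive search over candidates of increasing size."""
    items = sorted(set().union(*sets))
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            chosen = set(candidate)
            if all(chosen & s for s in sets):  # 'hits' every set
                return chosen
    return set()

# Each sport maps to the set of students who play it (invented names).
sports = [{"Ana", "Ben"}, {"Ben", "Caro"}, {"Caro", "Dan"}]
print(smallest_hitting_set(sports))  # a hitting set of size 2
```

The nested loops are exactly the exponential search described below: the number of candidate groups grows as 2^n in the number of students.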
Hitting-set problems arise in many contexts, including fault diagnosis, group testing (for example, analysing many blood samples by analysing a few aggregates of individual samples), database searches, experiment design and DNA screening. The difficulty, in computational terms, is that the number of combinations of items that needs to be considered to find the smallest possible hitting set, or a hitting set smaller than a certain size, grows exponentially with the number of items. Such exponential or combinatorial search spaces are characteristic of NP-complete problems.
In a broad sense, such problems involve a set of discrete variables and a set of constraints between the variables that model their interactions^{4}. In terms of our earlier hitting-set example, each student is represented by a binary variable, which is set to 'one' if the student is part of the hitting set, and otherwise set to 'zero'. Each sport introduces a new constraint on the variables: at least one of the students playing a particular sport should have a variable set to one. A further constraint places an upper bound on the number of variables set to one, and therefore on the size of the hitting set. Finding variable assignments that satisfy individual constraints is generally quite easy. The challenge is to find an assignment to the variables that satisfies all constraints simultaneously.
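In this encoding, checking whether a given assignment satisfies all the constraints is easy; it is finding one that is hard. A minimal Python sketch of the check (the dictionary representation of the 0/1 variables is an illustrative assumption):

```python
def satisfies(assignment, sets, max_size):
    """Check a 0/1 assignment against the hitting-set constraints:
    every set must contain a chosen item, and at most max_size
    items may be chosen."""
    chosen = {item for item, bit in assignment.items() if bit == 1}
    if len(chosen) > max_size:              # size-bound constraint
        return False
    return all(chosen & s for s in sets)    # one 'hit' per set

sports = [{"Ana", "Ben"}, {"Ben", "Caro"}, {"Caro", "Dan"}]
assignment = {"Ana": 0, "Ben": 1, "Caro": 1, "Dan": 0}
print(satisfies(assignment, sports, max_size=2))  # True
```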
Algorithms for doing this fall into two broad classes: backtrack search, introduced in the early 1960s, and local search, which emerged a decade later. Backtrack search methods proceed in a centralized way, by assigning values to variables one by one. Whenever a local constraint is violated — when none of the students playing a certain sport is flagged as belonging to a hitting set — one or more variable settings are revisited and changed. Simple bookkeeping techniques ensure that such a search explores the entire combinatorial search space.
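A backtrack search for the hitting-set formulation can be sketched as follows (a simplified Python illustration, not any particular published algorithm): items are considered one by one, and a branch is abandoned as soon as a constraint can no longer be satisfied.

```python
def backtrack(items, sets, max_size, chosen=None, idx=0):
    """Assign items one by one; undo ('backtrack') whenever a
    constraint can no longer be satisfied."""
    if chosen is None:
        chosen = set()
    if len(chosen) > max_size:
        return None                      # size bound violated: backtrack
    if idx == len(items):
        return set(chosen) if all(chosen & s for s in sets) else None
    remaining = set(items[idx:])
    # prune: some set can no longer be hit by any remaining choice
    if any(not (chosen | remaining) & s for s in sets):
        return None
    for bit in (1, 0):                   # try including, then excluding
        if bit:
            chosen.add(items[idx])
        result = backtrack(items, sets, max_size, chosen, idx + 1)
        if bit:
            chosen.discard(items[idx])
        if result is not None:
            return result
    return None

items = ["Ana", "Ben", "Caro", "Dan"]
sports = [{"Ana", "Ben"}, {"Ben", "Caro"}, {"Caro", "Dan"}]
print(backtrack(items, sports, max_size=2))  # a hitting set of size <= 2
```

The two-way branch on each variable is the 'bookkeeping' that guarantees the whole combinatorial space is eventually covered.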
In local search methods, by contrast, the exploration is less systematic. First, with a random guess, one assigns values to all variables. Such an assignment will generally violate many constraints, and a local search algorithm proceeds by trying to 'fix' variable settings to reduce the number of violations in the search for a variable setting that satisfies all constraints.
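A local search for the same formulation might look like the following sketch (Python; the greedy-flip-plus-random-walk strategy is a common generic choice, assumed here for illustration):

```python
import random

def local_search(items, sets, max_size, steps=2000, seed=0):
    """Start from a random assignment and repeatedly flip one
    variable to reduce the number of violated constraints, with
    occasional random flips to escape local minima."""
    rng = random.Random(seed)
    chosen = {i for i in items if rng.random() < 0.5}

    def violations(c):
        unhit = sum(1 for s in sets if not (c & s))   # sets not hit
        return unhit + max(0, len(c) - max_size)      # plus size excess

    for _ in range(steps):
        if violations(chosen) == 0:
            return chosen                # all constraints satisfied
        if rng.random() < 0.2:
            flip = rng.choice(items)     # random walk step
        else:
            flip = min(items, key=lambda i: violations(chosen ^ {i}))
        chosen ^= {flip}                 # toggle membership
    return None                          # gave up within the step budget

items = ["Ana", "Ben", "Caro", "Dan"]
sports = [{"Ana", "Ben"}, {"Ben", "Caro"}, {"Caro", "Dan"}]
print(local_search(items, sports, max_size=2))
```

Unlike backtrack search, nothing here guarantees the whole space is explored; the method may simply fail to find a solution within its step budget.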
In tackling the hitting-set problem, Mézard and Tarzia^{3} follow a fundamentally different route. They take advantage of a significant advance that occurred in the early 1990s, when computer scientists banded together with physicists to study ensembles of randomly generated instances of various NP-complete problems^{5,6,7,8}. An 'instance' here is simply a particular example of the generic problem, defined by a set of variables and the specific constraints governing them; in our previous example, one specifies a set of students and the sports they play.
This work revealed that, at certain values of the ratio of constraints to variables, ensembles of random instances of the same generic problem underwent a sudden change, dubbed a phase transition. Below the phase-transition point, most instances have one or more solutions that satisfy all constraints; above the phase transition, most instances do not have any solution, because there are too many constraints to satisfy. The instances that were hardest to solve occurred with numbers of variables and constraints that lay exactly at these phase-transition boundaries. A natural conclusion was that tools from statistical physics developed to study physical phase transitions might help in developing more efficient algorithms for solving combinatorial problems.
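The transition can be seen in a small numerical experiment (a sketch; the instance sizes and densities are illustrative choices, not the authors' parameters): generate random hitting-set instances with a growing number of sets, and record the fraction that still admit a hitting set of a fixed size.

```python
import random
from itertools import combinations

def random_instance(n_items, n_sets, set_size, rng):
    """A random instance: n_sets subsets of set_size items each."""
    return [set(rng.sample(range(n_items), set_size))
            for _ in range(n_sets)]

def has_hitting_set(sets, max_size):
    """Is there a hitting set of at most max_size items?
    (Hitting is monotone, so checking exactly max_size suffices.)"""
    items = sorted(set().union(*sets))
    return any(all(set(c) & s for s in sets)
               for c in combinations(items, max_size))

rng = random.Random(1)
fractions = {}
for n_sets in (5, 15, 30):               # constraint density grows
    solved = sum(has_hitting_set(random_instance(10, n_sets, 3, rng), 3)
                 for _ in range(20))
    fractions[n_sets] = solved / 20
print(fractions)  # the solvable fraction drops as constraints are added
```

At low density almost every instance is solvable, at high density almost none is, and the hard instances cluster in the crossover region in between.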
An example is the 'survey-propagation method'^{9} used by Mézard and Tarzia^{3}, which developed from the cavity method used in statistical physics to calculate ground-state properties of certain condensed-matter systems. Survey propagation solves random instances of the boolean satisfiability problem near phase transitions with large numbers of variables (more than 10^{7}), which are beyond the reach of backtrack and local-search methods. This archetypal NP-complete problem asks whether, given a set of logical statements using boolean variables (variables that can be either 'true' or 'false'), there is any assignment of values to those variables that satisfies all the statements.
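The satisfiability problem itself can be stated in a few lines of code. The sketch below checks satisfiability by brute force, using the common convention (an assumption here) that a clause is a list of signed integers, with a negative sign denoting negation:

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force SAT: try all 2^n assignments. A positive literal i
    means variable i is true; a negative literal means it is false."""
    for bits in product((False, True), repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(satisfiable([[1, 2], [-1, 3], [-2, -3]], 3))   # True
print(satisfiable([[1], [-1]], 3))                   # False
```

The 2^n loop is, of course, exactly what makes the problem intractable at the scales survey propagation can reach.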
Mézard and Tarzia use the survey-propagation method to compute statistical properties of the solutions of instances of the hitting-set problem. Such a strategy might seem doomed to fail, because it is generally significantly harder to determine the properties of the set of solutions of a hard computational problem than it is to find a single solution. But survey propagation can efficiently approximate the requisite statistical information for instances of various combinatorial problems near phase boundaries. It does this by iteratively solving a large set of coupled equations, modelling the local interactions between variables probabilistically. This solution process can be performed in a parallel, distributed fashion using many different processors, and generally converges to an answer extremely quickly — in seconds for equations with thousands of variables.
Survey propagation can be viewed as a generalization of the 'belief-propagation method'^{10}, which was discovered independently in several fields, including information theory and artificial intelligence. Belief propagation is a way of approximating the probability (the 'belief') that a variable takes on a particular value in a randomly sampled solution. This information can be used to set variables incrementally, thus simplifying a problem.
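What a 'belief' means can be made concrete on the earlier toy example by computing the exact marginals through enumeration over all solutions; belief propagation is a way of approximating these same probabilities without any enumeration. (A Python sketch; the names are invented.)

```python
from itertools import product

def exact_beliefs(items, sets, max_size):
    """Probability that each item is chosen in a uniformly random
    solution (hitting set of size <= max_size), by full enumeration.
    Belief propagation approximates these marginals without visiting
    the exponentially many assignments."""
    counts = {i: 0 for i in items}
    total = 0
    for bits in product((0, 1), repeat=len(items)):
        chosen = {i for i, b in zip(items, bits) if b}
        if len(chosen) <= max_size and all(chosen & s for s in sets):
            total += 1
            for i in chosen:
                counts[i] += 1
    return {i: counts[i] / total for i in items} if total else None

items = ["Ana", "Ben", "Caro", "Dan"]
sports = [{"Ana", "Ben"}, {"Ben", "Caro"}, {"Caro", "Dan"}]
print(exact_beliefs(items, sports, max_size=2))
# Ben appears in 2 of the 3 solutions, so his belief is 2/3
```

An incremental step would now fix the most polarized variable (say, set Ben to 'one') and repeat on the simplified problem.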
The method works well when solutions are nicely clustered together in the combinatorial space, which is the case reasonably far from a phase transition. Near phase boundaries, however, solutions break up into many smaller, unconnected clusters in the combinatorial space^{11}. Conventional combinatorial search algorithms and standard belief-propagation techniques become trapped between these clusters, and cannot effectively search the solution space. The survey-propagation method, on the other hand, continues to provide reliable statistical information about the solution space^{12,13,14}.
It is this property that allows Mézard and Tarzia^{3} to map out for the first time the space of hitting-set problems, identifying under what conditions belief- and survey-propagation methods can solve hard, random instances of the problem. They also identify regions where still more complex survey-propagation-style equations would be required.
Given the ever-increasing role of computational methods in other disciplines, the fact that those disciplines are, in turn, starting to contribute new concepts and ideas to the science of computation is an exciting development — one that, as the demands we make on computational methods continue to grow, we are sure to hear more of.
References

1.
2. Cook, S. Proc. 3rd Annu. ACM Symp. Theor. Comput. 151–158 (ACM, New York, 1971).
3. Mézard, M. & Tarzia, M. Phys. Rev. E 76, 041124 (2007).
4. Gomes, C. P. & Selman, B. Nature 435, 751–752 (2005).
5. Cheeseman, P., Kanefsky, B. & Taylor, W. Proc. 12th Int. Joint Conf. Artif. Intell. 331–337 (Morgan Kaufmann, San Francisco, 1991).
6. Mitchell, D., Selman, B. & Levesque, H. Proc. 10th Natl Conf. Artif. Intell. 459–465 (AAAI, Menlo Park, CA, 1992).
7. Kirkpatrick, S. & Selman, B. Science 264, 1297–1301 (1994).
8. Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B. & Troyansky, L. Nature 400, 133–137 (1999).
9. Mézard, M., Parisi, G. & Zecchina, R. Science 297, 812–815 (2002).
10. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, San Francisco, CA, 1988).
11. Mézard, M., Mora, T. & Zecchina, R. Phys. Rev. Lett. 94, 197205 (2005).
12. Maneva, E., Mossel, E. & Wainwright, M. J. J. Assoc. Comput. Mach. 54 (4), 2–41 (2007).
13. Braunstein, A. & Zecchina, R. J. Stat. Mech. P06007 (2004).
14. Kroc, L., Sabharwal, A. & Selman, B. Proc. 23rd Conf. Uncert. Artif. Intell. 217–226 (AUAI Press, Corvallis, OR, 2007).
Selman, B. A hard statistical view. Nature 451, 639–640 (2008). https://doi.org/10.1038/451639a