Computational science

A hard statistical view

The sheer number of variables and logical conditions makes some computing problems seem intractable. Statistical physics, normally used to study huge groups of interacting particles, can supply powerful tools to crack them.

As computer hardware and software become ever more sophisticated, we are shifting from a setting in which computers merely assist us in processing information with the aid of well-understood algorithms, to a landscape in which computers themselves make decisions and are in full control of a given situation. An example is the Pentagon-sponsored 'Darpa Urban Challenge'1, in which standard consumer vehicles are equipped with sensors such as laser range-finders, cameras and global positioning systems. Taking millions of measurements per second, and using up to a dozen PCs to process the data, these vehicles can make driving decisions on their own in real time.

Such computational tasks are fiendishly complex. Many of them in fact fall into a general class of notoriously hard computational problems known as NP-complete problems. (NP stands for 'non-deterministic polynomial time'; the general assumption is that, in the worst case, the time needed to solve NP-complete problems explodes exponentially with the number of variables.) NP-complete problems were first characterized2 in the early 1970s, and thousands of them have since been identified, in areas as diverse as hardware and software verification, planning and scheduling, automated reasoning, and computational biology. Writing in Physical Review E, Mézard and Tarzia3 demonstrate an innovative approach to solving one well-known NP-complete problem, the hitting-set problem. Their approach borrows techniques from statistical physics that are generally used to characterize the interactions of atomic ensembles, and highlights a trend of recent years: the application of fundamental physics to bring new perspectives to the study of hard computational problems.

The hitting-set problem starts with a number of sets, each containing a certain number of items. Each individual item can occur in more than one set. A hitting set 'hits' the original sets in that it contains at least one item from each; the challenge is to find the smallest such set. Take a class of students, for example, each of whom plays one or more sports (Fig. 1). For each sport, there is a group (set) of students playing the sport. Here, the question would be: what is the smallest group of students I can choose so that all sports are represented in my sample?
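The definition can be made concrete with a minimal sketch that simply tries every group of students, smallest first. The class roster below is hypothetical (it is not the instance drawn in Fig. 1), and the function name is an illustrative choice; exhaustive enumeration of this kind is only feasible for very small instances.

```python
from itertools import combinations

def smallest_hitting_set(sets):
    """Return a smallest set of items that intersects every set in `sets`."""
    items = sorted(set().union(*sets))
    # Try candidate groups in order of increasing size, so the first hit is minimal.
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            chosen = set(candidate)
            if all(chosen & s for s in sets):
                return chosen
    return None  # reached only if one of the sets is empty

# Hypothetical class: each sport maps to the students who play it.
sports = {
    "football": {"Ann", "Bob"},
    "tennis": {"Bob", "Cy"},
    "rowing": {"Dee"},
    "chess": {"Cy", "Eve"},
}
team = smallest_hitting_set(list(sports.values()))
print(len(team), sorted(team))
```

For this roster the smallest hitting set has three students: Dee is forced by rowing, and no single further student covers football, tennis and chess together.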

Figure 1: Hitting sports.

In this representation of a simple hitting-set problem, students (dots) play one or more sports (encircling sets). The hitting-set problem asks what the smallest set of students is that encompasses all sports — in this case, the size of this set is three (red dots). The hitting-set problem is a classic example of an NP-complete problem, in which the complexity (and thus computing time and power needed to find a solution) probably grows exponentially as the numbers of variables (students) and constraints (sports) increase. Mézard and Tarzia3 show how the statistical methods used to characterize the interactions of ensembles of many particles in fundamental physics can be applied to solving hard, random instances of the hitting-set problem with thousands of variables and constraints.

Hitting-set problems arise in many contexts, including fault diagnosis, group testing (for example, analysing many blood samples by analysing a few aggregates of individual samples), database searches, experiment design and DNA screening. The difficulty, in computational terms, is that the number of combinations of items that needs to be considered to find the smallest possible hitting set, or a hitting set smaller than a certain size, grows exponentially with the number of items. Such exponential or combinatorial search spaces are characteristic of NP-complete problems.

In a broad sense, such problems involve a set of discrete variables and a set of constraints between the variables that model their interactions4. In terms of our earlier hitting-set example, each student is represented by a binary variable, which is set to 'one' if the student is part of the hitting set, and otherwise set to 'zero'. Each sport introduces a new constraint on the variables: at least one of the students playing a particular sport should have a variable set to one. A further constraint places an upper bound on the number of variables set to one, and therefore on the size of the hitting set. Finding variable assignments that satisfy individual constraints is generally quite easy. The challenge is to find an assignment to the variables that satisfies all constraints simultaneously.
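As a sketch of this encoding, assuming the same hypothetical roster of students and sports, the function below takes a 0/1 value for every student and reports which constraints are broken: one constraint per sport, plus the upper bound on the number of ones. The names `violated_constraints` and `max_size` are illustrative.

```python
def violated_constraints(assignment, sports, max_size):
    """List the constraints that a 0/1 assignment breaks.

    assignment: dict mapping each student to 0 or 1
    sports:     dict mapping each sport to the set of students who play it
    max_size:   upper bound on the number of variables set to one
    """
    # One constraint per sport: at least one of its players must be set to one.
    violated = [sport for sport, players in sports.items()
                if not any(assignment[p] for p in players)]
    # One further constraint bounds the size of the hitting set.
    if sum(assignment.values()) > max_size:
        violated.append("size bound")
    return violated

# Hypothetical instance, for illustration only.
sports = {
    "football": {"Ann", "Bob"},
    "tennis": {"Bob", "Cy"},
    "rowing": {"Dee"},
    "chess": {"Cy", "Eve"},
}
assignment = {"Ann": 0, "Bob": 1, "Cy": 0, "Dee": 1, "Eve": 0}
broken = violated_constraints(assignment, sports, max_size=3)
print(broken)  # ['chess']
```

Choosing Bob and Dee satisfies three of the four sport constraints and the size bound, but leaves chess uncovered; satisfying each constraint separately is easy, satisfying all of them at once is the hard part.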

Algorithms for doing this fall into two broad classes: backtrack search, introduced in the early 1960s, and local search, which popped up a decade later. Backtrack search methods proceed in a centralized way, by assigning values to variables one by one. Whenever a local constraint is violated — when none of the students playing a certain sport is flagged as belonging to a hitting set — one or more variable settings is revisited and changed. Simple book-keeping techniques ensure that such a search explores the entire combinatorial search space.
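A minimal sketch of a backtrack search for the hitting-set constraints might look as follows. It is a plain chronological backtracker written for illustration, not any particular published solver, and the roster and function name are hypothetical. Variables are assigned one by one; as soon as a sport has all of its players set to zero, or the size bound is exceeded, the most recent choice is undone.

```python
def backtrack(sports, students, max_size, assignment=None):
    """Return a complete 0/1 assignment meeting all constraints, or None."""
    assignment = {} if assignment is None else assignment
    # The size bound is already exceeded: no extension can repair it.
    if sum(assignment.values()) > max_size:
        return None
    # A sport constraint is violated once all of its players are set to zero.
    for players in sports.values():
        if all(assignment.get(p) == 0 for p in players):
            return None
    if len(assignment) == len(students):
        return dict(assignment)
    student = students[len(assignment)]  # next unassigned variable
    for value in (1, 0):
        assignment[student] = value
        result = backtrack(sports, students, max_size, assignment)
        if result is not None:
            return result
        del assignment[student]  # undo the setting and try the other value
    return None

# Hypothetical instance, for illustration only.
sports = {
    "football": {"Ann", "Bob"},
    "tennis": {"Bob", "Cy"},
    "rowing": {"Dee"},
    "chess": {"Cy", "Eve"},
}
students = sorted(set().union(*sports.values()))
print(backtrack(sports, students, max_size=2))  # None: two students are not enough
print(backtrack(sports, students, max_size=3))
```

Because every variable is tried with both values unless a violation rules the branch out, the search is exhaustive: a result of None proves that no hitting set of the requested size exists.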

In local search methods, by contrast, the exploration is less systematic. First, with a random guess, one assigns values to all variables. Such an assignment will generally violate many constraints, and a local search algorithm proceeds by trying to 'fix' variable settings to reduce the number of violations in the search for a variable setting that satisfies all constraints.
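One simple variant of this idea, sketched here under stated assumptions, starts from a random assignment and repeatedly flips the variable that most reduces the number of violated constraints, with occasional random flips to escape dead ends. The cost function, the `noise` parameter and the flip budget are illustrative choices rather than a specific algorithm from the literature.

```python
import random

def cost(assignment, sports, max_size):
    """Number of uncovered sports plus any excess over the size bound."""
    uncovered = sum(1 for players in sports.values()
                    if not any(assignment[p] for p in players))
    excess = max(0, sum(assignment.values()) - max_size)
    return uncovered + excess

def local_search(sports, max_size, max_flips=10_000, noise=0.2, seed=0):
    """Return a 0/1 assignment with no violations, or None if the budget runs out."""
    rng = random.Random(seed)
    students = sorted(set().union(*sports.values()))
    # Start from a random guess for every variable.
    assignment = {s: rng.randint(0, 1) for s in students}

    def cost_after_flip(student):
        assignment[student] ^= 1
        value = cost(assignment, sports, max_size)
        assignment[student] ^= 1  # restore the original setting
        return value

    for _ in range(max_flips):
        if cost(assignment, sports, max_size) == 0:
            return assignment
        if rng.random() < noise:
            flip = rng.choice(students)               # random-walk step
        else:
            flip = min(students, key=cost_after_flip)  # greedy repair step
        assignment[flip] ^= 1
    return assignment if cost(assignment, sports, max_size) == 0 else None

# Hypothetical instance, for illustration only.
sports = {
    "football": {"Ann", "Bob"},
    "tennis": {"Bob", "Cy"},
    "rowing": {"Dee"},
    "chess": {"Cy", "Eve"},
}
result = local_search(sports, max_size=3)
print(result)
```

Unlike backtrack search, a result of None here proves nothing: the method may simply have failed to find a solution that exists.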

In tackling the hitting-set problem, Mézard and Tarzia3 follow a fundamentally different route. They take advantage of a significant advance that occurred in the early 1990s, when computer scientists banded together with physicists to study ensembles of randomly generated instances of various NP-complete problems5,6,7,8. An 'instance' here is simply a particular example of the generic problem, defined by a set of variables, and specific governing constraints; in our previous example, one specifies a set of students and the sports they play.

This work revealed that, at certain values of the ratio of constraints to variables, ensembles of random instances of the same generic problem underwent a sudden change, dubbed a phase transition. Below the phase-transition point, most instances have one or more solutions that satisfy all constraints; above the phase transition, most instances do not have any solution, because there are too many constraints to satisfy. The instances that were hardest to solve occurred with numbers of variables and constraints that lay exactly at these phase-transition boundaries. A natural conclusion was that tools from statistical physics developed to study physical phase transitions might help in developing more efficient algorithms for solving combinatorial problems.

An example is the 'survey-propagation method'9 used by Mézard and Tarzia3, which developed from the cavity method used in statistical physics to calculate ground-state properties of certain condensed-matter systems. Survey propagation solves random instances of the boolean satisfiability problem near phase transitions with large numbers of variables (more than 10^7), which are beyond the reach of backtrack and local-search methods. This archetypal NP-complete problem asks the question of whether, given a set of logical statements using boolean variables (variables that can be either 'true' or 'false'), there is any assignment of values to those variables that can satisfy all the statements.

Mézard and Tarzia use the survey-propagation method to compute statistical properties of the solutions of instances of the hitting-set problem. Such a strategy might seem doomed to fail because it is generally significantly harder to determine the properties of the set of solutions of a hard computational problem than it is to find a single solution. But survey propagation can efficiently approximate the requisite statistical information for instances of various combinatorial problems near phase boundaries. It does this by iteratively solving a large set of coupled equations, modelling the local interactions between variables probabilistically. This solution process can be performed in a parallel, distributed fashion using many different processors, and generally converges to an answer extremely quickly — in seconds for equations with thousands of variables.

Survey propagation can be viewed as a generalization of the 'belief-propagation method'10, which was discovered independently in several fields, including information theory and artificial intelligence. Belief propagation is a way of approximating the probability (the 'belief') that a variable takes on a particular value in a randomly sampled solution. This information can be used to set variables incrementally, thus simplifying a problem.
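To make the notion of a 'belief' concrete, the sketch below computes these probabilities exactly by enumerating every solution of a tiny, hypothetical instance. This is deliberately not belief propagation: it is the brute-force quantity that belief propagation approximates without enumeration, and it is only feasible here because the instance has five variables.

```python
from itertools import product

def exact_marginals(sports, max_size):
    """Probability that each student is chosen in a uniformly random solution."""
    students = sorted(set().union(*sports.values()))
    counts = dict.fromkeys(students, 0)
    total = 0
    for values in product((0, 1), repeat=len(students)):
        if sum(values) > max_size:
            continue  # violates the size bound
        assignment = dict(zip(students, values))
        if all(any(assignment[p] for p in players) for players in sports.values()):
            total += 1
            for s in students:
                counts[s] += assignment[s]
    if total == 0:
        return None  # no solution, so no marginals to report
    return {s: counts[s] / total for s in students}

# Hypothetical instance, for illustration only.
sports = {
    "football": {"Ann", "Bob"},
    "tennis": {"Bob", "Cy"},
    "rowing": {"Dee"},
    "chess": {"Cy", "Eve"},
}
beliefs = exact_marginals(sports, max_size=3)
print(beliefs)
```

This instance has three solutions of size at most three, and Dee appears in all of them, so her belief is 1. Fixing her variable to one removes the rowing constraint and leaves a smaller problem, which is the incremental simplification described above.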

The method works well when solutions are nicely clustered together in the combinatorial space, which is the case reasonably far from a phase transition. Near phase boundaries, however, solutions break up into many smaller, unconnected clusters in the combinatorial space11. Conventional combinatorial search algorithms and standard belief-propagation techniques become trapped between these clusters, and cannot effectively search the solution space. The survey-propagation method, on the other hand, continues to provide reliable statistical information about the solution space12,13,14.

It is this property that allows Mézard and Tarzia3 to map out for the first time the space of hitting-set problems, identifying under what conditions belief- and survey-propagation methods can solve hard, random instances of the problem. They also identify regions where still more complex survey-propagation-style equations would be required.

Given the ever increasing role of computational methods in other disciplines, the fact that those other disciplines are, in turn, starting to contribute new concepts and ideas to the science of computation is an exciting development — one that, as the demands we make on computational methods continue to grow, we are sure to hear more of.

References

  1. http://www.darpa.mil/grandchallenge/

  2. Cook, S. Proc. 3rd Annu. ACM Symp. Theor. Comput. 151–158 (ACM, New York, 1971).

  3. Mézard, M. & Tarzia, M. Phys. Rev. E 76, 041124 (2007).

  4. Gomes, C. P. & Selman, B. Nature 435, 751–752 (2005).

  5. Cheeseman, P., Kanefsky, B. & Taylor, W. Proc. 12th Int. Joint Conf. Artif. Intell. 331–337 (Morgan Kaufmann, San Francisco, 1991).

  6. Mitchell, D., Selman, B. & Levesque, H. Proc. 10th Nat. Conf. on Artif. Intell. 459–465 (AAAI, Menlo Park, CA, 1992).

  7. Kirkpatrick, S. & Selman, B. Science 264, 1297–1301 (1994).

  8. Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B. & Troyansky, L. Nature 400, 133–137 (1999).

  9. Mézard, M., Parisi, G. & Zecchina, R. Science 297, 812–815 (2002).

  10. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, San Francisco, CA, 1988).

  11. Mézard, M., Mora, T. & Zecchina, R. Phys. Rev. Lett. 94, 197205 (2005).

  12. Maneva, E., Mossel, E. & Wainwright, M. J. J. Assoc. Comput. Machin. 54 (4), 2–41 (2007).

  13. Braunstein, A. & Zecchina, R. J. Stat. Mech. P06007 (2004).

  14. Kroc, L., Sabharwal, A. & Selman, B. Proc. 23rd Conf. Uncert. Artif. Intell. 217–226 (AUAI Press, Corvallis, OR, 2007).


Cite this article

Selman, B. A hard statistical view. Nature 451, 639–640 (2008). https://doi.org/10.1038/451639a
