My Views on Some Global Optimization Methods

This is adapted from a section of my protein folding survey (ps.gz file, 220K), where you can find references. Instead of describing technical details of the various methods (these vary from author to author), I want to present a vivid informal view of the basic techniques, their strengths and weaknesses. The three most popular methods (simulated annealing, genetic algorithms, and smoothing/continuation methods) are based on analogies to natural processes where more or less global optima are reached. The fourth (branch and bound) is a more mathematically motivated technique.

Generally, branch and bound methods (in particular, interval methods, dc methods, and mixed integer methods based on piecewise linear approximations) are more reliable since, to the extent they work (which depends on the difficulty of the problem), they have built in guarantees; however, they require more or less detailed access to global information about the problem.

On the other hand, stochastic or heuristic methods are generally easier to program, depend only on black box function (and sometimes gradient) routines, and therefore can never be sure of not having missed the global optimizer.

Comparisons on relative efficiency are virtually missing, and the quality depends a lot on details of the implementation. At the present stage of knowledge, if you need global optimization you must experiment with the known methods (or packages) until you have something satisfying you. Unfortunately, this is a rather unsatisfactory state of the art...


Simulated annealing

takes its intuition from the fact that the heating (annealing) and slow cooling of a metal brings it into a more uniformly crystalline state, that is believed to be the state where the free energy of bulk matter takes its global minimum. (Incidentally, even for the simplest potential energy functions, it is still an unsolved problem whether this is indeed true with mathematical rigor. Apart from that, even very pure crystals still have defects; i.e., the global minimum is not quite achieved in nature.) The role of temperature is to allow the configurations to reach higher energy states with a probability given by Boltzmann's exponential law, so that they can overcome energy barriers that would otherwise force them into local minima. This is quite unlike line search methods and trust region methods on which good local optimization programs are based.

In its original form, the simulated annealing method is provably convergent (in a probabilistic sense) but exceedingly slow; various ad hoc enhancements make it much faster. In particular, except for simple problems, success depends very much on the implementation used.


Genetic algorithms

make use of analogies to biological evolution by allowing mutations and crossing over between candidates for good local optima in the hope to derive even better ones. At each stage, a whole population of configurations are stored. Mutations have a similar effect as random steps in simulated annealing, and the equivalent of lowering of the temperature is a rule for more stringent selection of surviving or mating individuals.

The ability to leave regions of attraction to local minimizers is, however, drastically enhanced by crossing over. This is an advantage if, with high probability, the crossing rules produce offspring of similar or even better fitness (objective function value); if not, it is a severe disatvantage. Therefore the efficiency of a genetic algorithm (compared with simulated annealing type methods) depends in a crucial way on the proper selection of crossing rules. The effect of interchanging coordinates is beneficial mainly when these coordinates have a nearly independent influence on the fitness, whereas if their influence is highly correlated (such as for functions with deep and narrow valleys not parallel to the coordinate axes), genetic algorithms have much more difficulties. Thus, unlike simulated annealing, successful tuning of genetic algorithms requires a considerable amount of insight into the nature of the problem at hand.


Both simulated annealing methods and genetic algorithms are, in their simpler forms, easy to understand and easy to implement, features that invite potential users of optimization methods to experiment with their own versions. The methods often work, if only slowly, and lacking better alternatives, they are very useful tools for applications, where the primary interest is to find (near-)solutions now, even when the reliability is uncertain and only subglobal optima are reached.

To make simulated annealing methods and genetic algorithms efficient, clever enhancements are essential. However, theoretical work on explaining the effectiveness of useful enhancements is completely lacking. I also haven't seen careful comparisons of the various options available and their comparative evaluation on standard collections of test problems.


Smoothing methods

are based on the intuition that, in nature, macroscopic features are usually an average effect of microscopic details; averaging smoothes out the details in such a way as to reveal the global picture. A huge valley seen from far away has a well-defined and simple shape; only by looking more closely, the many local minima are visible, more and more at smaller and smaller scales. The hope is that by smoothing a rugged potential energy surface, most or all local minima disappear, and the remaining major features of the surface only show a single minimizer. By adding more and more details, the approximations made by the smoothing are undone, and finally one ends up at the global minimizer of the original surface.

While it is quite possible for such a method to miss the global minimum (so that full reliability cannot be guaranteed, and is not achieved in the tests reported by their authors), a proper implementation of this idea at least gives very good local minima with a fraction of the function evaluations needed for the blind annealing and genetic methods.

A conceptually attractive smoothing technique is the diffusion equation method, where the original objective function V(x) is smeared out by artificial diffusion. The solution V(x,t) of the diffusion equation V_{xx}(x,t)=V_t(x,t) with initial condition V(x,0)=V(x) gets smoother and smoother as t gets larger; for large enough t, it is even unimodal. Thus V(x,t) can be minimized by local methods when t is sufficiently large, and using the minimizer at a given t as a starting point for a local optimization at a smaller t, a sequence of local minimizers of V(x,t) for successively smaller t is obtained until finally, with t=0, a minimizer of the original potential is reached. It should be possible to increase the reliability of the method by using a limited amount of global search in each optimization stage.

The diffusion equation can be solved by convolution with a Gaussian kernel and hence is explicitly available, e.g., when V(x) is an expolynomial, i.e., a linear combination of arbitrary products of powers of components x_i and exponentials exp(c^T x). For other partially separable functions, one has to use numerical integration, making function evaluation more expensive. For functions lacking partial separability, the method cannot be used in practice.

Again, there is no theoretical work on conditions that would ensure convergence to the global minimum.


While the above techniques are motivated by nature it is important to remember that processes in nature need not be the most efficient ones; at best it can be assumed to be efficient given the conditions under which they have to operate (namely an uncertain and changing environment that is potentially hazardous to those operating in it). Indeed, much of our present technology has vastly surpassed natural efficiency by unnatural means, and it would be surprising if it were different in global optimization.

Even assuming that nature solves truly global optimization problems (a disputable assumption), simple lower estimates for the number of elementary steps -- roughly corresponding to function evaluations -- available to natural processes to converge are (in chemistry and in biology) in the range of 10^15 or even more. Thus to be successful on the computers of today or the near future, we must find methods that are much faster, exploring the configuration space in a planned, intelligent way, not with a blind combination of chance and necessity. And the challenge is to devise methods that can be analyzed sufficiently well to guarantee reliablility and success.


Branch and Bound methods

are most interesting in this respect since they lead to lower bounds on the objective function. They are the only current methods that allow an assessment of the quality of the local minima obtained, and combined with computationally verifiable sufficient conditions for global minima (see, e.g., dvi.gz file, 18K, or ps.gz file, 83K), they may allow one to actually prove global optimality of the best local optimizers obtained, and thus may remove the aura of ambiguity inherent in all `global' calculations of the past.

In the worst case, branch and bound methods take an exponential amount of work; but many problems arising in practice are far from worst case. Many of the heuristic techniques used currently for searching the conformational space of global optimization problems can be adapted to or combined with the branch and bound approach to take advantage of the structural insights of specific applications.

Branch and bound methods will ultimately allow one not only to calculate the global minimum reliably, but also to find all local minima and saddle points within a certain margin of the global minimum. (Saddle points describe transition states in molecular modeling.) This is essentially the problem of finding all zeros of the nonlinear system V'(x)=0 that satisfy a constraint on the signature of the Hessian and on the value of V. Such problems can already be handled in moderate dimensions by branch and bound methods, combined with techniques from interval analysis. It should be possible to combine these techniques with underestimation techniques provided by dc-methods.


Global Optimization
Protein Folding
my home page (http://www.mat.univie.ac.at/~neum)

Arnold Neumaier (Arnold.Neumaier@univie.ac.at)