Home
Blog home

Simulated annealing for global optimisation

2024-10-31 — 8 min read


Simulated annealing is a surprisingly simple yet very robust technique used to find the global optimum of a complicated objective function. It is inspired by the process of annealing in metallurgy, where a hot material is allowed to cool over a long period of time as a way of reducing its internal defects.

The algorithm is a stochastic process that explores potential solutions to an objective function, moving between states probabilistically depending on the relative improvement of a trial solution over the current solution. The fact it can sometimes move to a worse state allows it to escape local minima that could otherwise trap other gradient-based methods.

An example of a noisy function with lots of local minima. Normal gradient-based optimisation methods would fail to find the lowest point on the curve, but simulated annealing might succeed.

It works by slowly reducing the "temperature" of a system during which it tests out different neighbouring solutions to its current state. If a trial solution is better than the current one then the system jumps straight to that new state. If the trial solution is worse, then it can still jump to that state but only with some probability, \(P < 1\). The temperature influences this transition probability: higher temperatures mean the system is more likely to explore worse solutions.

Over time the temperature is reduced according to a "cooling schedule", making the system less likely to accept poor solutions. This means that eventually the system will settle down into the "best" (lowest cost) state.

Components of simulated annealing

The simulated annealing algorithm has several crucial components that work together to perform the optimisation. Each part mimics aspects of the physical annealing process in materials, where heat and gradual cooling allow particles to settle into a low-energy state. Here are the main features:

Objective Function

The objective function is the thing to be optimised. When simulated annealing is appropriate the objective function is usually a high-dimensional (has a lot of parameters), complicated (non-linear) function and can be quantified by a single measure. It may be discrete or combinatorial, for example the travelling salesman problem or the layout of a factory to maximise efficiency.

Neighbourhood structure

There must be the ability for the system to explore neighbouring states of the objective function by making small adjustments to the parameters subject to any constraints upon them.

Cooling schedule

The cooling schedule dictates how the temperature of the system is adjusted over time. The temperature starts high and decreases over successive iterations, slowly to avoid the system converging too quickly (to a suboptimal solution), but not too slow that the algorithm becomes inefficient. Commonly chosen cooling schedules include linear, exponential, or logarithmic decays, and adaptive cooling is sometimes used to adjust the schedule according to the behaviour of the objective function.

Acceptance probability

A key component of simulated annealing is the method to calculate the probability of moving to a neighbouring trial state \(S'\) when it presents a worse solution compared to the current state \(S\). In the case when the cost of the trial state is higher than the current state, \(C’ > C\), the transition probability is given by $$ P(S \leftarrow S’) = \exp\bigg(- \frac{C’-C}{T}\bigg) $$

In this scenario a step from state \(S\) to \(S'\) has a probability \(P=1\) or being accepted as the overall cost is reduced. However, a step to \(S''\) leads to increased cost \((C'' > C)\) so the transition occurs with some temperature-dependent probability \(P\lt1\).

Stopping criterion

The algorithm requires some criterion to determine when to terminate and return the solution. This could be when the temperature falls below some threshold, or after a set number of iterations. Another option is to detect the stabilisation of the system when there are no (or very little) improvements to the solution over many iterations.

Psuedocode

The basic simulated annealing process featuring all of the steps above is show below:

Initialise the system into some state \(S\) at starting temperature \(T\) with cost \(C\).

WHILE the stopping criterion is not met:
For \(N\) iterations each epoch:
Explore a neighbouring state \(S'\) that has cost \(C'\)
IF \(C' < C\) then jump to the new state:
\(S \leftarrow S'\)
\(C \leftarrow C'\)
ELSE jump to the new state with probability \(P = \exp(-(C'-C)/T)\):
\(S \leftarrow S'\)
\(C \leftarrow C'\)
Reduce the temperature according to the cooling schedule:
\(T \leftarrow T'\)

Where is simulated annealing applicable?

Where is it NOT suitable?

Advantages and disadvantages

Simulated annealing has both advantages and disadvantages over "traditional" optimisation methods:

Advantages

Disadvantages

Further reading

Back to top