Introduction
Simulated Annealing (SA) stands as one of the most elegant optimization algorithms, drawing its inspiration from the physical process of metallurgy, specifically the controlled cooling of metals to achieve optimal crystalline structures. Introduced by Kirkpatrick, Gelatt, and Vecchi in 1983, SA brought the revolutionary concept of accepting worse solutions probabilistically to escape local optima.
In 2026, simulated annealing remains relevant for combinatorial optimization, discrete problems, and situations where finding near-optimal solutions quickly matters more than guaranteed optimality. This article explores the algorithm’s foundations, implementation strategies, variants, and practical applications.
The Physical Analogy
Metallurgical Annealing
In metallurgy, annealing involves:
- Heating a material to high temperature
- Gradually cooling to allow atoms to settle into low-energy configurations
- Result: Stronger, more ductile material with minimal defects
The key insight: at high temperatures, atoms can move freely, exploring many configurations. As temperature decreases, movements become restricted, but occasional “thermal excitations” still allow escape from suboptimal states.
Mapping to Optimization
| Metallurgy | Optimization |
|---|---|
| Energy state | Objective function value |
| Ground state | Global optimum |
| Local minima | Local optima |
| Temperature | Control parameter T |
| Cooling rate | Cooling schedule |
The Metropolis Criterion
The core of simulated annealing is the Metropolis acceptance criterion:
import random
import math
import numpy as np

def metropolis_criterion(delta_e, temperature):
    """
    Decide whether to accept a worse solution.

    delta_e: Change in objective function (positive = worse)
    temperature: Current temperature
    """
    if delta_e <= 0:
        return True
    probability = math.exp(-delta_e / temperature)
    return random.random() < probability
This formula ensures:
- High temperature: High probability of accepting worse solutions (exploration)
- Low temperature: Low probability of accepting worse solutions (exploitation)
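To see the trade-off numerically, the acceptance probability for the same uphill move can be tabulated at a few temperatures (the move size of +5 and the helper's name are illustrative choices, not part of the algorithm):

```python
import math

def acceptance_probability(delta_e, temperature):
    """Metropolis probability of accepting a worsening move (delta_e > 0)."""
    return math.exp(-delta_e / temperature)

# The same uphill move of +5 is far more acceptable when the system is hot:
for t in (100.0, 10.0, 1.0):
    print(f"T = {t:>5}: P(accept +5) = {acceptance_probability(5, t):.4f}")
```

At T = 100 this move is accepted about 95% of the time; at T = 1, less than 1% of the time.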
Basic Simulated Annealing Algorithm
def simulated_annealing(objective_function, initial_solution,
                        neighbor_function, initial_temp=1000,
                        final_temp=0.01, cooling_rate=0.995,
                        max_iterations_per_temp=100):
    """
    Basic simulated annealing optimization.

    Args:
        objective_function: Function to minimize
        initial_solution: Starting point
        neighbor_function: Function to generate neighboring solutions
        initial_temp: Starting temperature
        final_temp: Temperature at which to stop
        cooling_rate: Multiplicative factor to reduce temperature
        max_iterations_per_temp: Iterations at each temperature

    Returns:
        Best solution found and its objective value
    """
    current_solution = initial_solution
    current_energy = objective_function(current_solution)
    best_solution = current_solution
    best_energy = current_energy
    temperature = initial_temp

    while temperature > final_temp:
        for _ in range(max_iterations_per_temp):
            # Generate neighboring solution
            candidate_solution = neighbor_function(current_solution)
            candidate_energy = objective_function(candidate_solution)

            # Calculate energy difference
            delta_e = candidate_energy - current_energy

            # Accept or reject
            if metropolis_criterion(delta_e, temperature):
                current_solution = candidate_solution
                current_energy = candidate_energy

                # Update best if improved
                if current_energy < best_energy:
                    best_solution = current_solution
                    best_energy = current_energy

        # Cool down
        temperature *= cooling_rate

    return best_solution, best_energy
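As a quick end-to-end check, here is a compact, self-contained version of the same loop applied to a one-dimensional quadratic; the bowl f(x) = (x - 3)^2, the fixed seed, and all parameter values are arbitrary demo choices:

```python
import math
import random

def sa_minimize(f, x0, step=0.5, t0=10.0, t_min=1e-3, alpha=0.95, iters=50):
    """Compact version of the annealing loop above for a 1-D objective."""
    random.seed(0)  # fixed seed so the demo is repeatable
    x, e = x0, f(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        for _ in range(iters):
            cand = x + random.gauss(0, step)   # Gaussian neighbor move
            ce = f(cand)
            d = ce - e
            if d <= 0 or random.random() < math.exp(-d / t):
                x, e = cand, ce
                if e < best_e:
                    best_x, best_e = x, e
        t *= alpha
    return best_x, best_e

x, e = sa_minimize(lambda v: (v - 3) ** 2, x0=-10.0)
```

While the temperature is high the walk wanders freely; once it has cooled, only downhill moves survive, and the best solution tracked along the way lands near the minimum at x = 3.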
Cooling Schedules
The cooling schedule significantly impacts SA performance. Several approaches exist:
Exponential Cooling
The most common approach:
def exponential_cooling(temperature, cooling_rate):
    return temperature * cooling_rate

Temperature progression: T₀, T₀α, T₀α², …, T₀αⁿ
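A practical consequence: the number of cooling steps is fixed by the initial temperature, the final temperature, and α, which makes total runtime easy to budget. A small helper (the function name is mine):

```python
import math

def cooling_steps(t0, t_final, alpha):
    """Cooling steps until t0 * alpha**n drops to t_final or below."""
    return math.ceil(math.log(t_final / t0) / math.log(alpha))

# With the defaults used earlier (T0=1000, Tf=0.01, alpha=0.995) the outer
# loop performs 2297 cooling steps; multiply by max_iterations_per_temp
# to get the total number of objective evaluations.
n = cooling_steps(1000, 0.01, 0.995)
```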
Linear Cooling
def linear_cooling(temperature, cooling_step, final_temp=0.01):
    # final_temp is a floor so the temperature never reaches zero
    return max(final_temp, temperature - cooling_step)
Logarithmic Cooling
def logarithmic_cooling(iteration, initial_temp, c):
    """
    Very slow cooling, theoretically guarantees convergence
    """
    return initial_temp / (1 + c * math.log(1 + iteration))
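The price of that theoretical guarantee is extreme slowness, which a quick calculation makes vivid (c = 1 here is an arbitrary choice):

```python
import math

def logarithmic_cooling(iteration, initial_temp, c=1.0):
    return initial_temp / (1 + c * math.log(1 + iteration))

# Even after a million iterations the temperature has fallen only ~15x:
t = logarithmic_cooling(10**6, 1000.0)   # roughly 67.5
```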
Adaptive Cooling
class AdaptiveCooling:
    def __init__(self, initial_temp=1000, increase_factor=1.1,
                 decrease_factor=0.9, target_acceptance=0.44):
        self.temperature = initial_temp
        self.increase_factor = increase_factor
        self.decrease_factor = decrease_factor
        self.target_acceptance = target_acceptance

    def update(self, num_accepted, num_proposed):
        if num_proposed == 0:
            return
        current_rate = num_accepted / num_proposed
        if current_rate > self.target_acceptance:
            # Accepting too freely: cool down to sharpen the search
            self.temperature *= self.decrease_factor
        else:
            # Accepting too rarely: reheat to restore mobility
            self.temperature *= self.increase_factor
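A single feedback step of this controller is easy to verify in isolation. The standalone helper below mirrors the update rule (the function name is mine; the 0.44 target and 0.9/1.1 factors are the same assumptions as above):

```python
def adaptive_update(temperature, num_accepted, num_proposed,
                    target=0.44, cool=0.9, heat=1.1):
    """One feedback step: cool when accepting too freely, reheat otherwise."""
    if num_proposed == 0:
        return temperature
    rate = num_accepted / num_proposed
    return temperature * (cool if rate > target else heat)

t = adaptive_update(100.0, 90, 100)   # 90% acceptance -> cools to 90.0
t = adaptive_update(t, 10, 100)       # 10% acceptance -> reheats to 99.0
```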
Neighbor Generation
The neighbor function defines what “nearby” means:
For Continuous Variables
def continuous_neighbor(solution, step_size=0.1):
    """Add Gaussian noise to continuous solution."""
    return [x + random.gauss(0, step_size) for x in solution]

def bounded_continuous_neighbor(solution, bounds, step_size=0.1):
    """Gaussian noise with boundary handling."""
    new_solution = []
    for i, x in enumerate(solution):
        new_x = x + random.gauss(0, step_size)
        new_x = max(bounds[i][0], min(bounds[i][1], new_x))
        new_solution.append(new_x)
    return new_solution
For Combinatorial Problems
def swap_neighbor(solution):
    """Swap two positions (for permutations)."""
    new_solution = solution.copy()
    i, j = random.sample(range(len(new_solution)), 2)
    new_solution[i], new_solution[j] = new_solution[j], new_solution[i]
    return new_solution

def flip_neighbor(solution):
    """Flip one element (for binary strings)."""
    new_solution = solution.copy()
    i = random.randint(0, len(new_solution) - 1)
    new_solution[i] = 1 - new_solution[i]
    return new_solution

def insert_neighbor(solution):
    """Remove one element and insert at different position."""
    new_solution = solution.copy()
    i = random.randint(0, len(new_solution) - 1)
    element = new_solution.pop(i)
    j = random.randint(0, len(new_solution))
    new_solution.insert(j, element)
    return new_solution
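Whichever move is used, it must preserve the problem's invariants: a swap must leave a permutation a permutation. A quick sanity check, repeating swap_neighbor here so the snippet runs on its own:

```python
import random

def swap_neighbor(solution):
    """Swap two random positions (same move as above)."""
    new_solution = solution.copy()
    i, j = random.sample(range(len(new_solution)), 2)
    new_solution[i], new_solution[j] = new_solution[j], new_solution[i]
    return new_solution

random.seed(1)
route = list(range(8))
moved = swap_neighbor(route)
assert sorted(moved) == route   # same elements: still a valid tour
assert moved != route           # and the move actually changed something
```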
Complete Implementation
class SimulatedAnnealing:
    def __init__(self, objective_function, neighbor_function,
                 initial_solution=None, initial_temp=1000,
                 final_temp=0.01, cooling_rate=0.995,
                 max_iterations_per_temp=100):
        self.objective_function = objective_function
        self.neighbor_function = neighbor_function
        self.initial_temp = initial_temp
        self.final_temp = final_temp
        self.cooling_rate = cooling_rate
        self.max_iterations_per_temp = max_iterations_per_temp
        self.current_solution = None
        self.current_energy = float('inf')
        self.best_solution = None
        self.best_energy = float('inf')
        self.temperature = initial_temp
        self.history = []
        if initial_solution is not None:
            self.initialize(initial_solution)

    def initialize(self, initial_solution):
        """Initialize the search state from a starting solution."""
        if initial_solution is None:
            raise ValueError("Initial solution required")
        self.current_solution = initial_solution
        self.current_energy = self.objective_function(initial_solution)
        self.best_solution = initial_solution
        self.best_energy = self.current_energy
        self.temperature = self.initial_temp

    def run(self):
        """Run the simulated annealing algorithm."""
        if self.current_solution is None:
            raise ValueError("Call initialize() or pass initial_solution first")
        iteration = 0
        while self.temperature > self.final_temp:
            accepted = 0
            for _ in range(self.max_iterations_per_temp):
                # Generate neighbor
                candidate = self.neighbor_function(self.current_solution)
                candidate_energy = self.objective_function(candidate)

                # Calculate difference
                delta = candidate_energy - self.current_energy

                # Accept or reject
                if metropolis_criterion(delta, self.temperature):
                    self.current_solution = candidate
                    self.current_energy = candidate_energy
                    accepted += 1

                    # Update best
                    if self.current_energy < self.best_energy:
                        self.best_solution = self.current_solution
                        self.best_energy = self.current_energy

            # Record history
            self.history.append({
                'iteration': iteration,
                'temperature': self.temperature,
                'current_energy': self.current_energy,
                'best_energy': self.best_energy,
                'acceptance_rate': accepted / self.max_iterations_per_temp
            })

            # Cool down
            self.temperature *= self.cooling_rate
            iteration += 1

        return self.best_solution, self.best_energy

    def get_statistics(self):
        return {
            'initial_temperature': self.initial_temp,
            'final_temperature': self.temperature,
            'iterations': len(self.history),
            'best_energy': self.best_energy,
            # Count cooling steps on which the best-known energy improved
            'improvements': sum(1 for prev, curr in zip(self.history, self.history[1:])
                                if curr['best_energy'] < prev['best_energy'])
        }
Variants and Extensions
Fast Simulated Annealing
Uses Cauchy-Lorentz distribution for larger jumps:
def fast_neighbor(solution, temperature):
    """Generate neighbor using heavy-tailed distribution."""
    new_solution = []
    for x in solution:
        # Cauchy-Lorentz distribution
        u = random.random() - 0.5
        delta = temperature * math.tan(math.pi * u)
        new_solution.append(x + delta)
    return new_solution
Parallel Tempering (Replica Exchange)
Multiple SA runs at different temperatures exchange solutions:
class ParallelTempering:
    def __init__(self, num_replicas, objective, neighbor_fn,
                 initial_temp=1000, temp_ratio=0.5):
        self.num_replicas = num_replicas
        self.replicas = []
        # Initialize replicas on a geometric temperature ladder
        for i in range(num_replicas):
            temp = initial_temp * (temp_ratio ** i)
            self.replicas.append(SimulatedAnnealing(objective, neighbor_fn,
                                                    initial_temp=temp))

    def exchange_solutions(self, i, j):
        """Exchange solutions between replicas with Metropolis probability."""
        delta = (1 / self.replicas[i].temperature - 1 / self.replicas[j].temperature) * \
                (self.replicas[i].current_energy - self.replicas[j].current_energy)
        if metropolis_criterion(-delta, 1):  # accept with probability min(1, e^delta)
            self.replicas[i].current_solution, self.replicas[j].current_solution = \
                self.replicas[j].current_solution, self.replicas[i].current_solution
            # Keep the cached energies consistent with the swapped solutions
            self.replicas[i].current_energy, self.replicas[j].current_energy = \
                self.replicas[j].current_energy, self.replicas[i].current_energy
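The exchange rule follows the usual Metropolis pattern: with delta = (1/T_i - 1/T_j)(E_i - E_j), the swap probability is min(1, e^delta). The standalone helper below (its name and the example values are mine) shows the two regimes:

```python
import math

def exchange_probability(t_hot, t_cold, e_hot, e_cold):
    """min(1, exp(delta)) with delta = (1/t_cold - 1/t_hot) * (e_cold - e_hot)."""
    delta = (1.0 / t_cold - 1.0 / t_hot) * (e_cold - e_hot)
    return min(1.0, math.exp(delta))

# Hot replica holds the better (lower-energy) solution: the swap is certain,
# handing the good solution to the cold chain for refinement.
p_always = exchange_probability(t_hot=10.0, t_cold=1.0, e_hot=2.0, e_cold=5.0)
# Cold replica already holds the better solution: the swap is rare.
p_rare = exchange_probability(t_hot=10.0, t_cold=1.0, e_hot=5.0, e_cold=2.0)
```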
Threshold Accepting
A simplified, deterministic variant that accepts any candidate whose worsening is smaller than the current threshold:
def threshold_accepting(objective, neighbor_fn, initial_solution, threshold_schedule):
    current = initial_solution
    current_energy = objective(current)
    best = current
    best_energy = current_energy

    for threshold in threshold_schedule:
        candidate = neighbor_fn(current)
        candidate_energy = objective(candidate)
        if candidate_energy - current_energy < threshold:
            current = candidate
            current_energy = candidate_energy
            if current_energy < best_energy:
                best = current
                best_energy = current_energy

    return best, best_energy
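The threshold schedule plays the role the temperature schedule plays in SA: it should start permissive and tighten toward zero. A sketch of a geometric schedule with repeats per level (all names and parameter values are illustrative):

```python
def geometric_thresholds(t0=10.0, factor=0.9, levels=50, repeats=100):
    """Non-increasing threshold schedule; each level is repeated so the
    search can explore before the acceptance criterion tightens."""
    schedule = []
    t = t0
    for _ in range(levels):
        schedule.extend([t] * repeats)
        t *= factor
    return schedule

sched = geometric_thresholds()   # 5000 thresholds, never increasing
```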
Quantum Annealing
Uses quantum tunneling effects (specialized hardware):
# Note: Quantum annealing requires specialized hardware (D-Wave)
# This is a conceptual sketch of annealing a transverse-field Ising model
class QuantumAnnealing:
    def __init__(self, problem_hamiltonian):
        self.H = problem_hamiltonian  # Problem Hamiltonian
        self.quantum_term = None      # Transverse field term

    def solve(self, annealing_time, num_steps):
        """
        Simulated quantum annealing (simplified).
        In practice, this would use actual quantum hardware.
        """
        # Initialize in a uniform superposition
        state = self._uniform_superposition()
        # Gradually turn off the transverse field and turn on the problem term
        for step in range(num_steps):
            s = step / num_steps
            a = annealing_time * (1 - s)  # transverse-field schedule A(s)
            b = annealing_time * s        # problem-Hamiltonian schedule B(s)
            # Apply quantum operations
            state = self._apply_annealing_step(state, a, b)
        return self._measure(state)
Practical Applications
Traveling Salesman Problem
def tsp_objective(route, distance_matrix):
    """Total distance for TSP route."""
    total = 0
    for i in range(len(route) - 1):
        total += distance_matrix[route[i]][route[i+1]]
    total += distance_matrix[route[-1]][route[0]]
    return total

def solve_tsp_sa(distance_matrix):
    n = len(distance_matrix)
    initial = list(range(n))
    random.shuffle(initial)

    sa = SimulatedAnnealing(
        objective_function=lambda r: tsp_objective(r, distance_matrix),
        neighbor_function=swap_neighbor,
        initial_solution=initial,
        initial_temp=10000,
        cooling_rate=0.9999,
        max_iterations_per_temp=1000
    )
    return sa.run()
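A tiny instance makes the objective easy to sanity-check by hand: for four cities at the corners of a unit square, the perimeter tour has length 4 while a tour using both diagonals is longer (tsp_objective is repeated here so the check is self-contained):

```python
def tsp_objective(route, distance_matrix):
    """Total distance of a closed tour (same as above)."""
    total = sum(distance_matrix[route[i]][route[i + 1]]
                for i in range(len(route) - 1))
    return total + distance_matrix[route[-1]][route[0]]

SQRT2 = 2 ** 0.5
D = [  # pairwise distances between the unit square's corners
    [0.0, 1.0, SQRT2, 1.0],
    [1.0, 0.0, 1.0, SQRT2],
    [SQRT2, 1.0, 0.0, 1.0],
    [1.0, SQRT2, 1.0, 0.0],
]
perimeter = tsp_objective([0, 1, 2, 3], D)   # 4.0: the optimal tour
crossing = tsp_objective([0, 2, 1, 3], D)    # ~4.83: uses both diagonals
```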
Job Shop Scheduling
def job_shop_objective(schedule, jobs, machines):
    """Makespan estimate (simplified: ignores precedence between a job's operations)."""
    machine_times = [0] * len(machines)
    for operation in schedule:
        job_id, machine_id = operation['job'], operation['machine']
        processing_time = jobs[job_id]['operations'][operation['op_id']]['time']
        machine_times[machine_id] += processing_time
    return max(machine_times)

def job_shop_neighbor(schedule):
    """Swap two operations in the schedule."""
    new_schedule = [op.copy() for op in schedule]
    i, j = random.sample(range(len(new_schedule)), 2)
    new_schedule[i], new_schedule[j] = new_schedule[j], new_schedule[i]
    return new_schedule
Image Reconstruction
def image_restoration(degraded_image):
    """Restore a degraded image using simulated annealing."""
    def objective(pixels):
        # Data fidelity + smoothness regularization
        fidelity = np.sum((pixels - degraded_image) ** 2)
        smoothness = np.sum(np.abs(np.diff(pixels)))
        return fidelity + 0.1 * smoothness

    def neighbor(image):
        # Perturb one random pixel
        new_image = image.copy()
        i = random.randint(0, image.shape[0] - 1)
        j = random.randint(0, image.shape[1] - 1)
        new_image[i, j] += random.gauss(0, 10)
        return new_image

    # Start the search from the degraded image itself
    return simulated_annealing(objective, degraded_image, neighbor)
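The regularization weight (0.1 above) controls the trade-off between staying close to the data and being smooth. A pure-Python 1-D version of the same energy (names and values here are illustrative) shows the two extremes:

```python
def restoration_energy(pixels, degraded, lam=0.1):
    """Data-fidelity term plus a total-variation-style smoothness penalty."""
    fidelity = sum((p - d) ** 2 for p, d in zip(pixels, degraded))
    smoothness = sum(abs(pixels[i + 1] - pixels[i])
                     for i in range(len(pixels) - 1))
    return fidelity + lam * smoothness

degraded = [0.0, 5.0, 0.0, 0.0]                    # one noisy spike
e_keep = restoration_energy(degraded, degraded)    # perfect fit, rough: 1.0
e_flat = restoration_energy([0.0] * 4, degraded)   # smooth, poor fit: 25.0
```

The annealer searches the space between these extremes for the energy minimum.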
Neural Network Weight Optimization
def optimize_nn_weights_sa(model, X_train, y_train):
    """Optimize neural network weights using SA (gradient-free)."""
    shapes = [w.shape for w in model.get_weights()]

    def unflatten(flat):
        # Rebuild the per-layer arrays the model expects from the flat vector
        arrays, start = [], 0
        for shape in shapes:
            size = int(np.prod(shape))
            arrays.append(flat[start:start + size].reshape(shape))
            start += size
        return arrays

    def objective(flat_weights):
        model.set_weights(unflatten(flat_weights))
        predictions = model.predict(X_train)
        return np.mean((predictions - y_train) ** 2)

    # Flatten initial weights into a single vector
    initial_weights = np.concatenate([w.flatten() for w in model.get_weights()])

    def neighbor(weights):
        new_weights = weights.copy()
        # Perturb one random weight
        i = random.randint(0, len(new_weights) - 1)
        new_weights[i] += random.gauss(0, 0.1)
        return new_weights

    best_weights, best_error = simulated_annealing(
        objective, initial_weights, neighbor,
        initial_temp=1.0, cooling_rate=0.999
    )
    return unflatten(best_weights)
Parameter Selection Guidelines
Temperature Range
| Problem Type | Initial Temp | Final Temp |
|---|---|---|
| Combinatorial | 100-10000 | 0.001-0.1 |
| Continuous | 1-100 | 0.0001-0.01 |
Cooling Rate
- 0.80-0.95: Fast cooling, quick runs but lower-quality solutions
- 0.95-0.999: Standard rates for most problems
- >0.999: Very slow cooling, better solutions but computationally expensive
Iterations per Temperature
- Higher for complex landscapes
- Balance between exploration and computation time
- Typical range: 10-1000
Conclusion
Simulated annealing remains a powerful and elegant optimization technique. Its key strengths include:
- Theoretical Foundation: Converges in probability to the global optimum under sufficiently slow (e.g., logarithmic) cooling schedules
- Simplicity: Easy to implement and understand
- Flexibility: Works with any neighbor function and objective
- Global Search: Escapes local optima through probabilistic acceptance
While convergence can be slow compared to other methods, SA excels when:
- The problem landscape has many local optima
- Gradient information is unavailable
- A good solution quickly is more valuable than the best solution slowly
- The problem is combinatorial rather than continuous
The algorithm’s enduring relevance in 2026 reflects its fundamental effectiveness and the universal applicability of its core insight: sometimes, to find the best solution, you must be willing to accept worse ones temporarily.
Resources
- Optimization by Simulated Annealing - Original SA paper
- Simulated Annealing: Theory and Applications
- Traveling Salesman Problem and Simulated Annealing
- PySA - Python Simulated Annealing