Introduction
Simulated Annealing (SA) stands as one of the most elegant optimization algorithms, drawing its inspiration from the physical process of metallurgy, specifically the controlled cooling of metals to achieve optimal crystalline structures. Introduced by Kirkpatrick, Gelatt, and Vecchi in 1983, SA brought the revolutionary concept of accepting worse solutions probabilistically to escape local optima.
In 2026, simulated annealing remains relevant for combinatorial optimization, discrete problems, and situations where finding near-optimal solutions quickly matters more than guaranteed optimality. This article explores the algorithm’s foundations, implementation strategies, variants, and practical applications.
The Physical Analogy
Metallurgical Annealing
In metallurgy, annealing involves:
- Heating a material to high temperature
- Gradually cooling to allow atoms to settle into low-energy configurations
- Result: Stronger, more ductile material with minimal defects
The key insight: at high temperatures, atoms can move freely, exploring many configurations. As temperature decreases, movements become restricted, but occasional “thermal excitations” still allow escape from suboptimal states.
Mapping to Optimization
| Metallurgy | Optimization |
|---|---|
| Energy state | Objective function value |
| Ground state | Global optimum |
| Local minima | Local optima |
| Temperature | Control parameter T |
| Cooling rate | Cooling schedule |
The Metropolis Criterion
The core of simulated annealing is the Metropolis acceptance criterion:
import random
import math
import numpy as np

def metropolis_criterion(delta_e, temperature):
    """
    Decide whether to accept a worse solution.

    delta_e: Change in objective function (positive = worse)
    temperature: Current temperature
    """
    if delta_e <= 0:
        return True
    probability = math.exp(-delta_e / temperature)
    return random.random() < probability
This formula ensures:
- High temperature: High probability of accepting worse solutions (exploration)
- Low temperature: Low probability of accepting worse solutions (exploitation)
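To see the trade-off numerically, the acceptance probability for the same uphill move can be tabulated at a few temperatures (the move size of +5 and the helper's name are illustrative choices, not part of the algorithm):

```python
import math

def acceptance_probability(delta_e, temperature):
    """Metropolis probability of accepting a worsening move (delta_e > 0)."""
    return math.exp(-delta_e / temperature)

# The same uphill move of +5 is far more acceptable when the system is hot:
for t in (100.0, 10.0, 1.0):
    print(f"T = {t:>5}: P(accept +5) = {acceptance_probability(5, t):.4f}")
```

At T = 100 this move is accepted about 95% of the time; at T = 1, less than 1% of the time.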
Basic Simulated Annealing Algorithm
def simulated_annealing(objective_function, initial_solution,
                        neighbor_function, initial_temp=1000,
                        final_temp=0.01, cooling_rate=0.995,
                        max_iterations_per_temp=100):
    """
    Basic simulated annealing optimization.

    Args:
        objective_function: Function to minimize
        initial_solution: Starting point
        neighbor_function: Function to generate neighboring solutions
        initial_temp: Starting temperature
        final_temp: Temperature at which to stop
        cooling_rate: Multiplicative factor to reduce temperature
        max_iterations_per_temp: Iterations at each temperature

    Returns:
        Best solution found and its objective value
    """
    current_solution = initial_solution
    current_energy = objective_function(current_solution)
    best_solution = current_solution
    best_energy = current_energy
    temperature = initial_temp

    while temperature > final_temp:
        for _ in range(max_iterations_per_temp):
            # Generate neighboring solution
            candidate_solution = neighbor_function(current_solution)
            candidate_energy = objective_function(candidate_solution)

            # Calculate energy difference
            delta_e = candidate_energy - current_energy

            # Accept or reject
            if metropolis_criterion(delta_e, temperature):
                current_solution = candidate_solution
                current_energy = candidate_energy

                # Update best if improved
                if current_energy < best_energy:
                    best_solution = current_solution
                    best_energy = current_energy

        # Cool down
        temperature *= cooling_rate

    return best_solution, best_energy
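As a quick end-to-end check, here is a compact, self-contained version of the same loop applied to a one-dimensional quadratic; the bowl f(x) = (x - 3)^2, the fixed seed, and all parameter values are arbitrary demo choices:

```python
import math
import random

def sa_minimize(f, x0, step=0.5, t0=10.0, t_min=1e-3, alpha=0.95, iters=50):
    """Compact version of the annealing loop above for a 1-D objective."""
    random.seed(0)  # fixed seed so the demo is repeatable
    x, e = x0, f(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        for _ in range(iters):
            cand = x + random.gauss(0, step)   # Gaussian neighbor move
            ce = f(cand)
            d = ce - e
            if d <= 0 or random.random() < math.exp(-d / t):
                x, e = cand, ce
                if e < best_e:
                    best_x, best_e = x, e
        t *= alpha
    return best_x, best_e

x, e = sa_minimize(lambda v: (v - 3) ** 2, x0=-10.0)
```

While the temperature is high the walk wanders freely; once it has cooled, only downhill moves survive, and the best solution tracked along the way lands near the minimum at x = 3.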
Cooling Schedules
The cooling schedule significantly impacts SA performance. Several approaches exist:
Exponential Cooling
The most common approach:
def exponential_cooling(temperature, cooling_rate):
    return temperature * cooling_rate

Temperature progression: T₀, T₀α, T₀α², …, T₀αⁿ
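A practical consequence: the number of cooling steps is fixed by the initial temperature, the final temperature, and α, which makes total runtime easy to budget. A small helper (the function name is mine):

```python
import math

def cooling_steps(t0, t_final, alpha):
    """Cooling steps until t0 * alpha**n drops to t_final or below."""
    return math.ceil(math.log(t_final / t0) / math.log(alpha))

# With the defaults used earlier (T0=1000, Tf=0.01, alpha=0.995) the outer
# loop performs 2297 cooling steps; multiply by max_iterations_per_temp
# to get the total number of objective evaluations.
n = cooling_steps(1000, 0.01, 0.995)
```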
Linear Cooling
def linear_cooling(temperature, cooling_step, final_temp=0.01):
    # final_temp is a floor so the temperature never reaches zero
    return max(final_temp, temperature - cooling_step)
Logarithmic Cooling
def logarithmic_cooling(iteration, initial_temp, c):
    """
    Very slow cooling, theoretically guarantees convergence
    """
    return initial_temp / (1 + c * math.log(1 + iteration))
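The price of that theoretical guarantee is extreme slowness, which a quick calculation makes vivid (c = 1 here is an arbitrary choice):

```python
import math

def logarithmic_cooling(iteration, initial_temp, c=1.0):
    return initial_temp / (1 + c * math.log(1 + iteration))

# Even after a million iterations the temperature has fallen only ~15x:
t = logarithmic_cooling(10**6, 1000.0)   # roughly 67.5
```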
Adaptive Cooling
class AdaptiveCooling:
    def __init__(self, initial_temp=1000, increase_factor=1.1,
                 decrease_factor=0.9, target_acceptance=0.44):
        self.temperature = initial_temp
        self.increase_factor = increase_factor
        self.decrease_factor = decrease_factor
        self.target_acceptance = target_acceptance

    def update(self, num_accepted, num_proposed):
        if num_proposed == 0:
            return
        current_rate = num_accepted / num_proposed
        if current_rate > self.target_acceptance:
            # Accepting too freely: cool down to sharpen the search
            self.temperature *= self.decrease_factor
        else:
            # Accepting too rarely: reheat to restore mobility
            self.temperature *= self.increase_factor
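A single feedback step of this controller is easy to verify in isolation. The standalone helper below mirrors the update rule (the function name is mine; the 0.44 target and 0.9/1.1 factors are the same assumptions as above):

```python
def adaptive_update(temperature, num_accepted, num_proposed,
                    target=0.44, cool=0.9, heat=1.1):
    """One feedback step: cool when accepting too freely, reheat otherwise."""
    if num_proposed == 0:
        return temperature
    rate = num_accepted / num_proposed
    return temperature * (cool if rate > target else heat)

t = adaptive_update(100.0, 90, 100)   # 90% acceptance -> cools to 90.0
t = adaptive_update(t, 10, 100)       # 10% acceptance -> reheats to 99.0
```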
Neighbor Generation
The neighbor function defines what “nearby” means:
For Continuous Variables
def continuous_neighbor(solution, step_size=0.1):
    """Add Gaussian noise to continuous solution."""
    return [x + random.gauss(0, step_size) for x in solution]

def bounded_continuous_neighbor(solution, bounds, step_size=0.1):
    """Gaussian noise with boundary handling."""
    new_solution = []
    for i, x in enumerate(solution):
        new_x = x + random.gauss(0, step_size)
        new_x = max(bounds[i][0], min(bounds[i][1], new_x))
        new_solution.append(new_x)
    return new_solution
For Combinatorial Problems
def swap_neighbor(solution):
    """Swap two positions (for permutations)."""
    new_solution = solution.copy()
    i, j = random.sample(range(len(new_solution)), 2)
    new_solution[i], new_solution[j] = new_solution[j], new_solution[i]
    return new_solution

def flip_neighbor(solution):
    """Flip one element (for binary strings)."""
    new_solution = solution.copy()
    i = random.randint(0, len(new_solution) - 1)
    new_solution[i] = 1 - new_solution[i]
    return new_solution

def insert_neighbor(solution):
    """Remove one element and insert at different position."""
    new_solution = solution.copy()
    i = random.randint(0, len(new_solution) - 1)
    element = new_solution.pop(i)
    j = random.randint(0, len(new_solution))
    new_solution.insert(j, element)
    return new_solution
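Whichever move is used, it must preserve the problem's invariants: a swap must leave a permutation a permutation. A quick sanity check, repeating swap_neighbor here so the snippet runs on its own:

```python
import random

def swap_neighbor(solution):
    """Swap two random positions (same move as above)."""
    new_solution = solution.copy()
    i, j = random.sample(range(len(new_solution)), 2)
    new_solution[i], new_solution[j] = new_solution[j], new_solution[i]
    return new_solution

random.seed(1)
route = list(range(8))
moved = swap_neighbor(route)
assert sorted(moved) == route   # same elements: still a valid tour
assert moved != route           # and the move actually changed something
```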
Complete Implementation
class SimulatedAnnealing:
    def __init__(self, objective_function, neighbor_function,
                 initial_solution=None, initial_temp=1000,
                 final_temp=0.01, cooling_rate=0.995,
                 max_iterations_per_temp=100):
        self.objective_function = objective_function
        self.neighbor_function = neighbor_function
        self.initial_temp = initial_temp
        self.final_temp = final_temp
        self.cooling_rate = cooling_rate
        self.max_iterations_per_temp = max_iterations_per_temp
        self.current_solution = None
        self.current_energy = float('inf')
        self.best_solution = None
        self.best_energy = float('inf')
        self.temperature = initial_temp
        self.history = []
        if initial_solution is not None:
            self.initialize(initial_solution)

    def initialize(self, initial_solution):
        """Initialize the search state from a starting solution."""
        if initial_solution is None:
            raise ValueError("Initial solution required")
        self.current_solution = initial_solution
        self.current_energy = self.objective_function(initial_solution)
        self.best_solution = initial_solution
        self.best_energy = self.current_energy
        self.temperature = self.initial_temp

    def run(self):
        """Run the simulated annealing algorithm."""
        if self.current_solution is None:
            raise ValueError("Call initialize() or pass initial_solution first")
        iteration = 0
        while self.temperature > self.final_temp:
            accepted = 0
            for _ in range(self.max_iterations_per_temp):
                # Generate neighbor
                candidate = self.neighbor_function(self.current_solution)
                candidate_energy = self.objective_function(candidate)

                # Calculate difference
                delta = candidate_energy - self.current_energy

                # Accept or reject
                if metropolis_criterion(delta, self.temperature):
                    self.current_solution = candidate
                    self.current_energy = candidate_energy
                    accepted += 1

                    # Update best
                    if self.current_energy < self.best_energy:
                        self.best_solution = self.current_solution
                        self.best_energy = self.current_energy

            # Record history
            self.history.append({
                'iteration': iteration,
                'temperature': self.temperature,
                'current_energy': self.current_energy,
                'best_energy': self.best_energy,
                'acceptance_rate': accepted / self.max_iterations_per_temp
            })

            # Cool down
            self.temperature *= self.cooling_rate
            iteration += 1

        return self.best_solution, self.best_energy

    def get_statistics(self):
        return {
            'initial_temperature': self.initial_temp,
            'final_temperature': self.temperature,
            'iterations': len(self.history),
            'best_energy': self.best_energy,
            # Count cooling steps on which the best-known energy improved
            'improvements': sum(1 for prev, curr in zip(self.history, self.history[1:])
                                if curr['best_energy'] < prev['best_energy'])
        }
Variants and Extensions
Fast Simulated Annealing
Uses Cauchy-Lorentz distribution for larger jumps:
def fast_neighbor(solution, temperature):
    """Generate neighbor using heavy-tailed distribution."""
    new_solution = []
    for x in solution:
        # Cauchy-Lorentz distribution
        u = random.random() - 0.5
        delta = temperature * math.tan(math.pi * u)
        new_solution.append(x + delta)
    return new_solution
Parallel Tempering (Replica Exchange)
Multiple SA runs at different temperatures exchange solutions:
class ParallelTempering:
    def __init__(self, num_replicas, objective, neighbor_fn,
                 initial_temp=1000, temp_ratio=0.5):
        self.num_replicas = num_replicas
        self.replicas = []
        # Initialize replicas on a geometric temperature ladder
        for i in range(num_replicas):
            temp = initial_temp * (temp_ratio ** i)
            self.replicas.append(SimulatedAnnealing(objective, neighbor_fn,
                                                    initial_temp=temp))

    def exchange_solutions(self, i, j):
        """Exchange solutions between replicas with Metropolis probability."""
        delta = (1 / self.replicas[i].temperature - 1 / self.replicas[j].temperature) * \
                (self.replicas[i].current_energy - self.replicas[j].current_energy)
        if metropolis_criterion(-delta, 1):  # accept with probability min(1, e^delta)
            self.replicas[i].current_solution, self.replicas[j].current_solution = \
                self.replicas[j].current_solution, self.replicas[i].current_solution
            # Keep the cached energies consistent with the swapped solutions
            self.replicas[i].current_energy, self.replicas[j].current_energy = \
                self.replicas[j].current_energy, self.replicas[i].current_energy
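The exchange rule follows the usual Metropolis pattern: with delta = (1/T_i - 1/T_j)(E_i - E_j), the swap probability is min(1, e^delta). The standalone helper below (its name and the example values are mine) shows the two regimes:

```python
import math

def exchange_probability(t_hot, t_cold, e_hot, e_cold):
    """min(1, exp(delta)) with delta = (1/t_cold - 1/t_hot) * (e_cold - e_hot)."""
    delta = (1.0 / t_cold - 1.0 / t_hot) * (e_cold - e_hot)
    return min(1.0, math.exp(delta))

# Hot replica holds the better (lower-energy) solution: the swap is certain,
# handing the good solution to the cold chain for refinement.
p_always = exchange_probability(t_hot=10.0, t_cold=1.0, e_hot=2.0, e_cold=5.0)
# Cold replica already holds the better solution: the swap is rare.
p_rare = exchange_probability(t_hot=10.0, t_cold=1.0, e_hot=5.0, e_cold=2.0)
```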
Threshold Accepting
A simplified, deterministic variant that accepts any candidate whose worsening is smaller than the current threshold:
def threshold_accepting(objective, neighbor_fn, initial_solution, threshold_schedule):
    current = initial_solution
    current_energy = objective(current)
    best = current
    best_energy = current_energy

    for threshold in threshold_schedule:
        candidate = neighbor_fn(current)
        candidate_energy = objective(candidate)
        if candidate_energy - current_energy < threshold:
            current = candidate
            current_energy = candidate_energy
            if current_energy < best_energy:
                best = current
                best_energy = current_energy

    return best, best_energy
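The threshold schedule plays the role the temperature schedule plays in SA: it should start permissive and tighten toward zero. A sketch of a geometric schedule with repeats per level (all names and parameter values are illustrative):

```python
def geometric_thresholds(t0=10.0, factor=0.9, levels=50, repeats=100):
    """Non-increasing threshold schedule; each level is repeated so the
    search can explore before the acceptance criterion tightens."""
    schedule = []
    t = t0
    for _ in range(levels):
        schedule.extend([t] * repeats)
        t *= factor
    return schedule

sched = geometric_thresholds()   # 5000 thresholds, never increasing
```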
Quantum Annealing
Uses quantum tunneling effects (specialized hardware):
# Note: Quantum annealing requires specialized hardware (D-Wave)
# This is a conceptual sketch of annealing a transverse-field Ising model
class QuantumAnnealing:
    def __init__(self, problem_hamiltonian):
        self.H = problem_hamiltonian  # Problem Hamiltonian
        self.quantum_term = None      # Transverse field term

    def solve(self, annealing_time, num_steps):
        """
        Simulated quantum annealing (simplified).
        In practice, this would use actual quantum hardware.
        """
        # Initialize in a uniform superposition
        state = self._uniform_superposition()
        # Gradually turn off the transverse field and turn on the problem term
        for step in range(num_steps):
            s = step / num_steps
            a = annealing_time * (1 - s)  # transverse-field schedule A(s)
            b = annealing_time * s        # problem-Hamiltonian schedule B(s)
            # Apply quantum operations
            state = self._apply_annealing_step(state, a, b)
        return self._measure(state)
Practical Applications
Traveling Salesman Problem
def tsp_objective(route, distance_matrix):
    """Total distance for TSP route."""
    total = 0
    for i in range(len(route) - 1):
        total += distance_matrix[route[i]][route[i+1]]
    total += distance_matrix[route[-1]][route[0]]
    return total

def solve_tsp_sa(distance_matrix):
    n = len(distance_matrix)
    initial = list(range(n))
    random.shuffle(initial)

    sa = SimulatedAnnealing(
        objective_function=lambda r: tsp_objective(r, distance_matrix),
        neighbor_function=swap_neighbor,
        initial_solution=initial,
        initial_temp=10000,
        cooling_rate=0.9999,
        max_iterations_per_temp=1000
    )
    return sa.run()
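A tiny instance makes the objective easy to sanity-check by hand: for four cities at the corners of a unit square, the perimeter tour has length 4 while a tour using both diagonals is longer (tsp_objective is repeated here so the check is self-contained):

```python
def tsp_objective(route, distance_matrix):
    """Total distance of a closed tour (same as above)."""
    total = sum(distance_matrix[route[i]][route[i + 1]]
                for i in range(len(route) - 1))
    return total + distance_matrix[route[-1]][route[0]]

SQRT2 = 2 ** 0.5
D = [  # pairwise distances between the unit square's corners
    [0.0, 1.0, SQRT2, 1.0],
    [1.0, 0.0, 1.0, SQRT2],
    [SQRT2, 1.0, 0.0, 1.0],
    [1.0, SQRT2, 1.0, 0.0],
]
perimeter = tsp_objective([0, 1, 2, 3], D)   # 4.0: the optimal tour
crossing = tsp_objective([0, 2, 1, 3], D)    # ~4.83: uses both diagonals
```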
Job Shop Scheduling
def job_shop_objective(schedule, jobs, machines):
    """Makespan estimate (simplified: ignores precedence between a job's operations)."""
    machine_times = [0] * len(machines)
    for operation in schedule:
        job_id, machine_id = operation['job'], operation['machine']
        processing_time = jobs[job_id]['operations'][operation['op_id']]['time']
        machine_times[machine_id] += processing_time
    return max(machine_times)

def job_shop_neighbor(schedule):
    """Swap two operations in the schedule."""
    new_schedule = [op.copy() for op in schedule]
    i, j = random.sample(range(len(new_schedule)), 2)
    new_schedule[i], new_schedule[j] = new_schedule[j], new_schedule[i]
    return new_schedule
Image Reconstruction
def image_restoration(degraded_image):
    """Restore a degraded image using simulated annealing."""
    def objective(pixels):
        # Data fidelity + smoothness regularization
        fidelity = np.sum((pixels - degraded_image) ** 2)
        smoothness = np.sum(np.abs(np.diff(pixels)))
        return fidelity + 0.1 * smoothness

    def neighbor(image):
        # Perturb one random pixel
        new_image = image.copy()
        i = random.randint(0, image.shape[0] - 1)
        j = random.randint(0, image.shape[1] - 1)
        new_image[i, j] += random.gauss(0, 10)
        return new_image

    # Start the search from the degraded image itself
    return simulated_annealing(objective, degraded_image, neighbor)
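The regularization weight (0.1 above) controls the trade-off between staying close to the data and being smooth. A pure-Python 1-D version of the same energy (names and values here are illustrative) shows the two extremes:

```python
def restoration_energy(pixels, degraded, lam=0.1):
    """Data-fidelity term plus a total-variation-style smoothness penalty."""
    fidelity = sum((p - d) ** 2 for p, d in zip(pixels, degraded))
    smoothness = sum(abs(pixels[i + 1] - pixels[i])
                     for i in range(len(pixels) - 1))
    return fidelity + lam * smoothness

degraded = [0.0, 5.0, 0.0, 0.0]                    # one noisy spike
e_keep = restoration_energy(degraded, degraded)    # perfect fit, rough: 1.0
e_flat = restoration_energy([0.0] * 4, degraded)   # smooth, poor fit: 25.0
```

The annealer searches the space between these extremes for the energy minimum.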
Neural Network Weight Optimization
def optimize_nn_weights_sa(model, X_train, y_train):
    """Optimize neural network weights using SA (gradient-free)."""
    shapes = [w.shape for w in model.get_weights()]

    def unflatten(flat):
        # Rebuild the per-layer arrays the model expects from the flat vector
        arrays, start = [], 0
        for shape in shapes:
            size = int(np.prod(shape))
            arrays.append(flat[start:start + size].reshape(shape))
            start += size
        return arrays

    def objective(flat_weights):
        model.set_weights(unflatten(flat_weights))
        predictions = model.predict(X_train)
        return np.mean((predictions - y_train) ** 2)

    # Flatten initial weights into a single vector
    initial_weights = np.concatenate([w.flatten() for w in model.get_weights()])

    def neighbor(weights):
        new_weights = weights.copy()
        # Perturb one random weight
        i = random.randint(0, len(new_weights) - 1)
        new_weights[i] += random.gauss(0, 0.1)
        return new_weights

    best_weights, best_error = simulated_annealing(
        objective, initial_weights, neighbor,
        initial_temp=1.0, cooling_rate=0.999
    )
    return unflatten(best_weights)
Parameter Selection Guidelines
Temperature Range
| Problem Type | Initial Temp | Final Temp |
|---|---|---|
| Combinatorial | 100-10000 | 0.001-0.1 |
| Continuous | 1-100 | 0.0001-0.01 |
Cooling Rate
- 0.80-0.95: Fast cooling, quick runs but lower-quality solutions
- 0.95-0.999: Standard rates for most problems
- >0.999: Very slow cooling, better solutions but computationally expensive
Iterations per Temperature
- Higher for complex landscapes
- Balance between exploration and computation time
- Typical range: 10-1000
Conclusion
Simulated annealing remains a powerful and elegant optimization technique. Its key strengths include:
- Theoretical Foundation: Converges in probability to the global optimum under sufficiently slow (e.g., logarithmic) cooling schedules
- Simplicity: Easy to implement and understand
- Flexibility: Works with any neighbor function and objective
- Global Search: Escapes local optima through probabilistic acceptance
While convergence can be slow compared to other methods, SA excels when:
- The problem landscape has many local optima
- Gradient information is unavailable
- A good solution quickly is more valuable than the best solution slowly
- The problem is combinatorial rather than continuous
The algorithm’s enduring relevance in 2026 reflects its fundamental effectiveness and the universal applicability of its core insight: sometimes, to find the best solution, you must be willing to accept worse ones temporarily.
Resources
- Optimization by Simulated Annealing - Original SA paper
- Simulated Annealing: Theory and Applications
- Traveling Salesman Problem and Simulated Annealing
- PySA - Python Simulated Annealing