
Differential Privacy in Machine Learning

Introduction

Machine learning models often memorize sensitive training data, creating privacy risks. Differential Privacy (DP) provides a mathematical guarantee that little about any individual data point can be inferred from a model's outputs. In 2026, DP has become essential for responsible AI development: it helps satisfy regulations such as GDPR and has been adopted by major tech companies.

This article explores differential privacy fundamentals, algorithms for privacy-preserving ML, practical implementation, and real-world applications.

Fundamentals of Differential Privacy

The Core Definition

Differential Privacy guarantees that an algorithm's output distribution is nearly unchanged whether or not any one individual's data is included in the dataset.

Definition: A randomized algorithm M with range R satisfies ε-differential privacy if for all datasets D₁, D₂ differing in one element, and all subsets S ⊆ R:

Pr[M(D₁) ∈ S] ≤ e^ε × Pr[M(D₂) ∈ S]

The privacy budget ε (epsilon) controls the privacy-utility tradeoff:

  • Lower ε: Stronger privacy, more noise, lower utility
  • Higher ε: Weaker privacy, less noise, higher utility
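The guarantee can be checked numerically for the Laplace mechanism (introduced below): the likelihood ratio between outputs produced from two neighboring datasets never exceeds e^ε. A small verification sketch, not from the original, using a hypothetical counting query whose true answers differ by one:

```python
import numpy as np

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution centered at mu with scale b."""
    return np.exp(-np.abs(x - mu) / b) / (2 * b)

epsilon = 0.5
sensitivity = 1.0          # a counting query changes by at most 1
b = sensitivity / epsilon  # Laplace scale that achieves epsilon-DP

# Neighboring datasets: true counts 100 and 101
xs = np.linspace(50, 150, 1001)
ratio = laplace_pdf(xs, 100, b) / laplace_pdf(xs, 101, b)

# The likelihood ratio stays within [e^-epsilon, e^epsilon], so an
# observer cannot confidently tell the two datasets apart
print(ratio.max() <= np.exp(epsilon) + 1e-9)  # True
```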

Key Concepts

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """
    Add Laplace noise to achieve differential privacy.
    
    Args:
        true_answer: The actual query result
        sensitivity: Maximum change from one person's data
        epsilon: Privacy budget
    
    Returns:
        Privatized answer
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return true_answer + noise


def gaussian_mechanism(true_answer, sensitivity, epsilon, delta=1e-5):
    """
    Add Gaussian noise (for approximate DP).
    """
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    noise = np.random.normal(0, sigma)
    return true_answer + noise
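To see the privacy budget in action, the mechanism above can be run repeatedly at two different ε values on a hypothetical count (this usage sketch re-implements the same Laplace noise with NumPy's Generator API for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(true_count, epsilon, sensitivity=1):
    # Same Laplace mechanism as above, via numpy's Generator API
    return true_count + rng.laplace(0, sensitivity / epsilon)

true_count = 1000
low = [noisy_count(true_count, 0.1) for _ in range(1000)]
high = [noisy_count(true_count, 10.0) for _ in range(1000)]

# Lower epsilon => larger noise scale => wider spread around the truth
# (Laplace with scale b has standard deviation b * sqrt(2))
print(round(np.std(low)))   # spread near sqrt(2) * 10
print(round(np.std(high)))  # spread near sqrt(2) * 0.1
```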

Sensitivity

Global sensitivity: Maximum change in output when one record changes:

def empirical_sensitivity(query_function, dataset):
    """
    Estimate query sensitivity by removing each record in turn.

    Note: true global sensitivity is a worst case over ALL pairs of
    neighboring datasets; this only probes neighbors of one dataset.
    """
    max_change = 0

    for i in range(len(dataset)):
        dataset_without = dataset[:i] + dataset[i+1:]

        change = abs(query_function(dataset) - query_function(dataset_without))
        max_change = max(max_change, change)

    return max_change
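For unbounded queries like a raw sum, sensitivity depends on the data range, which is why DP systems clip values first: clipping makes the bound data-independent. A small illustration (the clipping cap and data are made up):

```python
def clipped_sum(values, cap=1.0):
    """Sum after clipping each value into [0, cap]."""
    return sum(min(max(v, 0.0), cap) for v in values)

data = [0.2, 0.9, 1.7, -0.5, 0.4]

# Removing any one record changes the clipped sum by at most cap,
# so the Laplace mechanism can use sensitivity = cap for ANY dataset
changes = [
    abs(clipped_sum(data) - clipped_sum(data[:i] + data[i+1:]))
    for i in range(len(data))
]
print(max(changes) <= 1.0)  # True
```

This same idea, bounding each contribution before aggregation, reappears as gradient clipping in DP-SGD below.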

Differential Privacy in Machine Learning

Why ML Needs DP

Machine learning models can:

  1. Memorize training data - Directly encode specific records
  2. Leak information through model outputs or gradients
  3. Enable membership inference - Determine if a record was in training data

DP provides mathematical guarantees against these attacks.
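A membership inference attack can be surprisingly simple: models usually fit training data better, so a loss threshold alone often separates members from non-members. A toy sketch with simulated per-example losses (all numbers here are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example losses from an overfit model:
# members (training records) tend to have lower loss
member_losses = rng.normal(0.2, 0.1, 1000)      # records in training set
nonmember_losses = rng.normal(0.9, 0.3, 1000)   # records never seen

threshold = 0.5  # attacker guesses "member" when loss < threshold

true_positive = np.mean(member_losses < threshold)
true_negative = np.mean(nonmember_losses >= threshold)
attack_accuracy = (true_positive + true_negative) / 2

# Far better than the 50% a random guess achieves; DP-SGD bounds
# how much any single record can shift the loss distribution
print(attack_accuracy > 0.9)  # True
```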

DP Stochastic Gradient Descent (DP-SGD)

The core algorithm for privacy-preserving deep learning:

import torch
import torch.nn as nn
import numpy as np

class DPOptimizer:
    def __init__(self, model, lr=0.01, epsilon=1.0, delta=1e-5,
                 max_grad_norm=1.0, noise_multiplier=1.0):
        self.model = model
        self.lr = lr
        self.epsilon = epsilon
        self.delta = delta
        self.max_grad_norm = max_grad_norm
        self.noise_multiplier = noise_multiplier

        self.noise_std = noise_multiplier * max_grad_norm / epsilon

        # For accounting
        self.steps = 0
        self.privacy_budget = (epsilon, delta)

    def zero_grad(self):
        """Reset gradients before the next backward pass."""
        for param in self.model.parameters():
            if param.grad is not None:
                param.grad = None

    def step(self):
        """Perform one (simplified) DP-SGD step.

        Note: canonical DP-SGD clips each *example's* gradient before
        averaging; clipping the batch gradient per layer, as done here,
        is a pedagogical simplification.
        """
        self.steps += 1

        for name, param in self.model.named_parameters():
            if param.grad is None:
                continue

            # Clip gradients
            grad_norm = param.grad.norm()
            clip_factor = min(1.0, self.max_grad_norm / (grad_norm + 1e-6))
            param.grad *= clip_factor

            # Add noise (on the same device as the gradient)
            noise = torch.normal(0.0, self.noise_std, param.grad.shape,
                                 device=param.grad.device)
            param.grad += noise

            # Update parameters
            param.data -= self.lr * param.grad

    def get_privacy_spent(self):
        """Calculate privacy budget spent."""
        # Simplified - use proper accounting (e.g. Opacus) in practice
        return self.epsilon, self.delta


def train_with_dp(model, train_loader, epochs, epsilon=1.0):
    """Train model with differential privacy."""
    
    optimizer = DPOptimizer(model, epsilon=epsilon)
    criterion = nn.CrossEntropyLoss()
    
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        
        eps, delta = optimizer.get_privacy_spent()
        print(f"Epoch {epoch}: ε={eps:.2f}, δ={delta}")

Complete DP-SGD Implementation

import torch
from torch.utils.data import DataLoader
import numpy as np

class DPSGD(torch.optim.Optimizer):
    def __init__(self, params, lr, noise_multiplier, max_grad_norm, l2_norm_clip=1.0):
        defaults = dict(lr=lr, noise_multiplier=noise_multiplier, 
                      max_grad_norm=max_grad_norm, l2_norm_clip=l2_norm_clip)
        super(DPSGD, self).__init__(params, defaults)
    
    def step(self, closure=None):
        """One optimization step.

        Note: clipping the aggregated per-layer gradient is a
        simplification; canonical DP-SGD clips per-example gradients.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad

                # Compute gradient norm for this layer
                grad_norm = grad.norm().item()

                # Clip gradient
                clip_factor = group['l2_norm_clip'] / (grad_norm + 1e-6)
                if clip_factor < 1:
                    grad = grad * clip_factor

                # Add noise calibrated to the clipping bound
                noise_std = group['noise_multiplier'] * group['l2_norm_clip']
                noise = torch.normal(0.0, noise_std, grad.shape)
                grad = grad + noise.to(grad.device)

                # Update
                p.data.add_(grad, alpha=-group['lr'])

        return loss
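The per-layer clipping above keeps the code short, but canonical DP-SGD (Abadi et al., 2016) clips each example's full gradient before averaging, then adds noise scaled to the clipping bound. A minimal NumPy sketch of that aggregation step (function name and shapes are illustrative):

```python
import numpy as np

def dp_sgd_average(per_example_grads, clip_norm, noise_multiplier, rng):
    """
    Canonical DP-SGD aggregation: clip each example's whole gradient
    to L2 norm clip_norm, average, then add Gaussian noise scaled to
    the clipping bound.

    per_example_grads: array of shape (batch_size, n_params)
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * factors

    batch_size = per_example_grads.shape[0]
    noise = rng.normal(0, noise_multiplier * clip_norm / batch_size,
                       per_example_grads.shape[1])
    return clipped.mean(axis=0) + noise

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5, gets scaled down to norm 1
                  [0.3, 0.4]])   # norm 0.5, left untouched
avg = dp_sgd_average(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
print(avg)  # with zero noise: mean of [0.6, 0.8] and [0.3, 0.4]
```

Because each example's contribution is bounded before averaging, the noise scale is data-independent, which is exactly what the privacy proof requires.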


def compute_privacy_budget(n_examples, batch_size, epochs, noise_multiplier, delta=1e-5):
    """
    Rough (ε, δ) estimate for DP-SGD.

    This heuristic is NOT a rigorous accountant - use Opacus's RDP
    accountant in practice.
    """
    q = batch_size / n_examples
    steps = epochs * (n_examples // batch_size)

    # Crude moments-accountant-style estimate: epsilon grows with the
    # sampling rate and sqrt(steps), and shrinks with the noise level
    sigma = noise_multiplier
    epsilon = q * np.sqrt(steps * 2 * np.log(1 / delta)) / sigma

    return epsilon, delta

Using PyTorch Opacus

from opacus import PrivacyEngine
import torch.nn as nn

def train_with_opacus(model, train_loader, optimizer, epochs, epsilon_target, delta=1e-5):
    """Train with DP using the Opacus library (1.x API)."""

    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )

    criterion = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()

        # Check privacy budget
        epsilon = privacy_engine.get_epsilon(delta=delta)

        if epsilon > epsilon_target:
            print(f"Privacy budget exhausted: ε={epsilon:.2f}")
            break

        print(f"Epoch {epoch}: ε={epsilon:.2f}")

Privacy Accounting

Rényi Differential Privacy (RDP)

def rdp_accounting(sigma, steps, alpha=10):
    """
    Compute the RDP privacy loss of the Gaussian mechanism.

    Args:
        sigma: Noise multiplier
        steps: Number of steps
        alpha: Order of Rényi divergence

    Note: this ignores privacy amplification by subsampling;
    subsampled accounting (as in Opacus) is considerably tighter.
    """
    # RDP of the Gaussian mechanism at order alpha
    rdp_per_step = alpha / (2 * sigma**2)

    # RDP composes by simple addition across steps
    return steps * rdp_per_step
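An RDP guarantee is only useful once converted back to (ε, δ)-DP. The standard conversion is ε = rdp(α) + log(1/δ)/(α − 1), minimized over the available orders; a small sketch (the example parameters are illustrative):

```python
import numpy as np

def rdp_to_dp(rdp_epsilons, alphas, delta=1e-5):
    """
    Convert RDP guarantees at several orders into one (epsilon, delta)-DP
    guarantee: epsilon = rdp(alpha) + log(1/delta) / (alpha - 1),
    minimized over the available orders.
    """
    candidates = [
        rdp + np.log(1 / delta) / (alpha - 1)
        for rdp, alpha in zip(rdp_epsilons, alphas)
    ]
    return min(candidates)

# Gaussian mechanism with sigma = 1 composed over 100 steps:
# rdp(alpha) = steps * alpha / (2 * sigma**2)
alphas = list(range(2, 64))
rdps = [100 * a / 2.0 for a in alphas]
print(round(rdp_to_dp(rdps, alphas), 2))
```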


def compose_dp(eps1, delta1, eps2, delta2):
    """Composition of two DP mechanisms."""
    return eps1 + eps2, delta1 + delta2
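Basic composition as above grows linearly in the number of mechanisms. The advanced composition theorem (Dwork, Rothblum, Vadhan) gives a sub-linear bound at the cost of a small extra δ'; a sketch with illustrative numbers:

```python
import numpy as np

def advanced_composition(epsilon, delta, k, delta_prime=1e-6):
    """
    Advanced composition: k adaptive uses of an (epsilon, delta)-DP
    mechanism satisfy (epsilon_total, k*delta + delta_prime)-DP with
    epsilon_total = sqrt(2k ln(1/delta')) * epsilon
                    + k * epsilon * (e^epsilon - 1).
    """
    eps_total = (np.sqrt(2 * k * np.log(1 / delta_prime)) * epsilon
                 + k * epsilon * (np.exp(epsilon) - 1))
    return eps_total, k * delta + delta_prime

basic = 100 * 0.1  # basic composition: 100 queries at epsilon = 0.1
advanced, _ = advanced_composition(0.1, 0, 100)
print(advanced < basic)  # True: sub-linear growth beats the linear bound
```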

Privacy Budget Management

class PrivacyBudget:
    def __init__(self, initial_epsilon, delta=1e-5):
        self.epsilon = initial_epsilon
        self.delta = delta
        self.spent = 0
    
    def spend(self, cost):
        """Spend privacy budget; refuse if it would be exceeded."""
        if self.spent + cost > self.epsilon:
            return False
        self.spent += cost
        return True
    
    def remaining(self):
        """Check remaining budget."""
        return max(0, self.epsilon - self.spent)

Practical Implementation

DP Linear Regression

import numpy as np

class DPRegression:
    def __init__(self, epsilon=1.0, delta=1e-5):
        self.epsilon = epsilon
        self.delta = delta
        self.coef_ = None
    
    def fit(self, X, y, n_iters=100, lr=0.01, clip_norm=1.0):
        """Fit linear regression with DP noisy gradient descent."""
        n_samples, n_features = X.shape

        coef = np.zeros(n_features)

        # Split the budget across iterations (basic composition)
        eps_per_iter = self.epsilon / n_iters

        for _ in range(n_iters):
            # Gradient
            pred = X @ coef
            grad = X.T @ (pred - y) / n_samples

            # Clip the gradient so the noise scale is data-independent
            # (a sensitivity derived from the data would itself leak)
            grad_norm = np.linalg.norm(grad)
            if grad_norm > clip_norm:
                grad = grad * (clip_norm / grad_norm)

            # Add noise calibrated to the clipping bound
            # (a rigorous analysis would clip per-example gradients)
            sensitivity = 2 * clip_norm / n_samples
            grad += np.random.laplace(0, sensitivity / eps_per_iter, n_features)

            # Update
            coef -= lr * grad

        self.coef_ = coef
        return self
    
    def predict(self, X):
        return X @ self.coef_

DP k-Means Clustering

def dp_kmeans(X, k, epsilon=1.0, n_iterations=100):
    """
    Differentially private k-means clustering (sketch).
    """
    n_samples, n_features = X.shape

    # Split the budget across iterations (basic composition)
    eps_per_iter = epsilon / n_iterations

    # Initialize centroids randomly with noise
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    centroids += np.random.laplace(0, 1.0 / epsilon, (k, n_features))

    for _ in range(n_iterations):
        # Assign points to nearest centroid
        distances = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = np.argmin(distances, axis=1)

        # Compute new centroids with noise
        new_centroids = []

        for i in range(k):
            cluster_points = X[labels == i]

            if len(cluster_points) > 0:
                # True centroid
                centroid = cluster_points.mean(axis=0)

                # Noise shrinks as the cluster grows, since one record
                # moves the mean of m points by at most O(1/m)
                noise_scale = 2 * k / (eps_per_iter * len(cluster_points))
                centroid += np.random.laplace(0, noise_scale, n_features)

                new_centroids.append(centroid)
            else:
                # Empty cluster - keep old centroid
                new_centroids.append(centroids[i])

        centroids = np.array(new_centroids)

    return centroids, labels

DP Neural Network Training

import torch
import torch.nn as nn

class DPTrainingExample(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x


def train_dp_mnist():
    """Train a DP model on MNIST."""
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader
    
    # Load data
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    train_data = datasets.MNIST('./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
    
    # Model
    model = DPTrainingExample()
    
    # DP optimizer
    optimizer = DPSGD(
        model.parameters(),
        lr=0.01,
        noise_multiplier=1.0,
        max_grad_norm=1.0
    )
    
    criterion = nn.CrossEntropyLoss()
    
    # Train
    for epoch in range(10):
        for batch in train_loader:
            data, target = batch
            
            optimizer.zero_grad()
            output = model(data.view(data.size(0), -1))
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        
        # DPSGD has no built-in accountant; use the rough estimate above
        eps, _ = compute_privacy_budget(
            n_examples=len(train_data), batch_size=64,
            epochs=epoch + 1, noise_multiplier=1.0
        )
        print(f"Epoch {epoch}: ε≈{eps:.2f}")

Privacy vs Utility Tradeoff

Factors Affecting Utility

Factor              Effect on Privacy   Effect on Utility
------              -----------------   -----------------
Lower ε             Stronger            Worse
Larger batch size   Weaker              Better
More epochs         Weaker              Better
Higher noise        Stronger            Worse
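The first row of the table can be made concrete: for a sensitivity-1 Laplace query, the noise standard deviation is √2/ε, so shrinking ε by 10× inflates the noise 10×. A quick illustrative calculation:

```python
import numpy as np

# Standard deviation of Laplace noise (scale * sqrt(2)) for a
# sensitivity-1 query at several privacy budgets
for epsilon in [0.01, 0.1, 1.0, 10.0]:
    scale = 1.0 / epsilon
    print(f"ε={epsilon:>5}: noise std ≈ {scale * np.sqrt(2):.1f}")
```

At ε = 0.01 the noise dwarfs most count queries, while at ε = 10 it is nearly negligible, which is why budget selection dominates utility in practice.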

Best Practices

  1. Start with high ε: Find maximum useful ε first
  2. Reduce gradually: Decrease ε until utility drops
  3. Use appropriate mechanisms: Gaussian for approximate DP, Laplace for pure DP
  4. Account for composition: Track total budget spent across operations
  5. Consider batch size: Larger batches provide better utility

Advanced Topics

Private Aggregation of Teacher Ensembles (PATE)

class PATE:
    """
    Train models on different data partitions and aggregate.
    """
    def __init__(self, num_teachers, epsilon):
        self.num_teachers = num_teachers
        self.epsilon = epsilon
        self.models = []
    
    def train_teachers(self, data_partitions):
        """Train separate models on each partition."""
        for partition in data_partitions:
            model = train_model(partition)
            self.models.append(model)
    
    def predict(self, x, num_classes=10):
        """Aggregate teacher votes with a noisy argmax."""
        preds = [model.predict(x) for model in self.models]

        # Count votes per class, then add noise to the counts
        counts = np.bincount(preds, minlength=num_classes)
        noisy_counts = counts + np.random.laplace(0, 1 / self.epsilon, num_classes)

        return int(np.argmax(noisy_counts))

Local Differential Privacy

def local_dp_mechanism(x, epsilon, k=10):
    """
    k-ary randomized response for local DP.

    Used when the data collector itself is not trusted.
    """
    # Report the true value with probability e^ε / (e^ε + k - 1);
    # otherwise report one of the k - 1 other values uniformly.
    # (The binary case k = 2 gives the classic e^ε / (e^ε + 1).)
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)

    if np.random.random() < p_true:
        return x
    else:
        # Return a uniformly random other value
        return np.random.choice([v for v in range(k) if v != x])
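Randomized responses are individually noisy, but the collector can still recover accurate aggregate frequencies by inverting the known response probabilities. A debiasing sketch for k-ary randomized response (function name and the sanity-check numbers are illustrative):

```python
import numpy as np

def debias_counts(observed_counts, epsilon, k):
    """
    Unbiased frequency estimate from k-ary randomized response.

    Each user reports truthfully with p = e^eps / (e^eps + k - 1)
    and reports any specific other value with q = 1 / (e^eps + k - 1),
    so E[observed_v] = n_v * p + (n - n_v) * q, which we invert.
    """
    observed_counts = np.asarray(observed_counts, dtype=float)
    n = observed_counts.sum()
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1 / (np.exp(epsilon) + k - 1)
    return (observed_counts - n * q) / (p - q)

# Sanity check on exact expected counts: the estimator inverts them
k, epsilon = 10, 1.0
true_counts = np.array([500, 300, 200] + [0] * 7, dtype=float)
p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
q = 1 / (np.exp(epsilon) + k - 1)
expected_observed = true_counts * p + (true_counts.sum() - true_counts) * q
print(np.round(debias_counts(expected_observed, epsilon, k)))
```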

Real-World Applications

Industry Deployment

  • Google: RAPPOR for Chrome telemetry
  • Apple: Differential Privacy for emoji suggestions
  • Microsoft: Privacy-preserving telemetry
  • Uber: DP for location data

Healthcare

def dp_medical_analysis(patient_records, epsilon=1.0):
    """
    Analyze medical data while preserving privacy.

    The budget is split evenly across three queries (basic composition).
    """
    n = len(patient_records)

    # Sensitivity of a mean over n records is (value range) / n;
    # here we assume ages are bounded by 120
    mean_age = laplace_mechanism(
        patient_records['age'].mean(),
        sensitivity=120 / n,
        epsilon=epsilon / 3
    )

    prevalence = laplace_mechanism(
        (patient_records['condition'] == 1).mean(),
        sensitivity=1 / n,
        epsilon=epsilon / 3
    )

    effect_size = laplace_mechanism(
        compute_effect_size(patient_records),  # assumed to be bounded
        sensitivity=1,
        epsilon=epsilon / 3
    )

    return {
        'mean_age': mean_age,
        'prevalence': prevalence,
        'effect_size': effect_size
    }

Conclusion

Differential Privacy provides mathematical guarantees essential for responsible AI. Key takeaways:

  1. ε (epsilon) controls the privacy-utility tradeoff
  2. DP-SGD enables privacy-preserving deep learning
  3. Proper accounting tracks the privacy budget spent
  4. Composition means privacy degrades across multiple operations

In 2026, DP is no longer optional for sensitive applications. Understanding these techniques enables building AI systems that respect privacy while delivering value.
