Introduction
Machine learning models often memorize sensitive training data, creating privacy risks. Differential Privacy (DP) provides mathematical guarantees that individual data points cannot be inferred from model outputs. In 2026, DP has become essential for responsible AI development, driven by regulations such as the GDPR and adopted by major tech companies.
This article explores differential privacy fundamentals, algorithms for privacy-preserving ML, practical implementation, and real-world applications.
Fundamentals of Differential Privacy
The Core Definition
Differential Privacy guarantees that an algorithm’s output is approximately the same whether or not any individual participates in the dataset.
Definition: A randomized algorithm M with domain D and range R satisfies ε-differential privacy if for all datasets D₁, D₂ differing in one element, and all subsets S ⊆ R:
Pr[M(D₁) ∈ S] ≤ e^ε × Pr[M(D₂) ∈ S]
The privacy budget ε (epsilon) controls the privacy-utility tradeoff:
- Lower ε: Stronger privacy, more noise, lower utility
- Higher ε: Weaker privacy, less noise, higher utility
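To make the tradeoff concrete, here is a minimal numeric sketch (assuming a counting query with sensitivity 1) of how ε sets the Laplace noise scale:

```python
import numpy as np

# The Laplace mechanism adds noise drawn from Laplace(0, sensitivity / epsilon),
# so shrinking epsilon by 10x grows the expected noise by 10x.
sensitivity = 1.0  # a counting query changes by at most 1 per person
for epsilon in [0.1, 1.0, 10.0]:
    scale = sensitivity / epsilon
    # The standard deviation of Laplace(0, b) is sqrt(2) * b
    print(f"epsilon={epsilon:>4}: scale={scale:.2f}, std={np.sqrt(2) * scale:.2f}")
```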
Key Concepts
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """
    Add Laplace noise to achieve differential privacy.
    Args:
        true_answer: The actual query result
        sensitivity: Maximum change from one person's data
        epsilon: Privacy budget
    Returns:
        Privatized answer
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return true_answer + noise

def gaussian_mechanism(true_answer, sensitivity, epsilon, delta=1e-5):
    """
    Add Gaussian noise (for approximate (epsilon, delta)-DP).
    """
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    noise = np.random.normal(0, sigma)
    return true_answer + noise
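As a quick sanity check, the two mechanisms above can be compared empirically on a counting query (the functions are restated with a seeded generator so the snippet runs standalone). For these parameters the Gaussian mechanism is noisier, since its σ carries the √(2 ln(1.25/δ)) factor:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_answer, sensitivity, epsilon):
    return true_answer + rng.laplace(0, sensitivity / epsilon)

def gaussian_mechanism(true_answer, sensitivity, epsilon, delta=1e-5):
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    return true_answer + rng.normal(0, sigma)

# Privatize a count of 1000 repeatedly and compare the mean absolute error
true_count, eps = 1000, 1.0
lap_err = np.mean([abs(laplace_mechanism(true_count, 1, eps) - true_count)
                   for _ in range(10_000)])
gau_err = np.mean([abs(gaussian_mechanism(true_count, 1, eps) - true_count)
                   for _ in range(10_000)])
print(f"mean |error|  Laplace: {lap_err:.2f}  Gaussian: {gau_err:.2f}")
```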
Sensitivity
Global sensitivity: Maximum change in output when one record changes:
def global_sensitivity(query_function, dataset):
    """
    Estimate the sensitivity of a query by removing each record in turn.
    Note: this measures the *local* sensitivity of this particular dataset;
    true global sensitivity is the maximum over all possible datasets and
    is usually derived analytically.
    """
    max_change = 0
    for i in range(len(dataset)):
        dataset_without = dataset[:i] + dataset[i+1:]
        change = abs(query_function(dataset) - query_function(dataset_without))
        max_change = max(max_change, change)
    return max_change
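For common queries the global sensitivity is known in closed form, so the empirical estimate above is rarely needed. A sketch, assuming all records lie in a known range [a, b]:

```python
# Closed-form global sensitivities for records bounded in [a, b]:
#   COUNT: adding or removing one record changes the count by at most 1
#   SUM:   changes by at most max(|a|, |b|)
#   MEAN (with n fixed and public): changes by at most (b - a) / n
def count_sensitivity():
    return 1.0

def sum_sensitivity(a, b):
    return max(abs(a), abs(b))

def mean_sensitivity(a, b, n):
    return (b - a) / n

print(count_sensitivity())             # 1.0
print(sum_sensitivity(0, 100))         # 100
print(mean_sensitivity(0, 100, 1000))  # 0.1
```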
Differential Privacy in Machine Learning
Why ML Needs DP
Machine learning models can:
- Memorize training data - Directly encode specific records
- Leak information through model outputs or gradients
- Enable membership inference - Determine if a record was in training data
DP provides mathematical guarantees against these attacks.
DP Stochastic Gradient Descent (DP-SGD)
The core algorithm for privacy-preserving deep learning:
import torch
import torch.nn as nn

class DPOptimizer:
    def __init__(self, model, lr=0.01, epsilon=1.0, delta=1e-5,
                 max_grad_norm=1.0, noise_multiplier=1.0):
        self.model = model
        self.lr = lr
        self.epsilon = epsilon
        self.delta = delta
        self.max_grad_norm = max_grad_norm
        self.noise_multiplier = noise_multiplier
        self.noise_std = noise_multiplier * max_grad_norm / epsilon
        # For accounting
        self.steps = 0
        self.privacy_budget = (epsilon, delta)

    def zero_grad(self):
        """Clear gradients before the backward pass."""
        self.model.zero_grad()

    def step(self):
        """Perform one DP-SGD step.

        Note: this simplified version clips per-parameter gradients;
        real DP-SGD clips each example's gradient before averaging.
        """
        self.steps += 1
        for name, param in self.model.named_parameters():
            if param.grad is None:
                continue
            # Clip gradients
            grad_norm = param.grad.norm()
            clip_factor = min(1.0, self.max_grad_norm / (grad_norm + 1e-6))
            param.grad *= clip_factor
            # Add noise
            noise = torch.normal(0.0, self.noise_std, param.grad.shape)
            param.grad += noise
            # Update parameters
            param.data -= self.lr * param.grad

    def get_privacy_spent(self):
        """Calculate privacy budget spent."""
        # Simplified - use proper accounting in practice
        return self.epsilon, self.delta

def train_with_dp(model, train_loader, epochs, epsilon=1.0):
    """Train model with differential privacy."""
    optimizer = DPOptimizer(model, epsilon=epsilon)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        eps, delta = optimizer.get_privacy_spent()
        print(f"Epoch {epoch}: ε={eps:.2f}, δ={delta}")
Complete DP-SGD Implementation
import torch
import numpy as np

class DPSGD(torch.optim.Optimizer):
    def __init__(self, params, lr, noise_multiplier, max_grad_norm, l2_norm_clip=1.0):
        defaults = dict(lr=lr, noise_multiplier=noise_multiplier,
                        max_grad_norm=max_grad_norm, l2_norm_clip=l2_norm_clip)
        super().__init__(params, defaults)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                # Compute gradient norm for this layer
                grad_norm = grad.norm().item()
                # Clip gradient (per-layer; per-example clipping is the
                # textbook DP-SGD recipe and is what Opacus implements)
                clip_factor = group['l2_norm_clip'] / (grad_norm + 1e-6)
                if clip_factor < 1:
                    grad = grad * clip_factor
                # Add noise
                noise_std = group['noise_multiplier'] * group['l2_norm_clip']
                noise = torch.normal(0.0, noise_std, grad.shape)
                grad = grad + noise.to(grad.device)
                # Update
                p.data.add_(grad, alpha=-group['lr'])
        return loss

def compute_privacy_budget(n_examples, batch_size, epochs, noise_multiplier, delta=1e-5):
    """
    Compute an approximate (ε, δ) privacy budget.
    """
    # Simplified - use Opacus or Google's DP library for proper RDP accounting
    q = batch_size / n_examples
    steps = epochs * (n_examples // batch_size)
    # Approximate composition
    sigma = noise_multiplier
    epsilon = (q * sigma * np.sqrt(steps * 2 * np.log(1 / delta))) / (1 + q)
    return epsilon, delta
Using PyTorch Opacus
from opacus import PrivacyEngine
import torch.nn as nn

def train_with_opacus(model, train_loader, optimizer, epochs, epsilon_target):
    """Train with DP using the Opacus library (v1.x API)."""
    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for data, target in train_loader:
            optimizer.zero_grad()
            # Forward pass
            loss = criterion(model(data), target)
            # Backward pass
            loss.backward()
            optimizer.step()
        # Check privacy budget
        epsilon = privacy_engine.get_epsilon(delta=1e-5)
        if epsilon > epsilon_target:
            print(f"Privacy budget exhausted: ε={epsilon:.2f}")
            break
        print(f"Epoch {epoch}: ε={epsilon:.2f}")
Privacy Accounting
Rényi Differential Privacy (RDP)
def rdp_accounting(q, sigma, steps, alpha=10):
    """
    Approximate the privacy loss using Rényi DP.
    Args:
        q: Sampling ratio
        sigma: Noise multiplier
        steps: Number of steps
        alpha: Order of the Rényi divergence
    """
    if q == 0:
        return 0
    # RDP of the Gaussian mechanism at order alpha (without subsampling);
    # the subsampled case needs the tighter analysis implemented in Opacus
    log_moment = alpha / (2 * sigma**2)
    # RDP composes additively across steps
    return steps * log_moment

def compose_dp(eps1, delta1, eps2, delta2):
    """Basic (sequential) composition of two DP mechanisms."""
    return eps1 + eps2, delta1 + delta2
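To report a final guarantee, the accumulated RDP bound has to be converted back to (ε, δ)-DP. A minimal sketch of the standard conversion (the helper name `rdp_to_dp` is mine; libraries like Opacus evaluate it over many orders α and keep the smallest ε):

```python
import numpy as np

def rdp_to_dp(rdp_epsilon, alpha, delta):
    """Convert an RDP guarantee of order alpha to (epsilon, delta)-DP:
    epsilon = rdp_epsilon + log(1/delta) / (alpha - 1)."""
    return rdp_epsilon + np.log(1 / delta) / (alpha - 1)

# Gaussian mechanism with sigma=5, composed over 100 steps at alpha=10:
# per-step RDP = alpha / (2 * sigma^2) = 0.2, so total RDP = 20
total_rdp = 100 * (10 / (2 * 5.0**2))
print(f"epsilon = {rdp_to_dp(total_rdp, alpha=10, delta=1e-5):.2f}")  # epsilon = 21.28
```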
Privacy Budget Management
class PrivacyBudget:
    def __init__(self, initial_epsilon, delta=1e-5):
        self.epsilon = initial_epsilon
        self.delta = delta
        self.spent = 0

    def spend(self, cost):
        """Spend privacy budget; returns True if budget remains."""
        self.spent += cost
        return self.epsilon - self.spent > 0

    def remaining(self):
        """Check remaining budget."""
        return max(0, self.epsilon - self.spent)
Practical Implementation
DP Linear Regression
import numpy as np

class DPRegression:
    def __init__(self, epsilon=1.0, delta=1e-5, n_iterations=100, lr=0.01):
        self.epsilon = epsilon
        self.delta = delta
        self.n_iterations = n_iterations
        self.lr = lr
        self.coef_ = None

    def fit(self, X, y):
        """Fit linear regression with DP gradient descent (simplified)."""
        n_samples, n_features = X.shape
        # After clipping, each step's gradient has norm at most 1, which we
        # use as the per-step sensitivity. (Deriving sensitivity from the
        # data itself, e.g. max |X|, would itself leak information.)
        sensitivity = 1.0
        # Split the budget evenly across iterations (basic composition)
        eps_per_step = self.epsilon / self.n_iterations
        coef = np.zeros(n_features)
        for _ in range(self.n_iterations):
            # Gradient of the squared error
            pred = X @ coef
            grad = X.T @ (pred - y) / n_samples
            # Clip gradient to norm 1
            grad_norm = np.linalg.norm(grad)
            if grad_norm > 1.0:
                grad = grad / grad_norm
            # Add per-step noise
            noise = np.random.laplace(0, sensitivity / eps_per_step, n_features)
            grad += noise
            # Update
            coef -= self.lr * grad
        self.coef_ = coef
        return self

    def predict(self, X):
        return X @ self.coef_
DP k-Means Clustering
def dp_kmeans(X, k, epsilon=1.0, n_iterations=100):
    """
    Differentially private k-means clustering (simplified).
    The budget is split evenly across iterations via basic composition;
    a full treatment would also privatize the cluster counts.
    """
    n_samples, n_features = X.shape
    eps_per_iter = epsilon / (n_iterations + 1)
    # Initialize centroids randomly with noise
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    centroids += np.random.laplace(0, 1.0 / eps_per_iter, (k, n_features))
    labels = np.zeros(n_samples, dtype=int)
    for _ in range(n_iterations):
        # Assign points to the nearest centroid
        distances = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Compute new centroids with noise
        new_centroids = []
        for i in range(k):
            cluster_points = X[labels == i]
            if len(cluster_points) > 0:
                # True centroid
                centroid = cluster_points.mean(axis=0)
                # Noise scale shrinks with cluster size (mean sensitivity)
                noise_scale = 2 * k / (eps_per_iter * len(cluster_points))
                centroid += np.random.laplace(0, noise_scale, n_features)
                new_centroids.append(centroid)
            else:
                # Empty cluster - keep the old centroid
                new_centroids.append(centroids[i])
        centroids = np.array(new_centroids)
    return centroids, labels
DP Neural Network Training
import torch
import torch.nn as nn

class DPTrainingExample(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

def train_dp_mnist():
    """Train a DP model on MNIST."""
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader
    # Load data
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ])
    train_data = datasets.MNIST('./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
    # Model
    model = DPTrainingExample()
    # DP optimizer (DPSGD defined above)
    optimizer = DPSGD(
        model.parameters(),
        lr=0.01,
        noise_multiplier=1.0,
        max_grad_norm=1.0
    )
    criterion = nn.CrossEntropyLoss()
    # Train
    for epoch in range(10):
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data.view(data.size(0), -1))
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        # Track the budget with compute_privacy_budget defined earlier
        eps, _ = compute_privacy_budget(
            n_examples=len(train_data), batch_size=64,
            epochs=epoch + 1, noise_multiplier=1.0
        )
        print(f"Epoch {epoch}: ε={eps:.2f}")
Privacy vs Utility Tradeoff
Factors Affecting Utility
| Factor | Effect on Privacy | Effect on Utility |
|---|---|---|
| Lower ε | Stronger | Worse |
| Larger batch size | Weaker | Better |
| More epochs | Weaker | Better |
| Higher noise | Stronger | Worse |
Best Practices
- Start with high ε: Find the maximum useful ε first
- Reduce gradually: Decrease ε until utility drops
- Use appropriate mechanisms: Gaussian for approximate DP, Laplace for pure DP
- Account for composition: Track total budget spent across operations
- Consider batch size: Larger batches provide better utility
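The ε effect above can be observed directly. A small sketch on synthetic data (a DP mean over values bounded in [0, 100]), sweeping ε from strong to weak privacy:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.uniform(0, 100, size=10_000)   # synthetic values bounded in [0, 100]
sensitivity = 100 / len(data)             # bounded-mean sensitivity: (b - a) / n

# The error of the Laplace-privatized mean shrinks as epsilon grows
for epsilon in [0.01, 0.1, 1.0, 10.0]:
    errors = [abs(rng.laplace(0, sensitivity / epsilon)) for _ in range(1000)]
    print(f"epsilon={epsilon:>5}: mean |error| = {np.mean(errors):.4f}")
```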
Advanced Topics
Private Aggregation of Teacher Ensembles (PATE)
class PATE:
    """
    Train teacher models on disjoint data partitions and aggregate
    their votes with noise (the noisy-max mechanism).
    """
    def __init__(self, num_teachers, epsilon):
        self.num_teachers = num_teachers
        self.epsilon = epsilon
        self.models = []

    def train_teachers(self, data_partitions, train_model):
        """Train a separate model on each partition.
        train_model: a user-supplied callable that fits and returns
        a model for one partition.
        """
        for partition in data_partitions:
            self.models.append(train_model(partition))

    def predict(self, x, num_classes=10):
        """Aggregate teacher predictions with DP (noisy argmax)."""
        votes = [model.predict(x) for model in self.models]
        counts = np.bincount(votes, minlength=num_classes)
        # Add Laplace noise to the per-class vote counts
        noisy_counts = counts + np.random.laplace(0, 1 / self.epsilon, num_classes)
        return np.argmax(noisy_counts)
Local Differential Privacy
import numpy as np

def local_dp_mechanism(x, epsilon, k=10):
    """
    k-ary randomized response for local DP.
    Used when the data collector itself is not trusted.
    """
    # Report the true value with probability e^eps / (e^eps + k - 1);
    # otherwise report a uniformly random other value. The likelihood
    # ratio between any two inputs is then at most e^eps.
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if np.random.random() < p_true:
        return x
    else:
        # Return a uniformly random other value
        return np.random.choice([v for v in range(k) if v != x])
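A useful property of randomized response is that, although each individual report is noisy, the collector can still debias the aggregate. A sketch for the binary case (the estimator follows from E[noisy mean] = p·t + (1 − p)·(1 − t), where t is the true proportion):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(x, epsilon):
    """Binary randomized response: answer truthfully with prob e^eps / (e^eps + 1)."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + 1)
    return x if rng.random() < p_true else 1 - x

# The collector sees only noisy bits but can invert the noise in aggregate:
# t = (noisy_mean - (1 - p)) / (2p - 1)
epsilon = 1.0
true_bits = rng.binomial(1, 0.3, size=100_000)  # 30% of users hold the attribute
noisy_bits = np.array([randomized_response(b, epsilon) for b in true_bits])
p = np.exp(epsilon) / (np.exp(epsilon) + 1)
estimate = (noisy_bits.mean() - (1 - p)) / (2 * p - 1)
print(f"true proportion: {true_bits.mean():.3f}, debiased estimate: {estimate:.3f}")
```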
Real-World Applications
Industry Deployment
- Google: RAPPOR for Chrome telemetry
- Apple: Differential Privacy for emoji suggestions
- Microsoft: Privacy-preserving telemetry
- Uber: DP for location data
Healthcare
def dp_medical_analysis(patient_records, epsilon=1.0):
    """
    Analyze medical data while preserving privacy.
    The budget is split evenly across the three queries (basic
    composition), using laplace_mechanism defined earlier.
    """
    n = len(patient_records)
    # Mean age, assuming ages bounded in [0, 120]: sensitivity = range / n
    mean_age = laplace_mechanism(
        patient_records['age'].mean(),
        sensitivity=120 / n,
        epsilon=epsilon / 3
    )
    # Prevalence is a mean of 0/1 indicators: sensitivity = 1 / n
    prevalence = laplace_mechanism(
        (patient_records['condition'] == 1).mean(),
        sensitivity=1 / n,
        epsilon=epsilon / 3
    )
    # compute_effect_size is a domain-specific helper; its output is
    # assumed bounded so that sensitivity 1 holds
    effect_size = laplace_mechanism(
        compute_effect_size(patient_records),
        sensitivity=1,
        epsilon=epsilon / 3
    )
    return {
        'mean_age': mean_age,
        'prevalence': prevalence,
        'effect_size': effect_size
    }
Conclusion
Differential Privacy provides mathematical guarantees essential for responsible AI. Key takeaways:
- ε (epsilon) controls the privacy-utility tradeoff
- DP-SGD enables privacy-preserving deep learning
- Proper accounting tracks privacy budget spent
- Composition means privacy degrades with multiple operations
In 2026, DP is no longer optional for sensitive applications. Understanding these techniques enables building AI systems that respect privacy while delivering value.
Resources
- The Algorithmic Foundations of Differential Privacy
- Deep Learning with Differential Privacy - Original DP-SGD paper
- Opacus: PyTorch DP Library
- Google DP Library
- IBM Diffprivlib