
Privacy-Preserving Machine Learning: Techniques and Implementation

Created: March 11, 2026 · CalmOps · 7 min read

Introduction

Privacy concerns in machine learning have become paramount as organizations handle increasingly sensitive data. Regulations like GDPR, CCPA, and HIPAA require careful handling of personal information. Privacy-preserving machine learning (PPML) techniques enable organizations to extract value from data while protecting individual privacy.

This comprehensive guide explores PPML techniques, their implementation, and real-world applications.

Understanding Privacy-Preserving ML

Why Privacy Matters

Traditional machine learning requires centralized data collection, creating privacy risks:

  1. Data Breaches - Centralized data stores are attractive targets
  2. Regulatory Compliance - GDPR, CCPA, HIPAA impose strict requirements
  3. User Trust - Privacy violations damage trust and reputation
  4. Data Silos - Privacy concerns prevent data sharing

PPML Techniques Overview

Technique                        Use Case                            Complexity   Overhead
Federated Learning               Distributed training                Medium       Low
Differential Privacy             Noise-based privacy                 Low          Medium
Secure Multi-Party Computation   Secure collaboration                High         High
Homomorphic Encryption           Computation on encrypted data       Very High    Very High
Split Learning                   Vertical/horizontal partitioning    Medium       Medium

Federated Learning

Concept

Federated learning trains models across decentralized data sources without sharing raw data.

┌─────────┐     ┌─────────┐     ┌─────────┐
│ Device  │     │ Device  │     │ Device  │
│   A     │     │   B     │     │   C     │
│ [Data]  │     │ [Data]  │     │ [Data]  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     ▼               ▼               ▼
┌─────────────────────────────────────┐
│        Local Model Training         │
│      (gradient updates only)        │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│         Aggregation Server          │
│       (FedAvg, FedProx, etc.)       │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│         Global Model Update         │
└─────────────────────────────────────┘

Implementation

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from collections import OrderedDict

class FederatedClient:
    def __init__(self, model, client_id, data_loader):
        self.model = model
        self.client_id = client_id
        self.data_loader = data_loader
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    def local_train(self, epochs=5):
        """Train model locally on client data."""
        self.model.train()
        for epoch in range(epochs):
            for images, labels in self.data_loader:
                self.optimizer.zero_grad()
                outputs = self.model(images)
                loss = self.criterion(outputs, labels)
                loss.backward()
                self.optimizer.step()
        
        return self.get_model_update()
    
    def get_model_update(self):
        """Return a copy of the model weights (never the raw data)."""
        # Detach and clone so the update is a snapshot, not live parameters
        return OrderedDict(
            (k, v.detach().clone()) for k, v in self.model.state_dict().items()
        )
    
    def set_model_weights(self, weights):
        """Apply received global model weights."""
        self.model.load_state_dict(weights)

class FederatedServer:
    def __init__(self, model):
        self.global_model = model
        self.clients = []
    
    def add_client(self, client):
        self.clients.append(client)
    
    def aggregate(self, client_updates, weights=None):
        """Federated Averaging (FedAvg)."""
        if weights is None:
            weights = [1.0 / len(client_updates)] * len(client_updates)
        
        aggregated = OrderedDict()
        
        for key in client_updates[0].keys():
            aggregated[key] = sum(
                w * update[key] 
                for w, update in zip(weights, client_updates)
            ) / sum(weights)
        
        self.global_model.load_state_dict(aggregated)
        return aggregated
    
    def train_round(self, epochs=5):
        """Execute one round of federated training."""
        # Each client trains locally
        client_updates = []
        client_weights = []
        
        for client in self.clients:
            client.set_model_weights(
                self.global_model.state_dict()
            )
            update = client.local_train(epochs=epochs)
            client_updates.append(update)
            # Weight by data size
            client_weights.append(len(client.data_loader.dataset))
        
        # Aggregate updates
        aggregated = self.aggregate(client_updates, client_weights)
        
        return aggregated

# Usage (SimpleNeuralNetwork and get_client_data are application-specific
# placeholders: any nn.Module and any per-client DataLoader will do)
global_model = SimpleNeuralNetwork()
server = FederatedServer(global_model)

# Add clients (each with local data)
for i in range(10):
    client_data = get_client_data(i)  # local DataLoader; never leaves the client
    client = FederatedClient(
        SimpleNeuralNetwork(),
        f"client_{i}",
        client_data
    )
    server.add_client(client)

# Train federated
for round_num in range(100):
    server.train_round(epochs=5)
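The weighted aggregation step above reduces to a data-size-weighted average of parameter vectors. A minimal NumPy sketch with made-up client weights and dataset sizes makes the arithmetic concrete:

```python
import numpy as np

# Hypothetical flattened parameter vectors from three clients
client_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
client_sizes = [100, 300, 600]  # local dataset sizes

total = sum(client_sizes)
global_weights = sum((n / total) * w for n, w in zip(client_sizes, client_weights))
print(global_weights)  # [4. 5.] -- pulled toward the largest client
```

Note how the client holding 60% of the data dominates the result; this is the intended FedAvg behavior, but it is also why contribution weighting matters for fairness across clients.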

Differential Privacy Integration

import numpy as np

class DPFederatedClient(FederatedClient):
    def __init__(self, model, client_id, data_loader, epsilon=1.0, delta=1e-5):
        super().__init__(model, client_id, data_loader)
        self.epsilon = epsilon
        self.delta = delta
    
    def add_noise_to_gradient(self, gradient, sensitivity=1.0):
        """Add Gaussian noise calibrated for (ε, δ)-differential privacy."""
        scale = sensitivity * np.sqrt(2 * np.log(1.25 / self.delta)) / self.epsilon
        noise = np.random.normal(0, scale, gradient.shape)
        return gradient + noise
    
    def local_train(self, epochs=5):
        self.model.train()
        
        for epoch in range(epochs):
            for images, labels in self.data_loader:
                self.optimizer.zero_grad()
                outputs = self.model(images)
                loss = self.criterion(outputs, labels)
                loss.backward()
                
                # Clip gradients and add noise (simplified DP-SGD; in practice
                # the noise scale must be calibrated to ε, δ, and the clip norm)
                with torch.no_grad():
                    for param in self.model.parameters():
                        if param.grad is not None:
                            # Clip in place (torch.clamp alone returns a copy)
                            param.grad.clamp_(min=-1.0, max=1.0)
                            # Add noise
                            param.grad += torch.from_numpy(
                                np.random.normal(0, 0.1, param.grad.shape)
                            ).float()
                
                self.optimizer.step()
        
        return self.get_model_update()

Differential Privacy

Concept

Differential privacy adds calibrated noise to data or results to provide mathematical privacy guarantees.

import numpy as np

class DifferentialPrivacy:
    def __init__(self, epsilon=1.0, delta=1e-5):
        self.epsilon = epsilon
        self.delta = delta
    
    def laplace_mechanism(self, true_value, sensitivity):
        """Add Laplace noise for differential privacy."""
        scale = sensitivity / self.epsilon
        noise = np.random.laplace(0, scale)
        return true_value + noise
    
    def gaussian_mechanism(self, true_value, sensitivity):
        """Add Gaussian noise for (ε, δ)-differential privacy."""
        scale = sensitivity * np.sqrt(2 * np.log(1.25 / self.delta)) / self.epsilon
        noise = np.random.normal(0, scale)
        return true_value + noise
    
    def exponential_mechanism(self, candidates, utility_fn, sensitivity):
        """Select from candidates with the exponential mechanism."""
        utilities = np.array([utility_fn(c) for c in candidates])
        scores = self.epsilon * utilities / (2 * sensitivity)
        # Subtract the max before exponentiating for numerical stability
        exp_utils = np.exp(scores - scores.max())
        probs = exp_utils / exp_utils.sum()
        return candidates[np.random.choice(len(candidates), p=probs)]
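To see the Laplace mechanism's bias/variance behavior concretely, here is a small experiment (the query, true count, and ε are made up for illustration): every individual answer is perturbed, but the noise is zero-mean, so repeated releases average out toward the truth, which is exactly why the privacy budget must cap repetition.

```python
import numpy as np

np.random.seed(0)

true_count = 1000      # hypothetical counting-query answer
sensitivity = 1.0      # one person changes a count by at most 1
epsilon = 0.5

# Release the same count 10,000 times, each with fresh Laplace noise
noisy = true_count + np.random.laplace(0, sensitivity / epsilon, size=10_000)

print(noisy.mean())    # close to 1000: the zero-mean noise averages out
```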

Privacy Budget

class PrivacyBudget:
    def __init__(self, initial_epsilon=10.0):
        self.initial_epsilon = initial_epsilon
        self.spent = 0
    
    def consume(self, epsilon):
        """Consume privacy budget."""
        self.spent += epsilon
    
    def remaining(self):
        """Return remaining privacy budget."""
        return max(0, self.initial_epsilon - self.spent)
    
    def is_exhausted(self):
        return self.spent >= self.initial_epsilon
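Under basic sequential composition, the ε values of successive queries simply add up against the total budget, which is what makes budget tracking meaningful. A quick back-of-the-envelope sketch (the budget and per-query cost are illustrative numbers):

```python
# Basic sequential composition: total privacy loss = sum of per-query epsilons
initial_epsilon = 10.0     # total budget (illustrative)
per_query_epsilon = 0.5    # cost charged per released statistic

queries_allowed = int(initial_epsilon // per_query_epsilon)
remaining = max(0.0, initial_epsilon - queries_allowed * per_query_epsilon)
print(queries_allowed, remaining)  # 20 0.0 -- after 20 queries the budget is gone
```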

Secure Multi-Party Computation

Concept

SMPC enables multiple parties to jointly compute a function while keeping inputs private.

class SecureMultiPartyComputation:
    """Simplified SMPC using secret sharing."""
    
    MODULUS = 10_000  # must exceed any secret; all shares live in Z_MODULUS
    
    @staticmethod
    def share_secret(secret, num_shares):
        """Split secret into shares using additive secret sharing."""
        import random
        M = SecureMultiPartyComputation.MODULUS
        shares = [random.randrange(M) for _ in range(num_shares - 1)]
        shares.append((secret - sum(shares)) % M)
        return shares
    
    @staticmethod
    def reconstruct(shares):
        """Reconstruct secret from shares."""
        return sum(shares) % SecureMultiPartyComputation.MODULUS
    
    @staticmethod
    def add_secure(shares1, shares2):
        """Add two shared values without revealing either one."""
        M = SecureMultiPartyComputation.MODULUS
        return [(a + b) % M for a, b in zip(shares1, shares2)]
    
    @staticmethod
    def multiply_secure(shares1, shares2):
        """Multiply two shared values (a real protocol needs Beaver triples)."""
        # Multiplying shares locally does NOT yield shares of the product;
        # secure multiplication needs precomputed Beaver triples plus an
        # extra communication round, which is out of scope for this sketch.
        raise NotImplementedError("Use Beaver triples for secure multiplication")
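A round trip makes additive sharing concrete. The sketch below is standalone (it re-implements sharing with an illustrative modulus) and shows three share-holders computing the sum of two parties' inputs without any single holder seeing either input:

```python
import random

MOD = 1_000_000_007  # large modulus so secrets up to ~1e9 round-trip exactly

def share_secret(secret, num_shares):
    """Additive secret sharing: the shares sum to the secret mod MOD."""
    shares = [random.randrange(MOD) for _ in range(num_shares - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

random.seed(0)
alice = share_secret(42, 3)   # party A's private input
bob = share_secret(58, 3)     # party B's private input

# Each of the three share-holders adds its two shares locally...
sum_shares = [(a + b) % MOD for a, b in zip(alice, bob)]

# ...and only the sum is ever reconstructed, never 42 or 58 individually
print(reconstruct(sum_shares))  # 100
```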

Secure Aggregation

class SecureAggregation:
    """Secure aggregation for federated learning."""
    
    def __init__(self, threshold=3):
        self.threshold = threshold  # Minimum participants
    
    def mask_update(self, update, client_id, seed):
        """Mask a local update with client-specific noise."""
        rng = np.random.default_rng(seed + client_id)
        mask = rng.standard_normal(update.shape)
        return update + mask, mask
    
    def aggregate(self, masked_updates, seeds):
        """Aggregate masked updates.
        
        In the full protocol, clients agree on pairwise masks that cancel
        exactly in the sum; this sketch simply averages the masked values.
        """
        return sum(masked_updates) / len(masked_updates)
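The cancellation mentioned above can be demonstrated with pairwise masks: client i adds mask m_ij for every j > i and subtracts m_ji for every j < i, so each mask appears in the sum exactly once with each sign and vanishes. A standalone NumPy sketch (client count and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 3, 4
updates = [rng.normal(size=dim) for _ in range(n_clients)]

# One shared mask per unordered pair (i, j), i < j
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for i in range(n_clients):
    u = updates[i].copy()
    for j in range(n_clients):
        if i < j:
            u += masks[(i, j)]   # i adds the pair mask...
        elif i > j:
            u -= masks[(j, i)]   # ...and j subtracts the same mask
    masked.append(u)

# Individually each masked update looks like noise, yet the sum is exact
aggregate = sum(masked)
```

The server sees only the masked vectors, but their sum equals the sum of the true updates because every mask cancels.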

Homomorphic Encryption

Concept

Homomorphic encryption allows computations on encrypted data.

# Using the TenSEAL library for the CKKS scheme (pip install tenseal)
import tenseal as ts

class HomomorphicEncryption:
    def __init__(self, poly_modulus_degree=8192, scale=2**40):
        self.poly_modulus_degree = poly_modulus_degree
        self.scale = scale
    
    def create_context(self):
        """Create a TenSEAL context for the CKKS scheme."""
        context = ts.context(
            ts.SCHEME_TYPE.CKKS,
            poly_modulus_degree=self.poly_modulus_degree,
            coeff_mod_bit_sizes=[60, 40, 40, 60]
        )
        context.global_scale = self.scale
        context.generate_galois_keys()
        return context
    
    def encrypt_vector(self, context, values):
        """Encrypt a vector of values."""
        return ts.ckks_vector(context, values)
    
    def compute_on_encrypted(self, enc_vector):
        """Perform computation on encrypted vector."""
        # Example: multiply by 2 and add 1
        result = enc_vector * 2
        result = result + 1
        return result
    
    def decrypt(self, encrypted):
        """Decrypt result."""
        return encrypted.decrypt()
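If TenSEAL is not available, the additive-homomorphic property can still be demonstrated with a toy Paillier cryptosystem in pure Python. The primes here are tiny and the scheme is completely insecure at this size; it exists only to show that multiplying ciphertexts adds plaintexts (requires Python 3.9+ for math.lcm):

```python
import math
import random

# Toy Paillier keypair -- tiny primes, for illustration only
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse of L(g^λ mod n²)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return L(pow(c, lam, n2)) * mu % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts
c = encrypt(5) * encrypt(7) % n2
print(decrypt(c))  # 12
```

Raising a ciphertext to a power likewise multiplies the plaintext by a constant, which is enough to compute encrypted linear functions such as the weighted sums inside a model's layers.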

Split Learning

Concept

Split learning splits neural networks between clients and server.

class SplitNetwork:
    """Client and server portions of split neural network."""
    
    class ClientPart(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, 3)
            self.relu = nn.ReLU()
        
        def forward(self, x):
            x = self.conv(x)
            x = self.relu(x)
            return x  # Send to server
    
    class ServerPart(nn.Module):
        def __init__(self):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(16, 10)
        
        def forward(self, x):
            x = self.pool(x)
            x = x.view(x.size(0), -1)
            x = self.fc(x)
            return x

class SplitClient:
    def __init__(self):
        self.model = SplitNetwork.ClientPart()
        self.smashed = None  # cached activations for the backward pass
    
    def forward(self, x):
        """Compute intermediate activations ("smashed data") to send on."""
        self.smashed = self.model(x)
        # Detach before sending so the server receives a leaf tensor
        return self.smashed.detach().requires_grad_()
    
    def backward(self, grad):
        """Continue backpropagation through the client layers."""
        self.smashed.backward(grad)

class SplitServer:
    def __init__(self):
        self.model = SplitNetwork.ServerPart()
        self.criterion = nn.CrossEntropyLoss()
    
    def forward(self, activations, labels):
        """Compute loss on client activations and return their gradient."""
        output = self.model(activations)
        loss = self.criterion(output, labels)
        loss.backward()
        # activations.grad travels back to the client; raw inputs never do
        return output, activations.grad

Implementation Best Practices

Architecture Selection

Choose techniques based on your requirements:

  1. Federated Learning: When data is distributed across devices
  2. Differential Privacy: When statistical queries are needed
  3. SMPC: When multiple parties need to collaborate
  4. Homomorphic Encryption: When computation on encrypted data is required

Privacy Budget Management

class PrivacyBudgetExhausted(Exception):
    """Raised when further training would exceed the privacy budget."""

class PrivacyAccountant:
    def __init__(self, target_epsilon=8.0):
        self.target_epsilon = target_epsilon
        self.spent = 0.0
        self.history = []
    
    def step(self, noise_multiplier=1.1, sample_rate=0.01):
        """Account for privacy spend in one training step."""
        # Crude placeholder: per-step cost grows with the sampling rate and
        # shrinks with the noise. A real system should use an RDP/moments
        # accountant such as the one shipped with Opacus.
        epsilon = sample_rate / (noise_multiplier ** 2)
        self.spent += epsilon
        self.history.append(self.spent)
        
        if self.spent > self.target_epsilon:
            raise PrivacyBudgetExhausted(
                f"Privacy budget exhausted: {self.spent}/{self.target_epsilon}"
            )
        
        return self.spent


Conclusion

Privacy-preserving machine learning techniques enable organizations to build AI systems while protecting individual privacy. Each technique has strengths and trade-offs:

  • Federated Learning: Best for distributed data scenarios
  • Differential Privacy: Best for statistical analysis
  • SMPC: Best for multi-party collaboration
  • Homomorphic Encryption: Best for sensitive computation

By understanding these techniques and their trade-offs, you can build AI systems that respect privacy while delivering value.
