Fraud Detection: Machine Learning for Financial Crime 2026

Introduction

Financial fraud costs the global economy billions of dollars annually. From credit card scams to money laundering, financial institutions need sophisticated systems to detect and prevent fraudulent activity. Machine learning has become central to fraud detection because it can score transactions in real time and surface suspicious patterns that manual review cannot catch at transaction volume.

In this guide, we’ll explore the techniques, architectures, and best practices for building production-grade fraud detection systems.

Understanding Fraud Detection

Types of Financial Fraud

┌─────────────────────────────────────────────────────────────────────┐
│                    TYPES OF FINANCIAL FRAUD                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                 PAYMENT FRAUD                                  │  │
│  │                                                              │  │
│  │  • Credit Card Fraud - Stolen card usage                    │  │
│  │  • Card-Not-Present (CNP) - Online fraud                   │  │
│  │  • Account Takeover - Stolen credentials                   │  │
│  │  • Friendly Fraud - Chargebacks abuse                       │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                 IDENTITY FRAUD                              │  │
│  │                                                              │  │
│  │  • Synthetic Identity - Fake identity creation             │  │
│  │  • Identity Theft - Using stolen identity                  │  │
│  │  • Application Fraud - Fake loan/credit applications       │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                 MONEY LAUNDERING                            │  │
│  │                                                              │  │
│  │  • Structuring - Smurfing to avoid reporting               │  │
│  │  • Layering - Moving money through multiple accounts      │  │
│  │  • Integration - Making dirty money appear legitimate      │  │
│  │  • Trade-Based Money Laundering                           │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                 INSURANCE FRAUD                              │  │
│  │                                                              │  │
│  │  • Claims Fraud - Exaggerated/fake claims                  │  │
│  │  • Premium Diversion - Selling fake policies               │  │
│  │  • Vehicle Insurance Fraud                                 │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

The Fraud Detection Challenge

Fraud detection presents unique challenges:

Class Imbalance: Fraudulent transactions are rare, often less than 0.1% of all transactions
Real-Time Requirements: Need to approve/reject in milliseconds
Adaptive Adversaries: Fraudsters constantly evolve tactics
False Positive Costs: Declining legitimate customers is expensive
Concept Drift: Fraud patterns change over time
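
To make the class imbalance point concrete, here is a minimal sketch showing why plain accuracy is misleading at a 0.1% fraud rate. The counts are made up for illustration (100 fraud cases in 100,000 transactions), and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision and recall, with 0.0 when a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative data: 100 fraud cases in 100,000 transactions (0.1%)
y_true = [1] * 100 + [0] * 99_900

# A "model" that approves everything never flags any fraud
approve_all = [0] * len(y_true)

print(summarize(y_true, approve_all))
```

The approve-everything baseline scores 99.9% accuracy while catching no fraud at all, which is why the rest of this guide leans on precision, recall, and ranking metrics instead.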

Feature Engineering

Feature Categories

from datetime import datetime

import numpy as np


class FraudFeatureEngineer:
    """
    Feature engineering for fraud detection
    """
    
    def create_transaction_features(self, transaction: dict, history: list) -> dict:
        """
        Create features from transaction and historical data
        """
        
        features = {}
        
        # ─────────────────────────────────────────────────────────
        # 1. TRANSACTION-LEVEL FEATURES
        # ─────────────────────────────────────────────────────────
        
        # Amount features
        features['amount'] = transaction['amount']
        features['amount_log'] = np.log1p(transaction['amount'])
        features['amount_squared'] = transaction['amount'] ** 2
        
        # Time-based features
        features['hour'] = transaction['timestamp'].hour
        features['day_of_week'] = transaction['timestamp'].weekday()
        features['is_weekend'] = transaction['timestamp'].weekday() >= 5
        features['is_night'] = (transaction['timestamp'].hour >= 22 or
                                transaction['timestamp'].hour <= 5)
        features['is_holiday'] = self._is_holiday(transaction['timestamp'])
        
        # Merchant features
        features['merchant_category'] = transaction['merchant_category']
        features['merchant_risk_score'] = self._get_merchant_risk(
            transaction['merchant_id']
        )
        
        # ─────────────────────────────────────────────────────────
        # 2. HISTORICAL BEHAVIOR FEATURES
        # ─────────────────────────────────────────────────────────
        
        customer_history = [h for h in history if h['customer_id'] == 
                          transaction['customer_id']]
        
        if customer_history:
            amounts = [h['amount'] for h in customer_history]
            
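            # Velocity features
            # NOTE: amounts[-30:] is the last 30 transactions, used as a rough
            # proxy for a 30-day window; filter on timestamp for a true window.
            features['avg_amount_30d'] = np.mean(amounts[-30:])
            features['std_amount_30d'] = np.std(amounts[-30:]) if len(amounts) > 1 else 0
            features['max_amount_30d'] = np.max(amounts[-30:])
            features['min_amount_30d'] = np.min(amounts[-30:])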
            
            # Transaction frequency
            features['txn_count_1d'] = len([h for h in customer_history 
                                            if self._days_ago(h['timestamp']) <= 1])
            features['txn_count_7d'] = len([h for h in customer_history 
                                           if self._days_ago(h['timestamp']) <= 7])
            features['txn_count_30d'] = len([h for h in customer_history 
                                            if self._days_ago(h['timestamp']) <= 30])
            
            # Time since last transaction (assumes history is sorted oldest-first)
            features['hours_since_last_txn'] = self._hours_since(
                customer_history[-1]['timestamp'],
                transaction['timestamp']
            )
        
        # ─────────────────────────────────────────────────────────
        # 3. CROSS-FEATURE FEATURES
        # ─────────────────────────────────────────────────────────
        
        # Amount relative to history
        if customer_history:
            avg_historical = features.get('avg_amount_30d', transaction['amount'])
            features['amount_vs_avg_ratio'] = transaction['amount'] / (avg_historical + 1)
            features['amount_vs_max_ratio'] = transaction['amount'] / (features['max_amount_30d'] + 1)
        
        # New merchant indicator
        features['is_new_merchant'] = self._is_new_merchant(
            transaction['customer_id'],
            transaction['merchant_id']
        )
        
        # Geographic features
        if 'location' in transaction:
            features['location'] = transaction['location']
            features['location_change_velocity'] = self._location_change_velocity(
                customer_history,
                transaction['location']
            )
        
        # ─────────────────────────────────────────────────────────
        # 4. NETWORK FEATURES
        # ─────────────────────────────────────────────────────────
        
        # Device features
        features['device_fingerprint'] = transaction.get('device_id')
        features['is_new_device'] = self._is_new_device(
            transaction['customer_id'],
            transaction.get('device_id')
        )
        features['device_count_7d'] = self._device_count(
            transaction['customer_id'],
            days=7
        )
        
        # IP features
        features['ip_address'] = transaction.get('ip_address')
        features['is_vpn'] = self._is_vpn(transaction.get('ip_address'))
        features['is_proxy'] = self._is_proxy(transaction.get('ip_address'))
        
        return features
    
    def _is_holiday(self, dt: datetime) -> bool:
        """Check if date is a holiday"""
        # Implementation
        pass
    
    def _get_merchant_risk(self, merchant_id: str) -> float:
        """Get merchant risk score"""
        pass
    
    def _days_ago(self, timestamp: datetime) -> float:
        """Calculate days since timestamp"""
        pass
    
    def _hours_since(self, last_timestamp: datetime, current: datetime) -> float:
        """Calculate hours since last transaction"""
        pass
    
    def _is_new_merchant(self, customer_id: str, merchant_id: str) -> bool:
        """Check if customer is new to merchant"""
        pass
    
    def _location_change_velocity(self, history: list, location: dict) -> float:
        """Calculate how fast location changed"""
        pass
    
    def _is_new_device(self, customer_id: str, device_id: str) -> bool:
        """Check if device is new for customer"""
        pass
    
    def _device_count(self, customer_id: str, days: int) -> int:
        """Count unique devices in time window"""
        pass
    
    def _is_vpn(self, ip_address: str) -> bool:
        """Check if IP is VPN"""
        pass
    
    def _is_proxy(self, ip_address: str) -> bool:
        """Check if IP is proxy"""
        pass
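
The helper methods above are left as stubs. As one possible way to fill in the location check, the sketch below computes an implied travel speed between two transactions using the haversine formula. The field names `lat`, `lon`, and `timestamp` are assumptions for illustration, since the article does not define the shape of `location`.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def location_change_velocity(prev: dict, curr: dict) -> float:
    """Implied travel speed in km/h between two transactions.

    Assumes each dict has 'lat', 'lon' and a datetime 'timestamp'
    (hypothetical field names, not defined in the article).
    """
    hours = (curr["timestamp"] - prev["timestamp"]).total_seconds() / 3600
    distance = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    if hours <= 0:
        # Same or out-of-order timestamps: any movement is infinitely fast
        return float("inf") if distance > 0 else 0.0
    return distance / hours


# A card used in London, then in New York one hour later
london = {"lat": 51.5074, "lon": -0.1278, "timestamp": datetime(2024, 1, 1, 12, 0)}
new_york = {"lat": 40.7128, "lon": -74.0060, "timestamp": datetime(2024, 1, 1, 13, 0)}

speed = location_change_velocity(london, new_york)
print(f"{speed:.0f} km/h")
```

A speed far above what a commercial flight can achieve is a common "impossible travel" signal. The cutoff to use is a business decision and is not prescribed here.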

Advanced Feature Engineering

import networkx as nx
import pandas as pd


# Time-based aggregation features
def create_time_based_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Create time-based aggregated features.

    Assumes df is sorted by timestamp within each customer.
    """
    
    # Rolling statistics
    df['rolling_mean_3'] = df.groupby('customer_id')['amount'].transform(
        lambda x: x.rolling(3, min_periods=1).mean()
    )
    
    df['rolling_std_7'] = df.groupby('customer_id')['amount'].transform(
        lambda x: x.rolling(7, min_periods=1).std()
    )
    
    # Exponential moving average
    df['ewma_amount'] = df.groupby('customer_id')['amount'].transform(
        lambda x: x.ewm(span=10).mean()
    )
    
    # Lag features
    for lag in [1, 3, 7]:
        df[f'amount_lag_{lag}'] = df.groupby('customer_id')['amount'].shift(lag)
    
    # Difference features
    df['amount_diff_1'] = df['amount'] - df['amount_lag_1']
    # Group by customer so the change never spans two different customers
    df['amount_pct_change'] = df.groupby('customer_id')['amount'].pct_change()
    
    return df


# Network/graph-based features
def create_network_features(transaction: dict, graph: nx.Graph) -> dict:
    """
    Create features based on transaction network
    """
    
    features = {}
    
    # Customer node features
    customer_node = transaction['customer_id']
    
    # Degree (number of connections)
    features['customer_degree'] = graph.degree(customer_node)
    
    # Number of fraud neighbors
    neighbors = list(graph.neighbors(customer_node))
    features['fraud_neighbor_count'] = sum(
        graph.nodes[n].get('is_fraud', 0) for n in neighbors
    )
    
    # Shortest path to the nearest known fraud node (-1 if none is reachable)
    fraud_nodes = {n for n, data in graph.nodes(data=True) if data.get('is_fraud')}
    distances = nx.single_source_shortest_path_length(graph, customer_node)
    reachable = [d for n, d in distances.items() if n in fraud_nodes]
    features['distance_to_fraud'] = min(reachable) if reachable else -1
    
    return features
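
When many customers need a distance-to-fraud feature at once, running one search per customer gets expensive. One alternative, sketched here in plain Python rather than networkx, is a multi-source breadth-first search that starts from all known fraud nodes and labels every reachable node in a single pass. The adjacency-dict representation and the node names are assumptions for illustration.

```python
from collections import deque


def distance_to_nearest_fraud(adjacency: dict, fraud_nodes: set) -> dict:
    """Multi-source BFS: hop distance from each reachable node to its nearest fraud node.

    adjacency maps node -> iterable of neighbours (undirected graph assumed).
    Nodes that cannot reach any fraud node are absent from the result.
    """
    dist = {n: 0 for n in fraud_nodes if n in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy graph: customers linked through a shared device and a flagged card
adjacency = {
    "cust_a": ["device_1"],
    "device_1": ["cust_a", "cust_b"],
    "cust_b": ["device_1", "card_9"],
    "card_9": ["cust_b"],
    "cust_c": [],
}
distances = distance_to_nearest_fraud(adjacency, {"card_9"})

print(distances.get("cust_a", -1))  # three hops from the flagged card
print(distances.get("cust_c", -1))  # isolated, so the -1 default applies
```

The `-1` default mirrors the convention used in `create_network_features` for customers with no path to known fraud.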

Model Selection

Algorithm Comparison

┌─────────────────────────────────────────────────────────────────────┐
│              FRAUD DETECTION ALGORITHMS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ALGORITHM              PROS                    CONS               │
│  ─────────────────────────────────────────────────────────────────  │
│                                                                     │
│  Random Forest         • Robust                • Can be slow       │
│                       • Handles imbalance    • Less interpretable │
│                       • Good accuracy                                │
│                                                                     │
│  XGBoost/LightGBM      • Fast training       • Requires tuning    │
│                       • Handles sparse data  • Can overfit        │
│                       • Good with imbalance                        │
│                                                                     │
│  Isolation Forest      • Unsupervised        • Hard to tune      │
│                       • Good for anomaly     • Less accurate      │
│                       • No labels needed                           │
│                                                                     │
│  Autoencoder           • Unsupervised        • Needs normalization│
│                       • Good for novel fraud  • Complex           │
│                       • Learns normal patterns                     │
│                                                                     │
│  Graph Neural Networks • Network features     • Complex           │
│                       • Catch organized fraud • Needs graph data   │
│                       • State-of-the-art                           │
│                                                                     │
│  Ensemble Methods      • Best overall        • Complex            │
│                       • Combines strengths   • Hard to deploy      │
│                       • Higher accuracy                            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Model Implementation

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

class FraudDetectionModel:
    """
    XGBoost-based fraud detection model
    """
    
    def __init__(self, params: dict = None):
        self.params = params or {
            'objective': 'binary:logistic',
            'eval_metric': 'auc',
            'max_depth': 6,
            'learning_rate': 0.1,
            'subsample': 0.8,
            'colsample_bytree': 0.8,
            'scale_pos_weight': 100,  # Placeholder; set to negatives / positives in the training data
            'tree_method': 'hist'
        }
        self.model = None
        self.feature_importance = None
    
    def train(self, X: pd.DataFrame, y: pd.Series):
        """
        Train the fraud detection model
        """
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )
        
        # Create DMatrix
        dtrain = xgb.DMatrix(X_train, label=y_train)
        dtest = xgb.DMatrix(X_test, label=y_test)
        
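        # Train with early stopping.
        # NOTE: early stopping here monitors the same hold-out set that is
        # reported below, so those metrics are optimistic; use a separate
        # validation split for stopping in practice.
        evals = [(dtrain, 'train'), (dtest, 'eval')]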
        
        self.model = xgb.train(
            self.params,
            dtrain,
            num_boost_round=500,
            evals=evals,
            early_stopping_rounds=50,
            verbose_eval=50
        )
        
        # Get feature importance
        importance = self.model.get_score(importance_type='gain')
        self.feature_importance = pd.DataFrame({
            'feature': list(importance.keys()),
            'importance': list(importance.values())
        }).sort_values('importance', ascending=False)
        
        # Evaluate
        y_pred_proba = self.model.predict(dtest)
        y_pred = (y_pred_proba > 0.5).astype(int)
        
        print("\nClassification Report:")
        print(classification_report(y_test, y_pred))
        
        print(f"\nAUC-ROC: {roc_auc_score(y_test, y_pred_proba):.4f}")
        
        return self.model
    
    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        """
        Predict fraud probability
        """
        
        dtest = xgb.DMatrix(X)
        return self.model.predict(dtest)
    
    def predict(self, X: pd.DataFrame, threshold: float = 0.5) -> np.ndarray:
        """
        Predict fraud labels
        """
        
        proba = self.predict_proba(X)
        return (proba >= threshold).astype(int)
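
The default threshold of 0.5 is rarely the right operating point, because a missed fraud and a declined legitimate customer carry very different costs. The sketch below picks a threshold by minimizing total cost on labeled scores. The cost values, the candidate grid, and the toy data are assumptions for illustration only.

```python
def expected_cost(y_true, scores, threshold, fp_cost, fn_cost):
    """Total cost of flagging every score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += fp_cost  # legitimate customer declined
        elif not flagged and label == 1:
            cost += fn_cost  # fraud let through
    return cost


def best_threshold(y_true, scores, fp_cost, fn_cost, candidates=None):
    """Return the candidate threshold with the lowest total cost."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, scores, t, fp_cost, fn_cost),
    )


# Toy validation scores (1 = fraud)
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.95]

# Missed fraud is expensive: the threshold moves down to catch more fraud
low = best_threshold(y_true, scores, fp_cost=5, fn_cost=200)

# False declines are expensive: the threshold moves up to protect customers
high = best_threshold(y_true, scores, fp_cost=200, fn_cost=5)

print(low, high)
```

Choose the threshold on a validation set that the model was not trained on, and pass the result to `predict(X, threshold=...)`.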

Handling Class Imbalance

from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.under_sampling import RandomUnderSampler
from imblearn.combine import SMOTETomek

def handle_imbalance(X: pd.DataFrame, y: pd.Series, method: str = 'smote'):
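    """
    Handle class imbalance with various resampling techniques.

    Apply this to the training split only; resampling validation or test
    data distorts the reported metrics.
    """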
    
    if method == 'smote':
        sampler = SMOTE(random_state=42)
    elif method == 'adasyn':
        sampler = ADASYN(random_state=42)
    elif method == 'undersample':
        sampler = RandomUnderSampler(random_state=42)
    elif method == 'smote_tomek':
        sampler = SMOTETomek(random_state=42)
    else:
        raise ValueError(f"Unknown method: {method}")
    
    X_resampled, y_resampled = sampler.fit_resample(X, y)
    
    return X_resampled, y_resampled


# Alternative: Class weights
def train_with_class_weights(X: pd.DataFrame, y: pd.Series):
    """
    Train with class weights instead of resampling
    """
    
    # Calculate scale_pos_weight
    scale_pos_weight = (y == 0).sum() / (y == 1).sum()
    
    # Update params
    params = {
        'objective': 'binary:logistic',
        'scale_pos_weight': scale_pos_weight,
        # ... other params
    }
    
    dtrain = xgb.DMatrix(X, label=y)
    model = xgb.train(params, dtrain, num_boost_round=100)
    return model
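
One side effect of up-weighting the fraud class is that predicted probabilities are inflated relative to the true fraud rate. A rough way to map scores back, sketched here under the assumption that weighting positives by a factor `w` multiplies the learned odds by about `w`, is to divide the odds by that factor. This is an approximation and not an XGBoost feature; calibrating on held-out data is generally more reliable.

```python
def correct_for_pos_weight(p_weighted: float, pos_weight: float) -> float:
    """Approximately undo the odds inflation caused by up-weighting positives.

    Weighting each positive by `pos_weight` multiplies the learned odds of
    fraud by roughly that factor, so dividing the odds by it gives an
    estimate on the original scale:

        p = p_w / (p_w + pos_weight * (1 - p_w))
    """
    if not 0.0 <= p_weighted <= 1.0:
        raise ValueError("p_weighted must be a probability")
    if pos_weight <= 0:
        raise ValueError("pos_weight must be positive")
    return p_weighted / (p_weighted + pos_weight * (1.0 - p_weighted))


# With pos_weight = 999 (roughly a 0.1% fraud rate), a weighted score of 0.5
# corresponds to about a 0.1% probability on the original scale
print(correct_for_pos_weight(0.5, 999))
```

The correction preserves the ranking of transactions, so AUC is unchanged; only the interpretation of the score as a probability shifts.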

Real-Time Scoring

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│              REAL-TIME FRAUD DETECTION ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                    REQUEST FLOW                               │  │
│  │                                                              │  │
│  │  Customer ──► API Gateway ──► Fraud Check Service ──► DB   │  │
│  │                      │                                        │  │
│  │                      ▼                                        │  │
│  │              ┌──────────────┐                                 │  │
│  │              │ Feature      │                                 │  │
│  │              │ Store (Redis)│                                 │  │
│  │              └──────────────┘                                 │  │
│  │                      │                                        │  │
│  │                      ▼                                        │  │
│  │              ┌──────────────┐                                 │  │
│  │              │ ML Model     │                                 │  │
│  │              │ (XGBoost)    │                                 │  │
│  │              └──────────────┘                                 │  │
│  │                      │                                        │  │
│  │                      ▼                                        │  │
│  │              ┌──────────────┐                                 │  │
│  │              │ Decision     │                                 │  │
│  │              │ Engine       │                                 │  │
│  │              └──────────────┘                                 │  │
│  │                      │                                        │  │
│  │              ◄────────┴────────►                              │  │
│  │              │                │                               │  │
│  │              ▼                ▼                               │  │
│  │        Approve            Review/Decline                     │  │
│  │                                                              │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Key Components:                                                    │
│  • Low-latency feature store                                       │
│  • Model inference < 50ms                                         │
│  • A/B testing for model updates                                  │
│  • Monitoring and alerting                                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Implementation

import pickle
import time

import pandas as pd
import redis

class RealTimeFraudScorer:
    """
    Real-time fraud scoring service
    """
    
    def __init__(self, model_path: str, redis_host: str = 'localhost'):
        # Load model
        self.model = self._load_model(model_path)
        
        # Connect to Redis for feature cache
        self.redis = redis.Redis(host=redis_host, port=6379, db=0)
        
        # Feature store (FeatureStore is a placeholder for your own wrapper class)
        self.feature_store = FeatureStore(self.redis)
    
    def _load_model(self, path: str):
        """Load serialized model (only unpickle files from a trusted source)"""
        with open(path, 'rb') as f:
            return pickle.load(f)
    
    def score_transaction(self, transaction: dict) -> dict:
        """
        Score a transaction in real-time
        """
        
        start_time = time.perf_counter()  # monotonic clock, suited to measuring latency
        
        # 1. Fetch features from cache or compute
        features = self._get_features(transaction)
        
        # 2. Run model inference
        fraud_probability = self._predict_fraud(features)
        
        # 3. Apply business rules
        decision, reasons = self._apply_rules(
            transaction, 
            fraud_probability
        )
        
        # 4. Log for monitoring
        latency = time.perf_counter() - start_time
        self._log_decision(transaction, fraud_probability, decision, latency)
        
        return {
            'transaction_id': transaction['transaction_id'],
            'fraud_probability': fraud_probability,
            'decision': decision,
            'reasons': reasons,
            'latency_ms': latency * 1000
        }
    
    def _get_features(self, transaction: dict) -> pd.DataFrame:
        """
        Get or compute features for transaction
        """
        
        customer_id = transaction['customer_id']
        
        # Try cache first
        cached_features = self.redis.get(f"features:{customer_id}")
        
        if cached_features:
            features = pickle.loads(cached_features)
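        else:
            # Compute features from historical data, then cache them briefly
            history = self._fetch_history(customer_id)
            features = self._compute_features(transaction, history)
            # The 300-second TTL is an assumed value; tune it to your traffic
            self.redis.set(f"features:{customer_id}", pickle.dumps(features), ex=300)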
        
        # Add current transaction features
        features['amount'] = transaction['amount']
        features['hour'] = transaction['timestamp'].hour
        # ... add more features
        
        return features
    
    def _predict_fraud(self, features: pd.DataFrame) -> float:
        """
        Run model prediction
        """
        
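        # Ensure features match training
        # (In production, use feature store to ensure consistency)
        
        # Assumes a scikit-learn-style classifier (e.g. xgb.XGBClassifier) whose
        # predict_proba returns shape (n_samples, 2); column 1 is P(fraud).
        # FraudDetectionModel.predict_proba returns a 1-D array instead, so
        # index it as [0] if that wrapper is what was pickled.
        proba = self.model.predict_proba(features)[0][1]
        return float(proba)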
    
    def _apply_rules(self, transaction: dict, probability: float) -> tuple:
        """
        Apply business rules in addition to ML
        """
        
        decision = 'approve'
        reasons = []
        
        # Rule-based overrides
        if probability > 0.9:
            decision = 'decline'
            reasons.append('HIGH_FRAUD_PROBABILITY')
        
        elif probability > 0.5:
            decision = 'review'
            reasons.append('MEDIUM_RISK')
        
        # Rule hits escalate to 'review' but must never downgrade a 'decline'
        rule_hits = []
        
        if self._check_velocity(transaction):  # Velocity checks
            rule_hits.append('HIGH_VELOCITY')
        
        if transaction['amount'] > 10000:  # Amount threshold
            rule_hits.append('HIGH_AMOUNT')
        
        if self._is_new_customer(transaction['customer_id']):  # New customer
            rule_hits.append('NEW_CUSTOMER')
        
        reasons.extend(rule_hits)
        if rule_hits and decision != 'decline':
            decision = 'review'
        
        return decision, reasons
    
    def _fetch_history(self, customer_id: str) -> list:
        """Fetch transaction history"""
        # Implementation
        pass
    
    def _compute_features(self, transaction: dict, history: list) -> pd.DataFrame:
        """Compute feature vector"""
        # Implementation
        pass
    
    def _check_velocity(self, transaction: dict) -> bool:
        """Check transaction velocity"""
        pass
    
    def _is_new_customer(self, customer_id: str) -> bool:
        """Check if customer is new"""
        pass
    
    def _log_decision(self, transaction: dict, probability: float, 
                     decision: str, latency: float):
        """Log decision for monitoring"""
        # Implementation
        pass
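
`_check_velocity` is left as a stub above. As a minimal illustration of what a velocity check can look like, the sketch below keeps a sliding window of timestamps per customer in memory. The class name, window size, and limit are hypothetical, and it assumes timestamps arrive in order; a production version would more likely keep this state in the feature store.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class VelocityTracker:
    """In-memory sliding-window transaction counter per customer."""

    def __init__(self, window: timedelta, max_txns: int):
        self.window = window
        self.max_txns = max_txns
        self._events = defaultdict(deque)  # customer_id -> timestamps, oldest first

    def record_and_check(self, customer_id: str, ts: datetime) -> bool:
        """Record a transaction; return True if the window limit is exceeded."""
        events = self._events[customer_id]
        events.append(ts)
        cutoff = ts - self.window
        # Drop timestamps that have fallen out of the window
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) > self.max_txns


# Allow at most 3 transactions per customer in any 10-minute window
tracker = VelocityTracker(window=timedelta(minutes=10), max_txns=3)
t0 = datetime(2024, 1, 1, 12, 0)

# Four transactions one minute apart: only the fourth exceeds the limit
flags = [tracker.record_and_check("cust_1", t0 + timedelta(minutes=i)) for i in range(4)]
print(flags)
```

Because old timestamps are evicted on every call, memory per customer stays bounded by the number of transactions inside the window.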

Production Considerations

Model Monitoring

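from datetime import datetime, timezone

from scipy.stats import ks_2samp


class FraudModelMonitor:
    """
    Monitor fraud detection model in production
    """
    
    def __init__(self):
        # MetricsStore is a placeholder for your own metrics backend
        self.metrics_store = MetricsStore()
    
    def log_prediction(self, prediction_data: dict):
        """
        Log each prediction for analysis
        """
        
        event = {
            'timestamp': datetime.now(timezone.utc),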
            'transaction_id': prediction_data['transaction_id'],
            'model_version': prediction_data['model_version'],
            'fraud_probability': prediction_data['fraud_probability'],
            'decision': prediction_data['decision'],
            'features': prediction_data['features'],
            'latency_ms': prediction_data['latency_ms']
        }
        
        self.metrics_store.log_event('predictions', event)
    
    def calculate_drift(self, window_hours: int = 24):
        """
        Calculate feature and prediction drift
        """
        
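        # Compare recent predictions to a baseline window.
        # NOTE: as written, the past-week baseline includes the recent window;
        # excluding it gives a cleaner comparison.
        recent = self._get_predictions(window_hours)
        baseline = self._get_predictions(168)  # Past week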
        
        # Feature drift (KS test)
        feature_drift = {}
        for feature in recent.features:
            stat, pvalue = ks_2samp(recent[feature], baseline[feature])
            feature_drift[feature] = {'statistic': stat, 'pvalue': pvalue}
        
        # Prediction drift
        pred_stat, pred_pvalue = ks_2samp(
            recent.fraud_probability, 
            baseline.fraud_probability
        )
        
        # Alert if significant drift
        if pred_pvalue < 0.05:
            self._alert(f"Prediction drift detected: p={pred_pvalue:.4f}")
        
        return feature_drift
    
    def calculate_performance(self, window_hours: int = 24):
        """
        Calculate model performance metrics
        """
        
        # Get predictions with actual labels (delayed)
        confirmed = self._get_confirmed_predictions(window_hours)
        
        # Calculate metrics
        from sklearn.metrics import roc_auc_score, precision_recall_curve
        
        auc = roc_auc_score(confirmed.is_fraud, confirmed.fraud_probability)
        
        # Precision at different thresholds
        precision, recall, thresholds = precision_recall_curve(
            confirmed.is_fraud, 
            confirmed.fraud_probability
        )
        
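        # precision and recall have one more entry than thresholds, so index
        # them by position rather than with a thresholds-sized boolean mask
        mask = thresholds >= 0.5
        idx = int(mask.argmax()) if mask.any() else None
        
        return {
            'auc_roc': auc,
            'precision_at_50': float(precision[idx]) if idx is not None else 0.0,
            'recall_at_50': float(recall[idx]) if idx is not None else 0.0,
            'total_predictions': len(confirmed),
            'confirmed_fraud': confirmed.is_fraud.sum()
        }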
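
With production-sized samples, the KS p-value tends to become tiny even for shifts too small to matter, so it can help to track the KS statistic itself alongside it. The sketch below computes that statistic in plain Python to show what it measures: the largest gap between the two empirical CDFs. It is meant as an illustration, not a replacement for `scipy.stats.ks_2samp`, and any alert cutoff on the statistic is left as an assumption to tune.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


baseline_scores = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08]
shifted_scores = [0.20, 0.25, 0.30, 0.40, 0.55, 0.90]

print(ks_statistic(baseline_scores, baseline_scores))  # identical samples
print(ks_statistic(baseline_scores, shifted_scores))   # no overlap at all
```

The statistic ranges from 0 (identical distributions) to 1 (no overlap), which makes it easier to set a stable alert level than a p-value whose sensitivity grows with sample size.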

Common Pitfalls

1. Not Using Proper Validation Strategy

# Anti-pattern: Random train/test split for time series
def bad_validation():
    # This leaks future information!
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
# Good pattern: Time-based validation
def good_validation():
    # Use time-based split
    train = df[df['timestamp'] < '2024-01-01']
    test = df[(df['timestamp'] >= '2024-01-01') & 
              (df['timestamp'] < '2024-02-01')]
    
    X_train, y_train = train[features], train['is_fraud']
    X_test, y_test = test[features], test['is_fraud']
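
A single time-based split gives one estimate. A walk-forward scheme repeats it over several windows and can leave a gap between training and test data to account for fraud labels that arrive late. The sketch below only generates the window boundaries; the window sizes and the 7-day gap are assumed values for illustration.

```python
from datetime import datetime, timedelta


def walk_forward_splits(start, end, train_days, test_days, gap_days=0):
    """Yield (train_start, train_end, test_start, test_end) windows.

    Each train window ends before its test window starts, with an optional
    gap for delayed fraud labels. End bounds are exclusive.
    """
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days)
        test_start = train_end + timedelta(days=gap_days)
        test_end = test_start + timedelta(days=test_days)
        if test_end > end:
            break
        yield train_start, train_end, test_start, test_end
        train_start += timedelta(days=test_days)  # slide forward by one test window


folds = list(walk_forward_splits(
    start=datetime(2024, 1, 1),
    end=datetime(2024, 7, 1),
    train_days=60,
    test_days=30,
    gap_days=7,
))

for train_start, train_end, test_start, test_end in folds:
    print(f"train {train_start:%Y-%m-%d}..{train_end:%Y-%m-%d}  "
          f"test {test_start:%Y-%m-%d}..{test_end:%Y-%m-%d}")
```

Each tuple can be applied to the DataFrame with the same `df['timestamp']` comparisons used in `good_validation`.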

2. Ignoring Feature Engineering

# Anti-pattern: Using raw features only
def bad_features():
    # Raw transaction amount, timestamp
    features = ['amount', 'timestamp']
    
# Good pattern: Engineer domain-specific features
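def good_features():
    features = [
        'amount_vs_avg_ratio',   # Amount vs historical average
        'txn_count_1d',          # Velocity (transaction count in a recent window)
        'hours_since_last_txn',  # Time since last transaction
        'is_new_device',         # Device risk signal
        'merchant_risk_score',   # Merchant risk score
    ]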

Conclusion

Building effective fraud detection systems requires a combination of:

Rich Feature Engineering: Domain-specific features that capture fraud patterns
Appropriate Models: Algorithms that handle class imbalance and can be deployed at scale
Real-Time Processing: Sub-100ms inference for transaction processing
Continuous Monitoring: Tracking drift and performance in production

The battle against fraud is ongoing: as detection improves, fraudsters adapt. Successful systems combine machine learning with rule-based checks, retrain models continuously, and monitor for emerging patterns.

Key takeaways:

Engineer features that capture behavioral anomalies
Use ensemble methods and handle class imbalance
Build real-time scoring infrastructure with low latency
Monitor for concept drift and model degradation
Combine ML with business rules for robust decisioning