
Statistics for Programmers: Complete Guide

Introduction

Statistics is fundamental to software development, from analyzing system performance to running A/B tests and making data-driven decisions. This comprehensive guide covers statistical concepts essential for programmers, with practical Python examples and real-world applications.

Why statistics matters:

  • Most data science and analytics roles expect working statistical knowledge
  • Well-run A/B tests routinely produce measurable conversion improvements
  • Understanding p-values guards against the most common analysis mistakes
  • Companies like Netflix run hundreds of A/B experiments every year

Descriptive Statistics

Key Metrics

import numpy as np
from scipy import stats

def descriptive_statistics(data):
    """Calculate key descriptive statistics"""
    
    # Central tendency
    mean = np.mean(data)
    median = np.median(data)
    mode = stats.mode(data, keepdims=True)
    
    # Dispersion (ddof=1 gives the sample variance/std dev)
    variance = np.var(data, ddof=1)
    std_dev = np.std(data, ddof=1)
    range_val = np.max(data) - np.min(data)
    quartiles = np.percentile(data, [25, 50, 75])
    
    # Shape
    skewness = stats.skew(data)
    kurtosis = stats.kurtosis(data)
    
    return {
        "mean": mean,
        "median": median,
        "mode": mode.mode[0],
        "variance": variance,
        "std_dev": std_dev,
        "range": range_val,
        "q1": quartiles[0],
        "q3": quartiles[2],
        "iqr": quartiles[2] - quartiles[0],
        "skewness": skewness,
        "kurtosis": kurtosis
    }

# Example: API response times (ms)
response_times = [45, 52, 48, 55, 62, 58, 51, 49, 53, 47, 
                  150, 48, 50, 52, 49, 51, 250, 53, 47, 49]

stats_result = descriptive_statistics(response_times)
print(f"Mean: {stats_result['mean']:.2f}ms")
print(f"Median: {stats_result['median']:.2f}ms")  
print(f"Std Dev: {stats_result['std_dev']:.2f}ms")
print(f"Skewness: {stats_result['skewness']:.2f}")

Outlier Detection

def detect_outliers(data, method='iqr'):
    """Detect outliers using IQR or Z-score method"""
    
    if method == 'iqr':
        q1 = np.percentile(data, 25)
        q3 = np.percentile(data, 75)
        iqr = q3 - q1
        
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        
        outliers = [x for x in data if x < lower_bound or x > upper_bound]
        return outliers, lower_bound, upper_bound
    
    elif method == 'zscore':
        z_scores = np.abs(stats.zscore(data))
        outliers = [data[i] for i in range(len(data)) if z_scores[i] > 3]
        return outliers

# Detect outliers in response times
outliers, lower, upper = detect_outliers(response_times)
print(f"Outliers: {outliers}")
print(f"Bounds: [{lower:.2f}, {upper:.2f}]")

Probability Distributions

Common Distributions

Common probability distributions:

  1. Normal (Gaussian)
     • Bell-shaped, symmetric
     • Used for: heights, test scores, measurement errors
     • Parameters: μ (mean), σ (std dev)

  2. Poisson
     • Count of events in a fixed interval
     • Used for: API calls per minute, bugs per module
     • Parameter: λ (average rate)

  3. Exponential
     • Time between events
     • Used for: time between requests, failure times
     • Parameter: λ (rate)

  4. Binomial
     • Number of successes in n trials
     • Used for: conversion rates, test pass/fail
     • Parameters: n (trials), p (probability)

  5. Uniform
     • Equal probability for all values in a range
     • Used for: random selection, load balancing
     • Parameters: a (min), b (max)

Working with Distributions in Python

import matplotlib.pyplot as plt
from scipy import stats

def demonstrate_distributions():
    """Visualize common probability distributions"""
    
    # Normal distribution
    x = np.linspace(-4, 4, 100)
    y_normal = stats.norm.pdf(x, 0, 1)
    
    # Calculate probabilities
    # P(X < 1.96) for the standard normal (a CDF value, not a p-value)
    prob = stats.norm.cdf(1.96)
    print(f"P(X < 1.96) = {prob:.4f}")  # ≈ 0.975
    
    # Generate random samples
    samples = np.random.normal(0, 1, 1000)
    
    # Fit distribution to data
    mu, sigma = stats.norm.fit(samples)
    print(f"Fitted: μ={mu:.3f}, σ={sigma:.3f}")
    
    return x, y_normal

# Poisson for API calls
# P(exactly 5 calls in minute if avg is 3)
prob_5_calls = stats.poisson.pmf(5, 3)
print(f"P(5 calls) = {prob_5_calls:.4f}")  # ≈ 0.1008

# P(at least 10 calls)
prob_at_least_10 = 1 - stats.poisson.cdf(9, 3)
print(f"P(at least 10) = {prob_at_least_10:.6f}")
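
The distribution list above also mentions the exponential distribution, which models the gaps between those same Poisson events; a short sketch using the same λ = 3 requests per minute:

```python
from scipy import stats

# Exponential: time between events in a Poisson process.
# If requests arrive at rate λ = 3 per minute, the gap between
# consecutive requests is exponential with mean 1/λ minutes.
rate = 3  # events per minute

# P(next request arrives within 10 seconds) = P(T < 1/6 minute)
p_within_10s = stats.expon.cdf(1/6, scale=1/rate)
print(f"P(gap < 10s) = {p_within_10s:.4f}")  # 1 - e^-0.5 ≈ 0.3935

# P(no request for over a minute) = P(T > 1)
p_over_1min = stats.expon.sf(1, scale=1/rate)
print(f"P(gap > 1 min) = {p_over_1min:.4f}")  # e^-3 ≈ 0.0498
```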

Hypothesis Testing

Core Concepts

def hypothesis_test_example():
    """
    Example: Testing if new algorithm is faster
    
    H0: μ_new ≥ μ_old (no improvement)
    H1: μ_new < μ_old (new algorithm has lower execution times)
    """
    
    # Sample data: execution times (ms)
    old_algorithm = [120, 115, 118, 122, 119, 121, 117, 120, 116, 118]
    new_algorithm = [108, 112, 105, 110, 107, 111, 109, 106, 113, 108]
    
    # Two-sample t-test (one-tailed)
    t_stat, p_value = stats.ttest_ind(new_algorithm, old_algorithm, 
                                       alternative='less')
    
    print(f"T-statistic: {t_stat:.4f}")
    print(f"P-value: {p_value:.4f}")
    
    alpha = 0.05
    if p_value < alpha:
        print("Reject H0: New algorithm is significantly faster")
    else:
        print("Fail to reject H0: No significant difference")
    
    # Effect size (Cohen's d, pooled from the sample variances)
    pooled_std = np.sqrt(((len(old_algorithm)-1)*np.var(old_algorithm, ddof=1) + 
                          (len(new_algorithm)-1)*np.var(new_algorithm, ddof=1)) / 
                         (len(old_algorithm) + len(new_algorithm) - 2))
    cohens_d = (np.mean(new_algorithm) - np.mean(old_algorithm)) / pooled_std
    print(f"Effect size (Cohen's d): {cohens_d:.4f}")


# Test types overview
"""
Hypothesis Test Selection

Data Type    | Comparison     | Test
-------------|----------------|-------------------
Continuous   | 2 groups       | t-test
Continuous   | >2 groups      | ANOVA
Categorical  | 2 groups       | Chi-square
Categorical  | >2 groups      | Chi-square
Continuous   | Before/After   | Paired t-test
Continuous   | Mean vs known  | One-sample t-test

Non-parametric alternatives (when assumptions are violated):
• Mann-Whitney U (instead of the t-test)
• Wilcoxon signed-rank (instead of the paired t-test)
• Kruskal-Wallis (instead of ANOVA)
"""
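
The non-parametric alternatives listed above matter in practice because latency data is usually skewed. A minimal sketch of Mann-Whitney U in place of the two-sample t-test, with invented latency samples for illustration:

```python
from scipy import stats

# Skewed latency samples (ms): the spike in server_a makes the
# t-test's normality assumption shaky. Mann-Whitney U compares
# the two samples via ranks instead of means.
server_a = [45, 48, 47, 52, 49, 51, 46, 50, 320, 48]  # one large spike
server_b = [58, 61, 59, 63, 60, 62, 57, 64, 59, 61]

u_stat, p_value = stats.mannwhitneyu(server_a, server_b,
                                     alternative='two-sided')
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The single 320 ms spike would dominate a comparison of means; rank-based tests are insensitive to its magnitude.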

Confidence Intervals

def confidence_interval(data, confidence=0.95):
    """Calculate confidence interval"""
    
    n = len(data)
    mean = np.mean(data)
    se = stats.sem(data)  # Standard error
    
    ci = stats.t.interval(confidence, n-1, loc=mean, scale=se)
    return mean, ci

# Example: API latency
latency_data = [45, 52, 48, 55, 62, 58, 51, 49, 53, 47]

mean, ci = confidence_interval(latency_data)
print(f"Mean: {mean:.2f}ms")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]ms")
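
The t-interval above assumes roughly normal data. For small or skewed samples, a percentile bootstrap is a common assumption-light alternative; a sketch (the function name and resample count are illustrative):

```python
import numpy as np

def bootstrap_ci(data, stat_func=np.mean, confidence=0.95,
                 n_boot=10_000, seed=42):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Resample with replacement and recompute the statistic each time
    boot_stats = np.array([
        stat_func(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    alpha = 1 - confidence
    lower, upper = np.percentile(boot_stats,
                                 [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

latency_data = [45, 52, 48, 55, 62, 58, 51, 49, 53, 47]
lo, hi = bootstrap_ci(latency_data)
print(f"95% bootstrap CI for the mean: [{lo:.2f}, {hi:.2f}]ms")

# Works for statistics with no clean analytic CI, e.g. the median:
lo_med, hi_med = bootstrap_ci(latency_data, stat_func=np.median)
print(f"95% bootstrap CI for the median: [{lo_med:.2f}, {hi_med:.2f}]ms")
```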

A/B Testing

Implementing A/B Tests

class ABTest:
    """A/B test implementation"""
    
    def __init__(self, control_visitors, control_conversions,
                 treatment_visitors, treatment_conversions):
        self.control_n = control_visitors
        self.control_x = control_conversions
        self.treatment_n = treatment_visitors
        self.treatment_x = treatment_conversions
    
    def calculate_metrics(self):
        """Calculate conversion rates"""
        self.control_rate = self.control_x / self.control_n
        self.treatment_rate = self.treatment_x / self.treatment_n
        self.lift = (self.treatment_rate - self.control_rate) / self.control_rate
        
        return {
            'control_rate': self.control_rate,
            'treatment_rate': self.treatment_rate,
            'lift': self.lift
        }
    
    def statistical_test(self):
        """Perform two-proportion z-test"""
        self.calculate_metrics()  # ensure rates exist even if called first
        
        # Pooled proportion
        p_pool = (self.control_x + self.treatment_x) / (self.control_n + self.treatment_n)
        
        # Standard error
        se = np.sqrt(p_pool * (1 - p_pool) * 
                    (1/self.control_n + 1/self.treatment_n))
        
        # Z-statistic
        z = (self.treatment_rate - self.control_rate) / se
        
        # Two-tailed p-value
        p_value = 2 * (1 - stats.norm.cdf(abs(z)))
        
        return {
            'z_statistic': z,
            'p_value': p_value,
            'significant': p_value < 0.05
        }
    
    def sample_size_calculator(self, baseline_rate, minimum_detectable_effect, 
                              alpha=0.05, power=0.8):
        """Calculate required sample size"""
        p1 = baseline_rate
        p2 = baseline_rate * (1 + minimum_detectable_effect)
        
        z_alpha = stats.norm.ppf(1 - alpha/2)
        z_beta = stats.norm.ppf(power)
        
        n = (2 * (p1 + p2)/2 * (1 - (p1 + p2)/2) * 
             (z_alpha + z_beta)**2 / (p2 - p1)**2)
        
        return int(np.ceil(n))


# Example: Testing new checkout flow
ab_test = ABTest(
    control_visitors=5000,
    control_conversions=150,  # 3% conversion
    treatment_visitors=5000,
    treatment_conversions=185  # 3.7% conversion
)

metrics = ab_test.calculate_metrics()
test_results = ab_test.statistical_test()

print(f"Control: {metrics['control_rate']:.1%}")
print(f"Treatment: {metrics['treatment_rate']:.1%}")
print(f"Lift: {metrics['lift']:.1%}")
print(f"P-value: {test_results['p_value']:.4f}")
print(f"Significant: {test_results['significant']}")

# Calculate required sample size
required_n = ab_test.sample_size_calculator(0.03, 0.1)  # 10% MDE
print(f"Required sample size per variation: {required_n}")
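
Alongside the p-value, a confidence interval for the absolute difference in conversion rates communicates how large the effect plausibly is. A sketch using the standard normal approximation (the function name is illustrative, the numbers match the checkout-flow example above):

```python
import numpy as np
from scipy import stats

def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Wald confidence interval for p2 - p1 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each proportion keeps its own variance
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    diff = p2 - p1
    return diff - z * se, diff + z * se

# Same numbers as the checkout-flow example above
lo, hi = proportion_diff_ci(150, 5000, 185, 5000)
print(f"95% CI for the absolute lift: [{lo:.2%}, {hi:.2%}]")
```

An interval that straddles zero tells the same story as a p-value above 0.05, but in units stakeholders care about.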

Correlation and Regression

Correlation Analysis

def correlation_analysis():
    """Analyze correlations between variables"""
    
    # Example: Feature usage vs time spent
    features_used = [3, 5, 7, 2, 8, 6, 4, 9, 5, 7, 3, 6, 8, 4, 5]
    time_spent = [15, 25, 35, 10, 40, 30, 20, 45, 25, 35, 15, 30, 40, 20, 25]
    
    # Pearson correlation
    pearson_r, pearson_p = stats.pearsonr(features_used, time_spent)
    
    # Spearman (rank) correlation
    spearman_r, spearman_p = stats.spearmanr(features_used, time_spent)
    
    print(f"Pearson r: {pearson_r:.4f}, p: {pearson_p:.4f}")
    print(f"Spearman ρ: {spearman_r:.4f}, p: {spearman_p:.4f}")
    
    # Interpretation
    if abs(pearson_r) < 0.3:
        strength = "weak"
    elif abs(pearson_r) < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    
    direction = "positive" if pearson_r > 0 else "negative"
    print(f"Interpretation: {strength} {direction} correlation")
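
The difference between the two coefficients shows up clearly on monotonic but nonlinear data; a small illustrative sketch:

```python
import numpy as np
from scipy import stats

# A monotonic but nonlinear relationship: Spearman sees perfect
# rank agreement, while Pearson is pulled below 1 by the curvature.
x = np.arange(1, 11)
y = x ** 3  # strictly increasing, strongly nonlinear

pearson_r, _ = stats.pearsonr(x, y)
spearman_r, _ = stats.spearmanr(x, y)

print(f"Pearson r:  {pearson_r:.4f}")   # below 1.0
print(f"Spearman ρ: {spearman_r:.4f}")  # 1.0 (perfect rank agreement)
```

When Spearman is much larger than Pearson, suspect a nonlinear but monotonic relationship; when Pearson is much larger, suspect outliers inflating it.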


# Linear regression
from scipy.stats import linregress

def linear_regression():
    """Simple linear regression"""
    
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    y = np.array([2.1, 4.3, 5.8, 8.2, 9.9, 12.1, 14.0, 16.2, 17.9, 20.1])
    
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    
    print(f"Equation: y = {slope:.2f}x + {intercept:.2f}")
    print(f"R-squared: {r_value**2:.4f}")
    print(f"P-value: {p_value:.6f}")
    
    # Predict
    predicted = slope * 11 + intercept
    print(f"Prediction for x=11: {predicted:.2f}")

Common Statistical Mistakes

What to Avoid

Common statistical mistakes:

  1. Ignoring sample size
     ✗ Drawing conclusions from small samples
     ✓ Use power analysis to determine sample size

  2. Confusing correlation with causation
     ✗ Assuming A causes B because they're correlated
     ✓ Use controlled experiments to establish causation

  3. P-hacking
     ✗ Trying multiple tests until one "works"
     ✓ Pre-register hypotheses and adjust for multiple comparisons

  4. Ignoring effect size
     ✗ Focusing only on statistical significance
     ✓ Report and interpret effect sizes

  5. Base rate neglect
     ✗ Ignoring prior probabilities
     ✓ Consider false positive/negative rates

  6. Survivorship bias
     ✗ Analyzing only successful cases
     ✓ Include all relevant data, including failures
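
The p-hacking entry calls for adjusting for multiple tests. A sketch of two standard corrections, plain Bonferroni and the Holm step-down procedure (the p-values are invented for illustration):

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction: uniformly more powerful than plain
    Bonferroni while still controlling the family-wise error rate."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k)
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Five hypothetical experiment p-values
p_vals = [0.003, 0.021, 0.041, 0.048, 0.338]

print("Naive (p < 0.05):     ", [p < 0.05 for p in p_vals])
print("Bonferroni (p < 0.01):", [p < 0.05 / len(p_vals) for p in p_vals])
print("Holm:                 ", holm_bonferroni(p_vals).tolist())
```

Without correction, four of the five look significant; after either correction only one survives.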

Practical Applications

System Performance Analysis

def analyze_performance(data):
    """Statistical analysis of system performance"""
    
    # Descriptive
    stats_summary = descriptive_statistics(data)
    
    # Check normality
    _, p_normal = stats.shapiro(data[:5000] if len(data) > 5000 else data)
    print(f"Normality test p-value: {p_normal:.4f}")
    
    if p_normal < 0.05:
        print("Data is NOT normally distributed")
        print("Use median and IQR instead of mean and std")
    else:
        print("Data appears normally distributed")
    
    # Confidence interval for the p95 latency via the bootstrap
    # (percentiles have no simple analytic CI)
    p95 = np.percentile(data, 95)
    rng = np.random.default_rng(0)
    boot_p95 = [np.percentile(rng.choice(data, size=len(data), replace=True), 95)
                for _ in range(10_000)]
    ci_lo, ci_hi = np.percentile(boot_p95, [2.5, 97.5])
    print(f"P95: {p95:.2f}, 95% CI: [{ci_lo:.2f}, {ci_hi:.2f}]")

Best Practices

  1. Always report effect sizes: Statistical significance alone is insufficient
  2. Check assumptions: Normality, independence, equal variance
  3. Use appropriate tests: Match test to data type and question
  4. Pre-register hypotheses: Prevent p-hacking
  5. Consider practical significance: Statistical ≠ practical significance
  6. Visualize data: Always plot before drawing conclusions
  7. Report uncertainty: Include confidence intervals
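
The power analysis recommended above can be sketched for a two-sample comparison of means. This uses the standard normal approximation (the exact t-based answer is slightly larger); the function name is illustrative:

```python
import numpy as np
from scipy import stats

def sample_size_for_means(effect_size, alpha=0.05, power=0.8):
    """Per-group n for a two-sample comparison of means
    (normal approximation, two-sided test).

    effect_size is Cohen's d: mean difference / pooled std dev."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return int(np.ceil(n))

# Smaller effects need disproportionately more data
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {sample_size_for_means(d)} per group")
```

Halving the detectable effect size quadruples the required sample, which is why tiny expected lifts demand very large experiments.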

Conclusion

Statistics is an essential skill for programmers, enabling data-driven decisions, proper experiment design, and accurate data interpretation. By understanding descriptive statistics, probability distributions, hypothesis testing, and A/B testing, you can make more informed decisions and avoid common statistical pitfalls.
