Statistical Analysis with SciPy: A Practical Guide
Statistical analysis is at the heart of data science. Whether you’re testing hypotheses, comparing groups, or understanding distributions, having the right tools makes all the difference. SciPy’s scipy.stats module provides a comprehensive collection of statistical functions that transform raw data into actionable insights.
In this guide, we’ll explore how to leverage SciPy for practical statistical analysis. We’ll move beyond theory to show you exactly how to implement statistical tests, work with probability distributions, and interpret results in real-world scenarios.
Why SciPy for Statistical Analysis?
SciPy is built on top of NumPy and provides:
- Comprehensive Statistical Functions: Over 80 probability distributions and statistical tests
- Efficient Computation: Optimized C implementations for performance
- Integration with Python Ecosystem: Works seamlessly with Pandas, NumPy, and Matplotlib
- Production-Ready: Trusted by researchers, data scientists, and engineers worldwide
- Well-Documented: Extensive documentation with clear examples
Let’s dive into practical statistical analysis with SciPy.
Getting Started with SciPy Statistics
Installation and Setup
# Install SciPy (if not already installed)
# pip install scipy numpy pandas matplotlib
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
# Set random seed for reproducibility
np.random.seed(42)
Basic Statistical Measures
Before running complex tests, let’s understand how to calculate basic descriptive statistics:
# Generate sample data
data = np.random.normal(loc=100, scale=15, size=1000)
# Calculate descriptive statistics
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data, ddof=1)   # sample standard deviation (ddof=1 matches stats.describe)
variance = np.var(data, ddof=1)  # sample variance
# Using SciPy for more detailed statistics
describe_result = stats.describe(data)
print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Standard Deviation: {std_dev:.2f}")
print(f"Variance: {variance:.2f}")
print(f"\nSciPy describe():")
print(f"Count: {describe_result.nobs}")
print(f"Min: {describe_result.minmax[0]:.2f}")
print(f"Max: {describe_result.minmax[1]:.2f}")
print(f"Mean: {describe_result.mean:.2f}")
print(f"Variance: {describe_result.variance:.2f}")
print(f"Skewness: {describe_result.skewness:.2f}")
print(f"Kurtosis: {describe_result.kurtosis:.2f}")
The describe() function provides a comprehensive summary including skewness (asymmetry) and kurtosis (tail heaviness), which are important for understanding data distribution shape.
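If you only need the shape statistics, `scipy.stats` also exposes them as standalone functions. A quick sketch, reusing the same simulated data:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# stats.skew and stats.kurtosis compute the same shape measures as describe().
# kurtosis() uses Fisher's definition by default, so a normal distribution -> 0.
skewness = stats.skew(data)
excess_kurtosis = stats.kurtosis(data)

print(f"Skewness: {skewness:.3f}")                # near 0 for symmetric data
print(f"Excess kurtosis: {excess_kurtosis:.3f}")  # near 0 for normal-like tails
```

Both values should hover near zero here, since the data was drawn from a normal distribution.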
Probability Distributions
Understanding Distributions
Probability distributions are fundamental to statistics. SciPy provides access to over 80 distributions, each with methods for probability density function (PDF), cumulative distribution function (CDF), and random sampling.
Working with the Normal Distribution
# Create a normal distribution object
normal_dist = stats.norm(loc=100, scale=15)
# Generate random samples
samples = normal_dist.rvs(size=1000)
# Calculate probability density at specific points
x_values = np.linspace(70, 130, 100)
pdf_values = normal_dist.pdf(x_values)
# Calculate cumulative probability (P(X <= x))
prob_less_than_110 = normal_dist.cdf(110)
print(f"P(X <= 110) = {prob_less_than_110:.4f}")
# Calculate quantiles (inverse CDF)
quantile_95 = normal_dist.ppf(0.95)
print(f"95th percentile: {quantile_95:.2f}")
# Visualize the distribution
plt.figure(figsize=(10, 6))
plt.plot(x_values, pdf_values, 'b-', linewidth=2, label='PDF')
plt.fill_between(x_values, pdf_values, alpha=0.3)
plt.axvline(110, color='r', linestyle='--', label='X=110')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Normal Distribution (μ=100, σ=15)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Other Common Distributions
# Exponential distribution (waiting times, lifetimes)
exp_dist = stats.expon(scale=2)
exp_samples = exp_dist.rvs(size=1000)
# Binomial distribution (number of successes in n trials)
binom_dist = stats.binom(n=10, p=0.5)
binom_samples = binom_dist.rvs(size=1000)
# Poisson distribution (count of events in fixed interval)
poisson_dist = stats.poisson(mu=3)
poisson_samples = poisson_dist.rvs(size=1000)
# Uniform distribution (equal probability across range)
uniform_dist = stats.uniform(loc=0, scale=10)
uniform_samples = uniform_dist.rvs(size=1000)
# Visualize multiple distributions
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Exponential
axes[0, 0].hist(exp_samples, bins=30, density=True, alpha=0.7)
axes[0, 0].set_title('Exponential Distribution')
# Binomial
axes[0, 1].hist(binom_samples, bins=11, density=True, alpha=0.7)
axes[0, 1].set_title('Binomial Distribution')
# Poisson
axes[1, 0].hist(poisson_samples, bins=15, density=True, alpha=0.7)
axes[1, 0].set_title('Poisson Distribution')
# Uniform
axes[1, 1].hist(uniform_samples, bins=30, density=True, alpha=0.7)
axes[1, 1].set_title('Uniform Distribution')
plt.tight_layout()
plt.show()
Hypothesis Testing
Understanding Hypothesis Testing
Hypothesis testing is a systematic method for making decisions about populations based on sample data. The process involves:
- Null Hypothesis (H₀): The default assumption (usually “no effect”)
- Alternative Hypothesis (H₁): What we’re trying to prove
- Test Statistic: A calculated value from the sample
- P-value: Probability of observing the data if H₀ is true
- Significance Level (α): Threshold for rejecting H₀ (typically 0.05)
One-Sample T-Test
Test whether a sample mean differs significantly from a known population mean.
# Scenario: A manufacturer claims their product lasts 1000 hours
# We test 30 units and want to verify the claim
sample_data = np.array([
985, 1010, 995, 1020, 1005, 990, 1015, 1000, 1008, 1002,
998, 1012, 1003, 997, 1018, 1001, 1009, 994, 1011, 1004,
992, 1016, 1006, 999, 1013, 1002, 1007, 996, 1014, 1000
])
# Perform one-sample t-test
# H₀: μ = 1000 (the claimed mean)
# H₁: μ ≠ 1000 (the mean is different)
t_statistic, p_value = stats.ttest_1samp(sample_data, popmean=1000)
print("One-Sample T-Test Results:")
print(f"Sample mean: {np.mean(sample_data):.2f}")
print(f"Sample std dev: {np.std(sample_data, ddof=1):.2f}")
print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Significance level: 0.05")
# Interpretation
if p_value < 0.05:
    print("Result: REJECT the null hypothesis")
    print("Conclusion: The sample mean differs significantly from 1000 hours")
else:
    print("Result: FAIL TO REJECT the null hypothesis")
    print("Conclusion: No significant evidence that the mean differs from 1000 hours")
Two-Sample T-Test
Compare means between two independent groups.
# Scenario: Compare test scores between two teaching methods
group_a = np.array([78, 82, 85, 88, 90, 76, 84, 87, 89, 91,
79, 83, 86, 88, 92, 77, 85, 87, 90, 93])
group_b = np.array([72, 75, 78, 80, 82, 70, 76, 79, 81, 84,
73, 77, 79, 81, 83, 71, 77, 80, 82, 85])
# Perform two-sample t-test
# H₀: μ_A = μ_B (means are equal)
# H₁: μ_A ≠ μ_B (means are different)
t_statistic, p_value = stats.ttest_ind(group_a, group_b)
print("Two-Sample T-Test Results:")
print(f"Group A mean: {np.mean(group_a):.2f}")
print(f"Group B mean: {np.mean(group_b):.2f}")
print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: The two groups have significantly different means")
else:
    print("Result: No significant difference between group means")
# Visualize the comparison
plt.figure(figsize=(10, 6))
plt.boxplot([group_a, group_b], labels=['Method A', 'Method B'])
plt.ylabel('Test Score')
plt.title('Comparison of Teaching Methods')
plt.grid(True, alpha=0.3)
plt.show()
Paired T-Test
Compare measurements from the same subjects before and after an intervention.
# Scenario: Blood pressure before and after medication
before = np.array([140, 145, 138, 142, 148, 141, 139, 144, 146, 143,
140, 142, 145, 141, 147, 139, 143, 145, 142, 140])
after = np.array([135, 138, 132, 136, 141, 135, 133, 138, 140, 137,
134, 136, 139, 135, 141, 133, 137, 139, 136, 134])
# Perform paired t-test
# H₀: μ_difference = 0 (no change)
# H₁: μ_difference ≠ 0 (there is a change)
t_statistic, p_value = stats.ttest_rel(before, after)
differences = before - after
print("Paired T-Test Results:")
print(f"Mean before: {np.mean(before):.2f}")
print(f"Mean after: {np.mean(after):.2f}")
print(f"Mean difference: {np.mean(differences):.2f}")
print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: The medication significantly changed blood pressure")
else:
    print("Result: No significant change in blood pressure")
Analysis of Variance (ANOVA)
One-Way ANOVA
Compare means across three or more groups.
# Scenario: Compare sales performance across three regions
region_north = np.array([45, 52, 48, 55, 50, 49, 53, 51, 54, 47])
region_south = np.array([38, 42, 40, 45, 41, 39, 43, 41, 44, 40])
region_west = np.array([50, 58, 52, 60, 55, 51, 59, 53, 61, 54])
# Perform one-way ANOVA
# H₀: μ_north = μ_south = μ_west (all means are equal)
# H₁: At least one mean is different
f_statistic, p_value = stats.f_oneway(region_north, region_south, region_west)
print("One-Way ANOVA Results:")
print(f"North mean: {np.mean(region_north):.2f}")
print(f"South mean: {np.mean(region_south):.2f}")
print(f"West mean: {np.mean(region_west):.2f}")
print(f"F-statistic: {f_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: Significant differences exist between regions")
    print("Next step: Perform post-hoc tests to identify which regions differ")
else:
    print("Result: No significant differences between regions")
# Visualize the comparison
plt.figure(figsize=(10, 6))
plt.boxplot([region_north, region_south, region_west],
labels=['North', 'South', 'West'])
plt.ylabel('Sales')
plt.title('Sales Performance by Region')
plt.grid(True, alpha=0.3)
plt.show()
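For the post-hoc step mentioned above, recent SciPy versions (1.8+) ship Tukey's HSD test, which compares every pair of groups while controlling the family-wise error rate. A sketch with the same region data:

```python
import numpy as np
from scipy import stats

region_north = np.array([45, 52, 48, 55, 50, 49, 53, 51, 54, 47])
region_south = np.array([38, 42, 40, 45, 41, 39, 43, 41, 44, 40])
region_west = np.array([50, 58, 52, 60, 55, 51, 59, 53, 61, 54])

# tukey_hsd requires SciPy >= 1.8; it returns pairwise statistics and p-values
result = stats.tukey_hsd(region_north, region_south, region_west)
print(result)  # prints a table of pairwise mean differences and p-values

# result.pvalue is a symmetric matrix: entry [i, j] compares group i vs group j
print(f"North vs South p-value: {result.pvalue[0, 1]:.4f}")
print(f"North vs West  p-value: {result.pvalue[0, 2]:.4f}")
print(f"South vs West  p-value: {result.pvalue[1, 2]:.4f}")
```

Unlike running repeated t-tests, Tukey's HSD adjusts for the number of comparisons automatically.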
Correlation and Regression
Pearson Correlation
Measure linear relationship between two continuous variables.
# Scenario: Relationship between advertising spend and sales
advertising = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550])
sales = np.array([500, 650, 750, 900, 1100, 1250, 1400, 1600, 1800, 2000])
# Calculate Pearson correlation
correlation, p_value = stats.pearsonr(advertising, sales)
print("Pearson Correlation Results:")
print(f"Correlation coefficient: {correlation:.4f}")
print(f"P-value: {p_value:.6f}")
# Interpretation: significance means r is unlikely to be zero;
# judge the strength of the relationship from the magnitude of r itself
if p_value < 0.05:
    if correlation > 0:
        print("Result: Statistically significant positive correlation")
    else:
        print("Result: Statistically significant negative correlation")
else:
    print("Result: No significant correlation")
# Visualize the relationship
plt.figure(figsize=(10, 6))
plt.scatter(advertising, sales, s=100, alpha=0.6)
# Add regression line
z = np.polyfit(advertising, sales, 1)
p = np.poly1d(z)
plt.plot(advertising, p(advertising), "r--", linewidth=2, label='Regression line')
plt.xlabel('Advertising Spend ($)')
plt.ylabel('Sales ($)')
plt.title(f'Advertising vs Sales (r={correlation:.3f}, p={p_value:.4f})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
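Beyond the correlation coefficient, `stats.linregress` fits the same least-squares line shown in the plot and returns the slope, intercept, r, p-value, and standard errors in a single call:

```python
import numpy as np
from scipy import stats

advertising = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550])
sales = np.array([500, 650, 750, 900, 1100, 1250, 1400, 1600, 1800, 2000])

# Ordinary least-squares fit: sales ~ slope * advertising + intercept
result = stats.linregress(advertising, sales)

print(f"Slope: {result.slope:.3f} (additional sales per advertising dollar)")
print(f"Intercept: {result.intercept:.2f}")
print(f"R-squared: {result.rvalue**2:.4f}")
print(f"P-value: {result.pvalue:.6f}")
print(f"Slope std error: {result.stderr:.4f}")
```

The squared `rvalue` is the familiar R², the share of variance in sales explained by advertising spend.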
Spearman Rank Correlation
Measure monotonic relationship (useful for non-linear or ranked data).
# Scenario: Relationship between employee experience and performance rating
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
performance = np.array([3, 4, 5, 6, 7, 8, 7, 9, 8, 10])
# Calculate Spearman correlation
correlation, p_value = stats.spearmanr(experience, performance)
print("Spearman Rank Correlation Results:")
print(f"Correlation coefficient: {correlation:.4f}")
print(f"P-value: {p_value:.6f}")
# Compare with Pearson
pearson_corr, pearson_p = stats.pearsonr(experience, performance)
print(f"\nComparison:")
print(f"Pearson correlation: {pearson_corr:.4f}")
print(f"Spearman correlation: {correlation:.4f}")
Chi-Square Test
Chi-Square Goodness of Fit
Test whether observed frequencies match expected frequencies.
# Scenario: A die is rolled 600 times
# Expected: each face appears 100 times
# Observed: actual counts from experiment
observed = np.array([95, 108, 102, 97, 105, 93]) # Counts for faces 1-6
expected = np.array([100, 100, 100, 100, 100, 100]) # Expected counts
# Perform chi-square goodness of fit test
chi2_statistic, p_value = stats.chisquare(observed, expected)
print("Chi-Square Goodness of Fit Test:")
print(f"Chi-square statistic: {chi2_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: The die appears to be biased")
else:
    print("Result: The die appears to be fair")
# Visualize
plt.figure(figsize=(10, 6))
x = np.arange(1, 7)
width = 0.35
plt.bar(x - width/2, observed, width, label='Observed', alpha=0.8)
plt.bar(x + width/2, expected, width, label='Expected', alpha=0.8)
plt.xlabel('Die Face')
plt.ylabel('Frequency')
plt.title('Chi-Square Goodness of Fit Test')
plt.xticks(x)
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Chi-Square Test of Independence
Test whether two categorical variables are independent.
# Scenario: Relationship between gender and product preference
# Create contingency table
contingency_table = np.array([
[50, 30, 20], # Male: Product A, B, C
[35, 45, 40] # Female: Product A, B, C
])
# Perform chi-square test of independence
chi2_statistic, p_value, dof, expected_freq = stats.chi2_contingency(contingency_table)
print("Chi-Square Test of Independence:")
print(f"Chi-square statistic: {chi2_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
if p_value < 0.05:
    print("Result: Gender and product preference are NOT independent")
    print("Conclusion: There is a significant relationship between gender and preference")
else:
    print("Result: Gender and product preference are independent")
print("\nExpected frequencies:")
print(expected_freq)
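As with t-tests, it helps to pair the chi-square statistic with an effect size. Cramér's V, a common choice for contingency tables, rescales chi-square to the 0-1 range. A sketch (the bias-corrected variant is sometimes preferred for small tables):

```python
import numpy as np
from scipy import stats

contingency_table = np.array([
    [50, 30, 20],  # Male: Product A, B, C
    [35, 45, 40],  # Female: Product A, B, C
])

chi2_statistic, p_value, dof, expected_freq = stats.chi2_contingency(contingency_table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))), bounded in [0, 1]
n = contingency_table.sum()
min_dim = min(contingency_table.shape) - 1
cramers_v = np.sqrt(chi2_statistic / (n * min_dim))

print(f"Chi-square: {chi2_statistic:.4f}, p-value: {p_value:.4f}")
print(f"Cramér's V: {cramers_v:.3f}")  # rough guide: 0.1 small, 0.3 medium, 0.5 large
```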
Practical Workflow: Complete Analysis Example
Let’s combine multiple statistical techniques in a realistic scenario.
# Scenario: Analyze customer satisfaction data across product categories
# Generate sample data
np.random.seed(42)
categories = ['Electronics', 'Clothing', 'Home', 'Sports']
satisfaction_scores = {
'Electronics': np.random.normal(7.2, 1.5, 100),
'Clothing': np.random.normal(7.8, 1.2, 100),
'Home': np.random.normal(7.5, 1.4, 100),
'Sports': np.random.normal(8.1, 1.1, 100)
}
# 1. Descriptive Statistics
print("=" * 60)
print("DESCRIPTIVE STATISTICS")
print("=" * 60)
for category, scores in satisfaction_scores.items():
    print(f"\n{category}:")
    print(f"  Mean: {np.mean(scores):.2f}")
    print(f"  Median: {np.median(scores):.2f}")
    print(f"  Std Dev: {np.std(scores):.2f}")
    print(f"  Min: {np.min(scores):.2f}, Max: {np.max(scores):.2f}")
# 2. Test for Normality (Shapiro-Wilk test)
print("\n" + "=" * 60)
print("NORMALITY TEST (Shapiro-Wilk)")
print("=" * 60)
for category, scores in satisfaction_scores.items():
    stat, p_value = stats.shapiro(scores)
    is_normal = "Yes" if p_value > 0.05 else "No"
    print(f"{category}: p-value = {p_value:.4f}, Normal: {is_normal}")
# 3. ANOVA to compare means across categories
print("\n" + "=" * 60)
print("ONE-WAY ANOVA")
print("=" * 60)
f_stat, p_value = stats.f_oneway(
satisfaction_scores['Electronics'],
satisfaction_scores['Clothing'],
satisfaction_scores['Home'],
satisfaction_scores['Sports']
)
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: Significant differences exist between categories")
else:
    print("Result: No significant differences between categories")
# 4. Pairwise comparisons (t-tests with Bonferroni correction)
print("\n" + "=" * 60)
print("PAIRWISE COMPARISONS (with Bonferroni correction)")
print("=" * 60)
categories_list = list(satisfaction_scores.keys())
n_comparisons = len(categories_list) * (len(categories_list) - 1) / 2
bonferroni_alpha = 0.05 / n_comparisons
for i in range(len(categories_list)):
    for j in range(i + 1, len(categories_list)):
        cat1, cat2 = categories_list[i], categories_list[j]
        t_stat, p_val = stats.ttest_ind(
            satisfaction_scores[cat1],
            satisfaction_scores[cat2]
        )
        significant = "Yes" if p_val < bonferroni_alpha else "No"
        print(f"{cat1} vs {cat2}: p-value = {p_val:.4f}, Significant: {significant}")
# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Box plot
ax = axes[0, 0]
data_list = [satisfaction_scores[cat] for cat in categories]
ax.boxplot(data_list, labels=categories)
ax.set_ylabel('Satisfaction Score')
ax.set_title('Distribution by Category')
ax.grid(True, alpha=0.3, axis='y')
# Violin plot
ax = axes[0, 1]
parts = ax.violinplot(data_list, positions=range(len(categories)), showmeans=True)
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
ax.set_ylabel('Satisfaction Score')
ax.set_title('Violin Plot by Category')
ax.grid(True, alpha=0.3, axis='y')
# Histogram with overlays
ax = axes[1, 0]
for category, scores in satisfaction_scores.items():
    ax.hist(scores, alpha=0.5, label=category, bins=20)
ax.set_xlabel('Satisfaction Score')
ax.set_ylabel('Frequency')
ax.set_title('Distribution Comparison')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# Mean with confidence intervals
ax = axes[1, 1]
means = [np.mean(satisfaction_scores[cat]) for cat in categories]
ci_95 = [1.96 * np.std(satisfaction_scores[cat], ddof=1) / np.sqrt(len(satisfaction_scores[cat]))
         for cat in categories]
ax.bar(categories, means, yerr=ci_95, capsize=10, alpha=0.7)
ax.set_ylabel('Mean Satisfaction Score')
ax.set_title('Mean with 95% Confidence Intervals')
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Interpreting P-Values and Statistical Significance
Understanding P-Values
The p-value is the probability of observing your data (or more extreme) if the null hypothesis is true. It’s NOT the probability that the null hypothesis is true.
- p-value < 0.05: Typically considered statistically significant (reject H₀)
- p-value < 0.01: Highly statistically significant
- p-value > 0.05: Not statistically significant (fail to reject H₀)
Common Misconceptions
# Example: What a p-value of 0.03 means
print("If p-value = 0.03:")
print("✓ Correct: There's a 3% chance of observing data this extreme if H₀ is true")
print("✗ Wrong: There's a 97% chance the alternative hypothesis is true")
print("✗ Wrong: The result is definitely real and important")
print("\nStatistical significance ≠ Practical significance")
print("A large sample can produce significant p-values for trivial effects")
Effect Size
Always report effect size alongside p-values:
# Example: Cohen's d for effect size
group1 = np.array([10, 12, 14, 16, 18])
group2 = np.array([15, 17, 19, 21, 23])
# Calculate Cohen's d (use the sample standard deviation, ddof=1, when pooling)
mean_diff = np.mean(group2) - np.mean(group1)
pooled_std = np.sqrt((np.std(group1, ddof=1)**2 + np.std(group2, ddof=1)**2) / 2)
cohens_d = mean_diff / pooled_std
print(f"Cohen's d: {cohens_d:.3f}")
print("Interpretation:")
print("  |d| < 0.2: Small effect")
print("  0.2 ≤ |d| < 0.5: Small to medium effect")
print("  0.5 ≤ |d| < 0.8: Medium to large effect")
print("  |d| ≥ 0.8: Large effect")
Best Practices for Statistical Analysis
1. Check Assumptions
Different tests have different assumptions. Always verify them:
# Check normality before using parametric tests
data = np.random.normal(100, 15, 100)
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test p-value: {p_value:.4f}")
print("Data is normally distributed" if p_value > 0.05 else "Data is not normally distributed")
# Check homogeneity of variance (Levene's test)
group1 = np.random.normal(100, 10, 50)
group2 = np.random.normal(100, 20, 50)
stat, p_value = stats.levene(group1, group2)
print(f"Levene's test p-value: {p_value:.4f}")
print("Variances are equal" if p_value > 0.05 else "Variances are not equal")
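When Levene's test suggests unequal variances, the standard remedy is Welch's t-test, which `ttest_ind` enables via `equal_var=False`. A minimal sketch on the same two groups:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
group1 = np.random.normal(100, 10, 50)  # smaller spread
group2 = np.random.normal(100, 20, 50)  # larger spread

# equal_var=False runs Welch's t-test, which does not pool the two variances
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch's t-test: t = {t_stat:.4f}, p-value = {p_value:.4f}")
```

Welch's version is often recommended as the default for two-sample comparisons, since it loses little power even when the variances happen to be equal.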
2. Use Appropriate Tests
- Parametric tests (t-test, ANOVA): Assume normality, more powerful
- Non-parametric tests (Mann-Whitney U, Kruskal-Wallis): No normality assumption, more robust
# Non-parametric alternative to t-test
group1 = np.array([1, 2, 3, 4, 5])
group2 = np.array([3, 4, 5, 6, 7])
# Mann-Whitney U test (non-parametric)
stat, p_value = stats.mannwhitneyu(group1, group2)
print(f"Mann-Whitney U test p-value: {p_value:.4f}")
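Similarly, the Kruskal-Wallis test mentioned above is the non-parametric counterpart of one-way ANOVA, comparing three or more groups via ranks:

```python
import numpy as np
from scipy import stats

# Kruskal-Wallis: rank-based alternative to one-way ANOVA (3+ groups)
group1 = np.array([45, 52, 48, 55, 50])
group2 = np.array([38, 42, 40, 45, 41])
group3 = np.array([50, 58, 52, 60, 55])

h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"Kruskal-Wallis H = {h_stat:.4f}, p-value = {p_value:.4f}")
```

Because it only uses ranks, the test is robust to outliers and skewed distributions.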
3. Control for Multiple Comparisons
When performing multiple tests, adjust significance level:
# Bonferroni correction
n_tests = 10
alpha = 0.05
bonferroni_alpha = alpha / n_tests
print(f"Adjusted significance level: {bonferroni_alpha:.4f}")
4. Report Complete Results
Always include:
- Test statistic
- P-value
- Effect size
- Confidence intervals
- Sample size
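A t-based confidence interval for a mean, for example, can come straight from SciPy's distribution objects. A sketch using `stats.sem` and `stats.t.interval` on a small sample:

```python
import numpy as np
from scipy import stats

sample = np.array([985, 1010, 995, 1020, 1005, 990, 1015, 1000, 1008, 1002])

mean = np.mean(sample)
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# 95% CI for the mean, using the t distribution with n-1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"Mean: {mean:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Using the t distribution (rather than the 1.96 normal approximation) matters most for small samples like this one.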
Conclusion
SciPy’s statistical capabilities empower you to move beyond descriptive statistics to rigorous hypothesis testing and inference. Key takeaways:
- Start with Exploration: Use descriptive statistics and visualizations to understand your data
- Check Assumptions: Verify that your data meets test requirements
- Choose Appropriate Tests: Select parametric or non-parametric tests based on your data
- Interpret Carefully: P-values indicate statistical significance, not practical importance
- Report Completely: Include test statistics, p-values, effect sizes, and confidence intervals
- Combine with Domain Knowledge: Statistical results should inform, not replace, expert judgment
Statistical analysis is both an art and a science. With SciPy as your tool and these principles as your guide, you’ll be able to extract meaningful insights from data and make confident, data-driven decisions.
Key SciPy Functions Reference
| Function | Purpose |
|---|---|
| `stats.describe()` | Comprehensive descriptive statistics |
| `stats.ttest_1samp()` | One-sample t-test |
| `stats.ttest_ind()` | Two-sample t-test |
| `stats.ttest_rel()` | Paired t-test |
| `stats.f_oneway()` | One-way ANOVA |
| `stats.pearsonr()` | Pearson correlation |
| `stats.spearmanr()` | Spearman rank correlation |
| `stats.chisquare()` | Chi-square goodness of fit |
| `stats.chi2_contingency()` | Chi-square test of independence |
| `stats.shapiro()` | Shapiro-Wilk normality test |
| `stats.levene()` | Levene’s homogeneity of variance test |
| `stats.mannwhitneyu()` | Mann-Whitney U test (non-parametric) |
Start exploring your data with SciPy today, and unlock the power of statistical analysis in Python!