Mutation Testing: Verify Your Tests Actually Work

Introduction

You can have 100% code coverage and still have weak tests: coverage shows which lines ran, not whether any assertion would notice if they behaved differently. Mutation testing measures test quality by introducing small changes (mutations) to your code and checking whether your tests catch them. If the tests still pass with the mutated code, the mutation survives, which means the tests are not effectively verifying that behavior.

This guide covers mutation testing with Stryker (JavaScript/TypeScript) and PIT (Java), including advanced configuration, CI integration, cost optimization strategies, and how to interpret mutation scores to improve test quality.

How Mutation Testing Works

┌─────────────────────────────────────────────────────────────────┐
│                    Mutation Testing Process                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Original Code:                                                 │
│  ┌───────────────────────────────────────────┐                  │
│  │  function add(a, b) {                     │                  │
│  │    return a + b;                          │                  │
│  │  }                                        │                  │
│  └───────────────────────────────────────────┘                  │
│                       │                                         │
│                       ▼                                         │
│  Mutation: Change + to -                                        │
│  ┌───────────────────────────────────────────┐                  │
│  │  function add(a, b) {                     │                  │
│  │    return a - b;    ← MUTANT              │                  │
│  │  }                                        │                  │
│  └───────────────────────────────────────────┘                  │
│                       │                                         │
│                       ▼                                         │
│  Run Tests:                                                     │
│  ┌───────────────────────────────────────────┐                  │
│  │  Expect: add(2, 2) = 4                    │                  │
│  │                                           │                  │
│  │  Mutant: 2 - 2 = 0 ≠ 4                    │                  │
│  │  Result: Test FAILED → Mutant KILLED ✓    │                  │
│  └───────────────────────────────────────────┘                  │
│                                                                 │
│  Mutation Score = Killed Mutants / Total Mutants × 100          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
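The process in the diagram can be sketched in a few lines of TypeScript. This is a toy model, not how Stryker or PIT work internally: the mutants are hand-written functions rather than generated source edits, and each "test" is just a predicate over an implementation. It shows the dry run on the original code, the killed/survived decision, and the score formula.

```typescript
// Toy model of the mutation-testing loop: hand-written mutants, predicate "tests".
type BinaryFn = (a: number, b: number) => number;
type TestCase = (impl: BinaryFn) => boolean; // true = test passes

interface Mutant {
  id: string;
  fn: BinaryFn;
}

// Production code under test.
const add: BinaryFn = (a, b) => a + b;

// One operator change per mutant, as a mutation tool would produce.
const mutants: Mutant[] = [
  { id: "+ -> -", fn: (a, b) => a - b },
  { id: "+ -> *", fn: (a, b) => a * b },
  { id: "+ -> /", fn: (a, b) => a / b },
];

// Only checks add(2, 2) = 4, which 2 * 2 also satisfies.
const weakSuite: TestCase[] = [(impl) => impl(2, 2) === 4];
// Adds an input where only "+" gives the expected result.
const strongSuite: TestCase[] = [
  (impl) => impl(2, 2) === 4,
  (impl) => impl(3, 1) === 4,
];

// A mutant is killed when at least one test fails against it.
function isKilled(suite: TestCase[], mutant: Mutant): boolean {
  return suite.some((test) => !test(mutant.fn));
}

function survivors(suite: TestCase[], all: Mutant[]): string[] {
  return all.filter((m) => !isKilled(suite, m)).map((m) => m.id);
}

// Mutation score = killed / total × 100, after a dry run on the original code.
function mutationScore(suite: TestCase[], all: Mutant[]): number {
  if (!suite.every((test) => test(add))) {
    throw new Error("Dry run failed: the suite must pass on the original code");
  }
  const killed = all.filter((m) => isKilled(suite, m)).length;
  return (killed / all.length) * 100;
}

console.log(survivors(weakSuite, mutants).join(", ")); // + -> *
console.log(mutationScore(weakSuite, mutants).toFixed(1)); // 66.7
console.log(mutationScore(strongSuite, mutants).toFixed(1)); // 100.0
```

One extra input is enough to kill the surviving `+ -> *` mutant, which is exactly the kind of gap a mutation report points you to.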

Key Terminology

Term	Definition	Example
Mutant	A modified version of source code	`a - b` instead of `a + b`
Killed	Test detects the mutation (test fails)	Expected 4, got 0
Survived	Test passes despite mutation	A test that only checks `add(2, 0) = 2` also passes for `a - b`
Equivalent mutant	Behaviorally identical to the original despite the change	`if (x > 0)` vs `if (x >= 1)` for integer `x`
Mutation score	% of mutants killed	85% is generally considered good

JavaScript/TypeScript: Stryker

Setup

npm install -D @stryker-mutator/core @stryker-mutator/vitest-runner @stryker-mutator/typescript-checker
npx stryker init

Configuration

// stryker.conf.json
{
  "$schema": "./node_modules/@stryker-mutator/core/schema/stryker-schema.json",
  "packageManager": "npm",
  "reporters": ["html", "clear-text", "dashboard", "progress"],
  "buildCommand": "npm run build",
  "testRunner": "vitest",
  "coverageAnalysis": "perTest",
  "concurrency": 4,
  "timeoutMS": 5000,
  "mutate": [
    "src/**/*.ts",
    "!src/**/*.spec.ts",
    "!src/**/*.test.ts",
    "!src/types/**"
  ],
  "thresholds": {
    "high": 80,
    "low": 70,
    "break": 60
  },
  "mutator": {
    "excludedMutations": ["StringLiteral"]
  },
  "checkers": ["typescript"],
  "tsconfigFile": "tsconfig.json"
}
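The `thresholds` block is easy to misread. As we read Stryker's documentation, `high` and `low` only color the report, while `break` decides the exit code: a score below `break` makes `stryker run` exit non-zero and fails CI. The helper below is a hypothetical sketch of those rules, not Stryker code.

```typescript
// Hypothetical helper mirroring the documented meaning of Stryker's "thresholds".
interface Thresholds {
  high: number;
  low: number;
  break: number | null; // null disables failing the run
}

type Band = "high" | "warning" | "danger";

function evaluateScore(score: number, t: Thresholds): { band: Band; failsRun: boolean } {
  // score >= high: good; low <= score < high: warning; score < low: danger
  const band: Band = score >= t.high ? "high" : score >= t.low ? "warning" : "danger";
  // Only "break" affects the exit code.
  const failsRun = t.break !== null && score < t.break;
  return { band, failsRun };
}

const thresholds: Thresholds = { high: 80, low: 70, break: 60 };

for (const score of [90, 75, 65, 55]) {
  const { band, failsRun } = evaluateScore(score, thresholds);
  console.log(`${score}% -> ${band}${failsRun ? " (run fails)" : ""}`);
}
```

With this configuration a 65% score is reported as "danger" but still passes; only 55% fails the run.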

Running

# Run mutation testing
npx stryker run

# Illustrative output (the exact format depends on the Stryker version and reporters):
# Finished in 45 seconds
# Mutants killed: 45/50 (90%)
#
# File          Mutation score
# math.ts       95%   ███████████████████░
# string.ts     85%   █████████████████░░░
# utils.ts      75%   ███████████████░░░░░  ⚠️
# payments.ts   60%   ████████████░░░░░░░░  🔴

Advanced Stryker Configuration

{
  "mutate": ["src/**/*.ts", "!src/**/*.spec.ts"],
  "testRunner": "vitest",
  "vitest": {
    "configFile": "vitest.config.ts"
  },
  "coverageAnalysis": "perTest",
  "concurrency": 8,
  "timeoutMS": 10000,
  "inPlace": false,
  "cleanTempDir": true,

  "htmlReporter": {
    "fileName": "reports/mutation/html/index.html"
  },
  "dashboard": {
    "project": "github.com/myorg/myrepo",
    "version": "main",
    "module": "core",
    "baseUrl": "https://dashboard.stryker-mutator.io/api/reports"
  },

  "incremental": true,
  "incrementalFile": "reports/mutation/stryker-incremental.json",

  "plugins": [
    "@stryker-mutator/vitest-runner",
    "@stryker-mutator/typescript-checker"
  ],

  "tempDirName": ".stryker-tmp",
  "maxTestRunnerReuse": 8
}

Test Runner Configuration

// vitest.config.ts — Stryker-compatible Vitest config
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    include: ['src/**/*.spec.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      include: ['src/**/*.ts'],
      exclude: ['src/**/*.spec.ts'],
    },
    testTimeout: 10000,
    hookTimeout: 10000,
    pool: 'forks',
    poolOptions: {
      forks: {
        singleFork: true, // Runs all test files in one forked process (optional; Stryker runs its own parallel workers)
      },
    },
  },
});

Java: PIT

Setup with Maven

<!-- pom.xml -->
<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>1.16.0</version>
      <configuration>
        <targetClasses>
          <param>com.example.*</param>
        </targetClasses>
        <targetTests>
          <param>com.example.*</param>
        </targetTests>
        <mutators>
          <mutator>ALL</mutator>
        </mutators>
        <threads>4</threads>
        <timeoutConstant>5000</timeoutConstant>
        <outputFormats>
          <param>HTML</param>
          <param>XML</param>
          <param>CSV</param>
        </outputFormats>
        <mutationThreshold>80</mutationThreshold>
        <coverageThreshold>85</coverageThreshold>
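      </configuration>
      <!-- JUnit 5 tests also need org.pitest:pitest-junit5-plugin declared as a dependency of this plugin -->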
    </plugin>
  </plugins>
</build>

Setup with Gradle

// build.gradle
plugins {
    id 'info.solidsoft.pitest' version '1.15.0'
}

pitest {
    targetClasses = ['com.example.*']
    targetTests = ['com.example.*']
    mutators = ['ALL']
    threads = 4
    outputFormats = ['HTML', 'XML']
    mutationThreshold = 80
    coverageThreshold = 85
    timestampedReports = false
}

Running

# Maven
mvn org.pitest:pitest-maven:mutationCoverage

# Gradle
gradle pitest

# Illustrative summary (PIT's actual console output and HTML report are formatted differently):
# ================================================================================
# >> Mutation coverage report
# ================================================================================
#
# Classes : 85% (34/40)
# Methods : 80% (120/150)
# Mutants : 75% (150/200)   ← below mutationThreshold (80), so the build fails
#
# >> SURVIVING MUTATIONS:
# ================================================================================
# com.example.Utils:
#   Line 45:  replaced integer addition with subtraction                  SURVIVED
#   Line 67:  removed conditional - replaced comparison check with false  SURVIVED
# com.example.PaymentService:
#   Line 123: replaced boolean return with false                          SURVIVED

Mutation Operators

Operator Reference

Representative operators by category. Exact operator names and sets vary by tool.

arithmetic:
  - "+ → -"
  - "- → +"
  - "* → /"
  - "/ → *"
  - "% → *"

comparison:
  - "< → <="
  - "<= → <"
  - "> → >="
  - ">= → >"
  - "== → !="
  - "!= → =="

boolean:
  - "true → false"
  - "false → true"
  - "&& → ||"
  - "|| → &&"
  - "! → (removed)"

conditional:
  - "Remove if body"
  - "Remove else body"
  - "Negate condition"

string:
  - "Empty string → non-empty"
  - "Non-empty → empty"
  - ".length() → .length() + 1"

number:
  - "x → x + 1"
  - "x → 0"
  - "x → Integer.MAX_VALUE"
  - "x → -x"

void_method:
  - "Remove call to a void method"

return_value:
  - "Replace return value with a default (0, false, empty string)"

null:
  - "Return null instead of value"
  - "Non-null → null assertion"

collection:
  - "Return empty collection"
  - "Remove element from collection"
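To make the operator table concrete, here is a toy mutant generator for the arithmetic, comparison and boolean rows (using TypeScript's `===`/`!==` in place of `==`/`!=`). It works on raw text with a regular expression, so it is only an illustration: real tools such as Stryker mutate a parsed syntax tree and therefore never touch strings, comments, or the `>` inside an arrow function's `=>`, all of which this sketch would.

```typescript
// Toy text-based mutant generator (real tools mutate the syntax tree instead).
const REPLACEMENTS: Record<string, string[]> = {
  "+": ["-"], "-": ["+"], "*": ["/"], "/": ["*"], "%": ["*"],
  "<": ["<="], "<=": ["<"], ">": [">="], ">=": [">"],
  "===": ["!=="], "!==": ["==="],
  "&&": ["||"], "||": ["&&"],
};

// Longer operators come first so ">=" is matched before ">".
const OPERATOR_SOURCE = "===|!==|<=|>=|&&|\\|\\||[+\\-*/%<>]";

interface SourceMutant {
  offset: number; // position of the operator in the source text
  from: string;
  to: string;
  code: string; // full mutated source
}

function generateMutants(source: string): SourceMutant[] {
  const result: SourceMutant[] = [];
  const re = new RegExp(OPERATOR_SOURCE, "g");
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) {
    const from = match[0];
    const offset = match.index;
    for (const to of REPLACEMENTS[from] || []) {
      const code = source.slice(0, offset) + to + source.slice(offset + from.length);
      result.push({ offset, from, to, code });
    }
  }
  return result;
}

for (const m of generateMutants("age >= 18 && isMember")) {
  console.log(`${m.from} -> ${m.to}: ${m.code}`);
}
// >= -> >: age > 18 && isMember
// && -> ||: age >= 18 || isMember
```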

Code Examples of Mutations

// Original code
function isAdult(age: number): boolean {
  return age >= 18;
}

// Mutation 1: comparison boundary change
function isAdult(age: number): boolean {
  return age > 18;  // >= changed to >
}

// Mutation 2: boolean return flipped
function isAdult(age: number): boolean {
  return !(age >= 18);  // Negated return
}

// Mutation 3: constant change
function isAdult(age: number): boolean {
  return age >= 0;  // 18 changed to 0
}
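Which inputs kill these three mutants? The sketch below runs each mutant against input/expected pairs derived from the original `isAdult`. A single "obvious" input (30) leaves two mutants alive; testing the boundary (18) and the value just below it (17) kills all three.

```typescript
type AgeCheck = (age: number) => boolean;

const isAdult: AgeCheck = (age) => age >= 18;

// The three mutants from the examples above.
const isAdultMutants: Array<{ name: string; fn: AgeCheck }> = [
  { name: ">= changed to >", fn: (age) => age > 18 },
  { name: "negated return", fn: (age) => !(age >= 18) },
  { name: "18 changed to 0", fn: (age) => age >= 0 },
];

// [input, expected result of the original function]
type Check = [number, boolean];

const obviousChecks: Check[] = [[30, true]];
const boundaryChecks: Check[] = [
  [18, true], // exact boundary: kills ">= changed to >"
  [17, false], // just below: kills "18 changed to 0"
];

// Names of mutants for which every check still passes.
function survivingMutants(checks: Check[]): string[] {
  // Sanity check: the checks must hold for the original function.
  checks.forEach(([age, expected]) => {
    if (isAdult(age) !== expected) throw new Error(`Bad check for age ${age}`);
  });
  return isAdultMutants
    .filter((m) => checks.every(([age, expected]) => m.fn(age) === expected))
    .map((m) => m.name);
}

console.log(survivingMutants(obviousChecks).join(", ")); // >= changed to >, 18 changed to 0
console.log(survivingMutants(boundaryChecks).length); // 0
```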

What Gets Mutated

Operator	Code Before	Code After	To kill it, tests must
Arithmetic	`total = price + tax`	`total = price - tax`	Assert the exact total
Comparison	`if (age >= 18)`	`if (age > 18)`	Test the exact boundary (`age = 18`)
Boolean	`return isValid`	`return !isValid`	Test both true and false results
Conditional	`if (x) { doA() }`	`if (!x) { doA() }`	Test both branches
String	`name.length()`	`name.length() + 1`	Assert the exact length
Number	`return 100`	`return 0`	Assert the exact return value
Null	`return user`	`return null`	Assert on the returned user, not just that the call succeeds
Collection	`items.add(item)`	`(removed)`	Assert item is added

Interpreting Results

Mutation Score Guide

Score	Rating	Meaning	Action
90-100%	Excellent	Tests catch almost all code changes	Maintain
80-89%	Good	Most bugs caught, minor gaps	Review surviving mutants
70-79%	Warning	Notable gaps in what the tests verify	Add tests for survivors
60-69%	Poor	Many bugs would slip through	Major test improvement needed
< 60%	Critical	Tests provide little value	Rewrite test suite

Surviving Mutant Analysis

// Example: surviving mutant analysis

// Source: discount.ts
export function calculateDiscount(price: number, coupon: string): number {
  if (coupon === 'SAVE10') {
    return price * 0.1;  // Mutant `price / 0.1` (ArithmeticOperator) survives!
  }
  return 0;
}

// Test that lets the mutant survive
describe('calculateDiscount', () => {
  it('returns 10% discount for SAVE10', () => {
    // Passes for the original (10) and for the `* → /` mutant (1000)
    const result = calculateDiscount(100, 'SAVE10');
    expect(result).toBeGreaterThan(0);  // Too vague!
  });
});

// Better test that kills the mutant
describe('calculateDiscount fixed', () => {
  it('returns exactly 10% for SAVE10 coupon', () => {
    const result = calculateDiscount(100, 'SAVE10');
    expect(result).toBe(10);  // Exact assertion kills the mutant
  });

  it('returns 0 for invalid coupon', () => {
    const result = calculateDiscount(100, 'INVALID');
    expect(result).toBe(0);
  });
});
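The same effect can be reproduced without a test framework. In this sketch the two assertions are plain predicates standing in for `toBeGreaterThan(0)` and `toBe(10)`, and the mutant is the `*` → `/` change that an arithmetic-operator mutator would produce for this line.

```typescript
type Discount = (price: number, coupon: string) => number;

const calculateDiscount: Discount = (price, coupon) =>
  coupon === "SAVE10" ? price * 0.1 : 0;

// Arithmetic mutant: "*" replaced by "/" (returns 1000 instead of 10).
const divisionMutant: Discount = (price, coupon) =>
  coupon === "SAVE10" ? price / 0.1 : 0;

// Stand-in for expect(result).toBeGreaterThan(0)
const vagueTest = (fn: Discount): boolean => fn(100, "SAVE10") > 0;
// Stand-in for expect(result).toBe(10)
const exactTest = (fn: Discount): boolean => fn(100, "SAVE10") === 10;

console.log(vagueTest(calculateDiscount), vagueTest(divisionMutant)); // true true  -> mutant survives
console.log(exactTest(calculateDiscount), exactTest(divisionMutant)); // true false -> mutant killed
```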

Equivalent Mutants

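Some mutations produce code that behaves exactly like the original. No test can kill these equivalent mutants, so they show up as survivors and lower the score without pointing to a real gap in your tests.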

// A mutant that looks harmless but is NOT equivalent

// Original
function canAccess(role: string): boolean {
  return role === 'admin' || role === 'superadmin';
}

// Mutant: || → &&
function canAccess(role: string): boolean {
  // Not equivalent: no string equals both values, so this always returns false.
  // A test asserting canAccess('admin') === true kills it.
  return role === 'admin' && role === 'superadmin';
}

// A genuinely equivalent pair:
//   if (x > 0) { ... }   vs   if (x >= 1) { ... }
// Equivalent only when x is always an integer; for x = 0.5 they differ
// (JavaScript/TypeScript numbers are floating point).
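An equivalent mutant that a tool really would generate looks like this: Stryker's EqualityOperator mutator can turn `>` into `>=` in a clamp, and at the only input where the two conditions differ (`x === 10`) both branches return 10. The sketch compares the two on sample inputs, which can reveal a difference but never prove equivalence; that still takes reasoning. Stryker documents `// Stryker disable next-line <mutator>: <reason>` comments for recording such decisions (check the syntax for your version). Note that such a comment disables every mutant of that kind on the line, including any non-equivalent ones.

```typescript
// Stryker disable next-line EqualityOperator: x >= 10 is equivalent (both branches return 10 at x = 10)
const clamp = (x: number): number => (x > 10 ? 10 : x);

// The equivalent ">" -> ">=" mutant.
const clampMutant = (x: number): number => (x >= 10 ? 10 : x);

// Inputs on which two implementations disagree.
function differences(
  f: (x: number) => unknown,
  g: (x: number) => unknown,
  inputs: number[],
): number[] {
  return inputs.filter((x) => f(x) !== g(x));
}

const samples = [-5, 0, 9.5, 10, 10.5, 1e9];
console.log(differences(clamp, clampMutant, samples).length); // 0

// The integer-only pair from the text: equivalent for integers, not for 0.5.
const positive = (x: number): boolean => x > 0;
const atLeastOne = (x: number): boolean => x >= 1;
console.log(differences(positive, atLeastOne, [0, 0.5, 1, 2]).join(", ")); // 0.5
```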

CI Integration

GitHub Actions

# .github/workflows/mutation-testing.yml
name: Mutation Testing

on:
  pull_request:
    paths:
      - 'src/**'
  schedule:
    - cron: '0 6 * * 1' # Weekly on Monday

jobs:
  mutation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22

      - run: npm ci

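      - name: Run mutation tests
        # The json reporter writes reports/mutation/mutation.json, used by the score check below
        run: npx stryker run --reporters html,clear-text,progress,dashboard,json
        env:
          STRYKER_DASHBOARD_API_KEY: ${{ secrets.STRYKER_DASHBOARD_API_KEY }}

      - name: Upload HTML report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mutation-report
          path: reports/mutation/

      - name: Check mutation score
        run: |
          # Score = (Killed + Timeout) / (Killed + Timeout + Survived + NoCoverage) × 100
          SCORE=$(jq '[.files[].mutants[].status]
            | (map(select(. == "Killed" or . == "Timeout")) | length) as $detected
            | (map(select(. == "Survived" or . == "NoCoverage")) | length) as $undetected
            | ($detected * 100 / ($detected + $undetected))' reports/mutation/mutation.json)
          if (( $(echo "$SCORE < 70" | bc -l) )); then
            echo "❌ Mutation score $SCORE is below threshold of 70%"
            exit 1
          fi
          echo "✅ Mutation score $SCORE passes threshold"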
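If you prefer a script to a jq expression, the score can be computed from the JSON report written by Stryker's `json` reporter (`reports/mutation/mutation.json` by default). The sketch below assumes the mutation-testing-elements report layout (`files` → `mutants` → `status`) and the usual definition: detected mutants (Killed, Timeout) divided by valid mutants (detected plus Survived and NoCoverage). It also returns the covered-code variant, which ignores NoCoverage mutants.

```typescript
// Minimal slice of the mutation-testing-elements JSON report that we rely on.
type MutantStatus =
  | "Killed" | "Survived" | "NoCoverage" | "Timeout"
  | "CompileError" | "RuntimeError" | "Ignored" | "Pending";

interface MutationReport {
  files: Record<string, { mutants: Array<{ status: MutantStatus }> }>;
}

interface ScoreSummary {
  mutationScore: number; // detected / (detected + Survived + NoCoverage) × 100
  mutationScoreCoveredCode: number; // ignores NoCoverage mutants
}

function scoreFromReport(report: MutationReport): ScoreSummary {
  let detected = 0;
  let survived = 0;
  let noCoverage = 0;
  for (const fileName of Object.keys(report.files)) {
    const file = report.files[fileName];
    if (!file) continue;
    for (const mutant of file.mutants) {
      if (mutant.status === "Killed" || mutant.status === "Timeout") detected++;
      else if (mutant.status === "Survived") survived++;
      else if (mutant.status === "NoCoverage") noCoverage++;
      // CompileError, RuntimeError, Ignored and Pending are not counted as valid.
    }
  }
  // 0 when there are no valid mutants.
  const pct = (part: number, whole: number): number => (whole === 0 ? 0 : (part / whole) * 100);
  return {
    mutationScore: pct(detected, detected + survived + noCoverage),
    mutationScoreCoveredCode: pct(detected, detected + survived),
  };
}

// In a Node script, the input would come from
// JSON.parse(readFileSync("reports/mutation/mutation.json", "utf8")).
const sample: MutationReport = {
  files: {
    "src/math.ts": { mutants: [{ status: "Killed" }, { status: "Timeout" }, { status: "Survived" }] },
    "src/utils.ts": { mutants: [{ status: "Killed" }, { status: "NoCoverage" }, { status: "CompileError" }] },
  },
};
const summary = scoreFromReport(sample);
console.log(summary.mutationScore.toFixed(1), summary.mutationScoreCoveredCode.toFixed(1)); // 60.0 75.0
```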

GitLab CI

# .gitlab-ci.yml
mutation-testing:
  stage: test
  image: node:22
  script:
    - npm ci
    - npx stryker run
  artifacts:
    when: always
    paths:
      - reports/mutation/
  only:
    - merge_requests
  variables:
    STRYKER_DASHBOARD_API_KEY: $STRYKER_DASHBOARD_API_KEY

Performance Optimization

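Mutation testing is computationally expensive: every mutant needs its own test run. These settings keep runs short. `incremental` reuses results for code and tests that have not changed, `coverageAnalysis: "perTest"` runs only the tests that cover each mutant, and `ignoreStatic` skips static mutants (code that runs once when a module loads), each of which would otherwise force a full test-suite run. Note that `maxTestRunnerReuse` is not a speed setting: it restarts workers after N runs to contain memory leaks.

{
  "incremental": true,
  "incrementalFile": "reports/mutation/stryker-incremental.json",
  "concurrency": 8,
  "coverageAnalysis": "perTest",
  "ignoreStatic": true
}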

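# Incremental mode: reuse previous results, re-test only mutants affected by changes
npx stryker run --incremental

# Run on specific files for fast feedback
npx stryker run --mutate src/math.ts

# Limit which tests run: Stryker's CLI has no generic test-filter flag, so point the
# test runner at a narrower config instead (vitest.math.config.ts is a hypothetical file):
#   "vitest": { "configFile": "vitest.math.config.ts" }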
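For pull requests, one way to get a file-limited run is to turn the list of changed files (for example from `git diff --name-only origin/main...HEAD`) into a comma-separated `--mutate` value. The filter below is a hypothetical helper that mirrors the `mutate` globs from the configuration earlier (TypeScript under `src/`, no spec or test files, nothing under `src/types`); adjust the patterns to your own layout.

```typescript
// Hypothetical helper: changed file paths -> value for `npx stryker run --mutate <value>`.
const SOURCE_FILE = /^src\/.*\.ts$/;
const TEST_FILE = /\.(spec|test)\.ts$/;
const TYPES_DIR = /^src\/types\//;

function mutateArgFor(changedFiles: string[]): string {
  return changedFiles
    .map((f) => f.trim())
    .filter((f) => SOURCE_FILE.test(f) && !TEST_FILE.test(f) && !TYPES_DIR.test(f))
    .join(",");
}

const changed = ["src/math.ts", "src/math.spec.ts", "README.md", "src/types/index.ts", "src/utils.ts"];
const arg = mutateArgFor(changed);
console.log(arg ? `npx stryker run --mutate ${arg}` : "No source files changed; skip mutation testing");
// npx stryker run --mutate src/math.ts,src/utils.ts
```

Handling the empty case explicitly lets the CI step skip mutation testing when a PR only touches tests, docs or types.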

Mutation Testing at Scale

Selective Mutation

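Running every mutator on a large codebase is expensive. Selective mutation focuses on high-value operators. Stryker selects by exclusion: mutators listed in `mutator.excludedMutations` are skipped, and the rest (for example ArithmeticOperator, EqualityOperator, ConditionalExpression and BooleanLiteral) stay enabled. PIT selects by inclusion through `<mutators>`, for example the `DEFAULTS` group instead of `ALL`.

{
  "mutator": {
    "excludedMutations": [
      "StringLiteral",
      "ObjectLiteral"
    ]
  }
}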

Cost-Benefit Analysis

Rough, project-dependent estimates:

Strategy	Runtime savings (approx.)	Coverage impact	Use case
Full mutation	0% (baseline)	100%	Release gate, small projects
Selective mutators	40-60%	85-95%	Regular CI
Incremental	70-90%	95%	Per-PR testing
File-limited	90-95%	100% (changed files)	Pre-commit hooks
Timed mode	50-80%	Varies	Large codebases
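The savings in the table follow from a simple cost model. The sketch below is a back-of-the-envelope estimate, not how Stryker schedules work: with `perTest` coverage each mutant runs only the tests that cover it, runs are spread over `concurrency` workers, and the result is an upper bound because a killed mutant typically stops at its first failing test. All numbers are hypothetical.

```typescript
interface CostInput {
  mutants: number; // mutants that will be tested
  testsPerMutant: number; // average number of covering tests per mutant
  msPerTest: number; // average test duration in milliseconds
  concurrency: number; // parallel test-runner workers
}

// Worst-case wall-clock estimate in milliseconds (ignores startup and build time).
function estimateRunMs(c: CostInput): number {
  return (c.mutants * c.testsPerMutant * c.msPerTest) / c.concurrency;
}

const baseline: CostInput = { mutants: 2000, testsPerMutant: 5, msPerTest: 20, concurrency: 4 };
const full = estimateRunMs(baseline);
// Selective mutators: assume half of the mutants are excluded.
const selective = estimateRunMs({ mutants: 1000, testsPerMutant: 5, msPerTest: 20, concurrency: 4 });
// Incremental: assume a PR touches code behind 10% of the mutants.
const incremental = estimateRunMs({ mutants: 200, testsPerMutant: 5, msPerTest: 20, concurrency: 4 });

console.log(full, selective, incremental); // 50000 25000 5000
```

Runtime scales linearly with the number of mutants tested, which is why excluding operators and re-testing only changed code pay off so directly.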

Time Budget Configuration

Stryker has no built-in time budget, so keep runs bounded by scoping `mutate` to critical modules and by tuning the timeouts that decide when a mutant counts as Timeout:

{
  "timeoutMS": 5000,
  "timeoutFactor": 1.5,
  "mutate": [
    "src/payments/**/*.ts",
    "src/auth/**/*.ts",
    "src/billing/**/*.ts",
    "!src/**/*.spec.ts"
  ]
}
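The two timeout settings combine into a per-mutant limit. Stryker's documentation describes it roughly as the covering tests' baseline (dry-run) time multiplied by `timeoutFactor`, plus `timeoutMS`, plus a small measured overhead; the sketch below assumes that formula and leaves the overhead as a parameter. A mutant that exceeds the limit, such as one that turns a loop into an infinite loop, is reported as Timeout and counted as detected.

```typescript
// Assumed formula: netTime × timeoutFactor + timeoutMS + overhead.
function mutantTimeoutMs(
  netTimeMs: number, // how long the covering tests took in the initial dry run
  timeoutFactor: number, // "timeoutFactor" from the config
  timeoutMS: number, // "timeoutMS" from the config
  overheadMs = 0, // test-runner overhead measured by Stryker (assumed 0 by default here)
): number {
  return netTimeMs * timeoutFactor + timeoutMS + overheadMs;
}

// With timeoutMS 5000 and timeoutFactor 1.5:
console.log(mutantTimeoutMs(200, 1.5, 5000)); // 5300
console.log(mutantTimeoutMs(4000, 1.5, 5000)); // 11000
```

Every timed-out mutant costs the full limit, so a few infinite-loop mutants can dominate a run; a lower `timeoutMS` shortens those runs at the risk of marking slow but correct runs as timeouts.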

Language-Specific Tools

Language	Tool	Installation	Integrations
JavaScript/TypeScript	Stryker	npm i -D @stryker-mutator/core	Jest, Vitest, Mocha, Jasmine
Java	PIT	Maven/Gradle plugin	JUnit, TestNG, Mockito
Python	mutmut	pip install mutmut	pytest, unittest
Python	Cosmic Ray	pip install cosmic-ray	pytest, unittest
Go	go-mutesting	go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest	go test
Rust	mutagen	cargo install mutagen	cargo test
Ruby	mutant	gem install mutant	RSpec, Minitest
Kotlin	Pitest	Gradle pitest plugin	JUnit, Kotest
C#	Stryker.NET	dotnet tool install -g dotnet-stryker	xUnit, NUnit, MSTest

Python: mutmut Example

pip install mutmut

# Run mutation testing (mutmut 2.x CLI; mutmut 3 moved these options into its config file)
mutmut run --paths-to-mutate src/

# See results
mutmut results

# Show surviving mutants
mutmut show 1  # Show mutant #1 source diff

# Illustrative output (real mutmut output differs by version)
# ----------
# Legend for mutated file: src/calculator.py
# ⚡ 1: def add(a, b):
# ⚡ 2:     return a - b  # ← mutant here
# ⚡ 3:
# ⚡ 4: def multiply(a, b):
# ⚡ 5:     return a / b  # ← mutant here
#
# Killed: 12/15 (80%)
# Survived: 3/15 (20%)
# Timeout: 0/15 (0%)

Go: go-mutesting Example

go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest

# Run mutation testing
go-mutesting ./...

# Run on specific package
go-mutesting ./src/math/

# Illustrative output (PASS = mutant killed, FAIL = mutant survived):
# PASS    "src/math/divide.go"   "/" -> "*"
# FAIL    "src/math/multiply.go" "*" -> "/"
# Mutation score: 8/10 (80%)

Best Practices

1. Set Realistic Thresholds

Start with a 60% mutation score target for existing projects and raise it to 80% or more for new code. Use `thresholds.break` (Stryker) or `mutationThreshold` (PIT) to fail CI below the minimum.

2. Run Incrementally

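Run full mutation testing nightly or weekly. On PRs, use incremental mode so that only mutants affected by changed source or test files are re-tested; this requires keeping the file set by `incrementalFile` between CI runs (for example with a cache).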

3. Focus on Business Logic

Prioritize mutation testing for core business logic, payment processing, and security-critical code. Infrastructure code and simple getters/setters provide less value.

4. Review Surviving Mutants

Not all surviving mutants indicate bad tests. Some may be equivalent mutants. Review each survivor and either add a test or document why it’s acceptable.

5. Combine with Coverage

Use mutation testing alongside code coverage. High coverage with low mutation score means tests exist but don’t verify behavior.

Scenario	Coverage	Mutation Score	Meaning
Good tests	90%	90%	Tests verify behavior thoroughly
False confidence	90%	40%	Tests exist but don’t verify correctly
Missing tests	40%	60%	Some code untested
Over-specified	95%	95%	High scores, but tests may be tied to implementation details and break on refactoring
Integration-heavy	70%	85%	Integration tests catch more per test

6. Integrate into Code Review

# PR checklist with mutation testing
pr_checklist:
  - "Mutation score ≥ 70% for modified files"
  - "No surviving mutants in critical business logic"
  - "All surviving mutants reviewed and documented"
  - "Coverage increased or maintained"

Common Pitfalls

1. Testing Only Happy Path

// ❌ Tests that miss error handling
describe('processPayment', () => {
  it('processes valid payment', () => {
    expect(processPayment(validCard)).toBe(true);
  });
  // Missing: failure cases, edge cases, timeouts
});

// ✅ Comprehensive tests
describe('processPayment', () => {
  it('processes valid payment', () => {
    expect(processPayment(validCard)).toBe(true);
  });

  it('rejects expired card', () => {
    expect(() => processPayment(expiredCard)).toThrow('Card expired');
  });

  it('rejects invalid CVV', () => {
    expect(() => processPayment(badCvv)).toThrow('Invalid CVV');
  });

  it('handles gateway timeout', () => {
    expect(() => processPayment(timeoutCard)).toThrow('Gateway timeout');
  });
});

2. Asserting Too Generically

// ❌ Weak assertions let mutants survive
it('returns users', () => {
  const users = getUsers();
  expect(users).toBeDefined();
  expect(users.length).toBeGreaterThan(0);
});

// ✅ Strong assertions kill mutants
it('returns active users sorted by name', () => {
  const users = getUsers();
  expect(users).toHaveLength(3);
  expect(users[0].name).toBe('Alice');
  expect(users[0].status).toBe('active');
  expect(users[1].name).toBe('Bob');
});
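To see why the weak version lets mutants through, assume a hypothetical `getUsers` that filters active users and sorts them by name. Mutation tools can remove a method call such as `.sort(...)` (Stryker's MethodExpression mutator covers cases like this); the sketch applies that change by hand and runs both sets of assertions, written as plain predicates, against the original and the mutant.

```typescript
interface User {
  name: string;
  status: "active" | "inactive";
}

// Hypothetical data source.
const ALL_USERS: User[] = [
  { name: "Carol", status: "active" },
  { name: "Dave", status: "inactive" },
  { name: "Alice", status: "active" },
  { name: "Bob", status: "active" },
];

// Original: active users sorted by name.
const getUsers = (): User[] =>
  ALL_USERS.filter((u) => u.status === "active").sort((a, b) => a.name.localeCompare(b.name));

// Mutant: the .sort(...) call removed.
const getUsersMutant = (): User[] => ALL_USERS.filter((u) => u.status === "active");

// Mirrors toBeDefined() + length > 0.
const weakAssertions = (users: User[]): boolean => users !== undefined && users.length > 0;
// Mirrors the exact length, order and status assertions.
const strongAssertions = (users: User[]): boolean =>
  users.length === 3 &&
  users[0]?.name === "Alice" &&
  users[0]?.status === "active" &&
  users[1]?.name === "Bob";

console.log(weakAssertions(getUsers()), weakAssertions(getUsersMutant())); // true true
console.log(strongAssertions(getUsers()), strongAssertions(getUsersMutant())); // true false
```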

Resources

Stryker Mutator Documentation — Official Stryker guide
PIT Mutation Testing — Java mutation testing tool
Mutation Testing Guide — Martin Fowler’s article
mutmut — Python mutation testing
go-mutesting — Go mutation testing
Stryker Dashboard — Track mutation scores over time
Stryker.NET — .NET mutation testing
Cosmic Ray — Python mutation testing engine
