Clausal Normal Form (CNF): From Formulas to Clauses

TL;DR: Clausal Normal Form (CNF) is a standardized formula format essential for SAT solvers and automated theorem proving. This guide covers the definition, conversion algorithms, practical examples, and applications.

Introduction

In propositional logic, different formula structures can represent the same logical meaning. Clausal Normal Form (CNF), also called Conjunctive Normal Form, standardizes formulas into a specific structure: a conjunction (AND) of disjunctions (OR).

This normalization is critical for:

SAT solvers that check formula satisfiability
Automated theorem proving via resolution
Hardware verification and model checking
Constraint satisfaction problems
AI planning and scheduling
Formal verification of software systems

Consider this analogy: just as JSON standardizes data exchange between systems, CNF standardizes logical formulas for automated reasoning tools. Without this standardization, every SAT solver would need to handle arbitrary formula structures, making implementation complex and inefficient.

Definitions

Literal

A literal is either a propositional variable or its negation:

Positive literal: P, Q, R
Negative literal: ¬P, ¬Q, ¬R

Example in Python:

class Literal:
    def __init__(self, variable, negated=False):
        self.variable = variable
        self.negated = negated
    
    def __repr__(self):
        return f"{'¬' if self.negated else ''}{self.variable}"

# Create literals
p = Literal('P')        # P
not_p = Literal('P', True)  # ¬P

Clause

A clause is a disjunction (OR) of literals:

P ∨ Q ∨ ¬R

Special cases:

Unit clause: Contains exactly one literal (e.g., P)
Binary clause: Contains exactly two literals (e.g., P ∨ Q)
Empty clause: Contains zero literals (⊥, always false)
Tautology: Contains both P and ¬P (always true, often removed)

Example in Python:

class Clause:
    def __init__(self, literals):
        self.literals = literals
    
    def is_unit(self):
        return len(self.literals) == 1
    
    def is_empty(self):
        return len(self.literals) == 0
    
    def __repr__(self):
        return ' ∨ '.join(str(lit) for lit in self.literals)

# Create clauses
clause1 = Clause([Literal('P'), Literal('Q')])  # P ∨ Q
unit = Clause([Literal('R', True)])  # ¬R (unit clause)

Clausal Normal Form (CNF)

A formula is in CNF if it is a conjunction (AND) of clauses:

(P ∨ Q) ∧ (¬P ∨ R) ∧ (¬Q ∨ ¬R)

Example in Python:

class CNF:
    def __init__(self, clauses):
        self.clauses = clauses
    
    def __repr__(self):
        return ' ∧ '.join(f"({c})" for c in self.clauses)

# Create CNF formula
cnf = CNF([
    Clause([Literal('P'), Literal('Q')]),
    Clause([Literal('P', True), Literal('R')]),
    Clause([Literal('Q', True), Literal('R', True)])
])
print(cnf)  # (P ∨ Q) ∧ (¬P ∨ R) ∧ (¬Q ∨ ¬R)

Why CNF Matters

Application	Why CNF is Used
SAT Solvers	Input format for DPLL and CDCL algorithms
Theorem Proving	Resolution works directly on clauses
Model Checking	Efficient state space traversal
Circuit Verification	Boolean circuit equivalence checking

Converting to CNF

Step-by-Step Algorithm

The standard CNF conversion algorithm follows these steps:

Step 1: Eliminate implications and biconditionals

P → Q  ≡  ¬P ∨ Q
P ↔ Q  ≡  (P → Q) ∧ (Q → P)  ≡  (¬P ∨ Q) ∧ (¬Q ∨ P)

Example in Python:

def eliminate_implications(formula):
    """Convert implications to disjunctions."""
    if formula['op'] == '→':
        left = eliminate_implications(formula['left'])
        right = eliminate_implications(formula['right'])
        return {'op': '∨', 'left': {'op': '¬', 'arg': left}, 'right': right}
    elif formula['op'] == '↔':
        left = eliminate_implications(formula['left'])
        right = eliminate_implications(formula['right'])
        # (P ↔ Q) ≡ (¬P ∨ Q) ∧ (¬Q ∨ P)
        return {
            'op': '∧',
            'left': {'op': '∨', 'left': {'op': '¬', 'arg': left}, 'right': right},
            'right': {'op': '∨', 'left': {'op': '¬', 'arg': right}, 'right': left}
        }
    return formula

Step 2: Move negation inward (using De Morgan’s laws)

¬(P ∧ Q)  ≡  ¬P ∨ ¬Q
¬(P ∨ Q)  ≡  ¬P ∧ ¬Q
¬¬P       ≡  P

Example in Python:

def push_negation_inward(formula):
    """Apply De Morgan's laws to push negations to literals."""
    if formula['op'] == '¬':
        arg = formula['arg']
        if arg['op'] == '¬':
            # ¬¬P ≡ P
            return push_negation_inward(arg['arg'])
        elif arg['op'] == '∧':
            # ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
            return {
                'op': '∨',
                'left': push_negation_inward({'op': '¬', 'arg': arg['left']}),
                'right': push_negation_inward({'op': '¬', 'arg': arg['right']})
            }
        elif arg['op'] == '∨':
            # ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
            return {
                'op': '∧',
                'left': push_negation_inward({'op': '¬', 'arg': arg['left']}),
                'right': push_negation_inward({'op': '¬', 'arg': arg['right']})
            }
    return formula

Step 3: Distribute OR over AND

P ∨ (Q ∧ R)  ≡  (P ∨ Q) ∧ (P ∨ R)
(P ∧ Q) ∨ R  ≡  (P ∨ R) ∧ (Q ∨ R)

Example in Python:

def distribute_or_over_and(formula):
    """Distribute OR over AND to achieve CNF."""
    if formula['op'] == '∨':
        left = distribute_or_over_and(formula['left'])
        right = distribute_or_over_and(formula['right'])
        
        # P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)
        if right['op'] == '∧':
            return {
                'op': '∧',
                'left': distribute_or_over_and({'op': '∨', 'left': left, 'right': right['left']}),
                'right': distribute_or_over_and({'op': '∨', 'left': left, 'right': right['right']})
            }
        # (P ∧ Q) ∨ R ≡ (P ∨ R) ∧ (Q ∨ R)
        elif left['op'] == '∧':
            return {
                'op': '∧',
                'left': distribute_or_over_and({'op': '∨', 'left': left['left'], 'right': right}),
                'right': distribute_or_over_and({'op': '∨', 'left': left['right'], 'right': right})
            }
    return formula

Complete CNF Conversion:

def to_cnf(formula):
    """Convert formula to CNF."""
    formula = eliminate_implications(formula)
    formula = push_negation_inward(formula)
    formula = distribute_or_over_and(formula)
    return formula

Examples

Example 1: Simple Formula

Convert: P → Q

Step 1: Remove implication

¬P ∨ Q

This is already in CNF (one clause: ¬P ∨ Q).

Example 2: With AND

Convert: P ∧ (P → Q)

Step 1: Remove implication

P ∧ (¬P ∨ Q)

Step 2: Already has negation pushed inward.

Step 3: Already in CNF:

Clause 1: P
Clause 2: ¬P ∨ Q

Example 3: Complex Formula

Convert: ¬(P ∨ Q) → R

Step 1: Remove implication

¬¬(P ∨ Q) ∨ R
≡ (P ∨ Q) ∨ R
≡ P ∨ Q ∨ R

Result: Single clause (P ∨ Q ∨ R) ✓

Example 4: Biconditional

Convert: (P ↔ Q) ∧ ¬R

Step 1: Remove ↔ (biconditional)

(P → Q) ∧ (Q → P) ∧ ¬R
≡ (¬P ∨ Q) ∧ (¬Q ∨ P) ∧ ¬R

Result: Three clauses:

(¬P ∨ Q)
(¬Q ∨ P)
¬R

Example 5: With Distribution

Convert: (P ∧ Q) ∨ R

Step 3: Distribute OR over AND

(P ∨ R) ∧ (Q ∨ R)

Result: Two clauses in CNF.

Example 6: Nested Formula

Convert: ¬((P → Q) ∧ (Q → R))

Step 1: Remove implications

¬((¬P ∨ Q) ∧ (¬Q ∨ R))

Step 2: Apply De Morgan’s law

¬(¬P ∨ Q) ∨ ¬(¬Q ∨ R)
≡ (¬¬P ∧ ¬Q) ∨ (¬¬Q ∧ ¬R)
≡ (P ∧ ¬Q) ∨ (Q ∧ ¬R)

Step 3: Distribute OR over AND

(P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬Q ∨ Q) ∧ (¬Q ∨ ¬R)

Simplify (remove tautology ¬Q ∨ Q):

(P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬Q ∨ ¬R)

Example 7: Three-Variable Formula

Convert: (P ∨ Q) → (R ∧ S)

Step 1: Remove implication

¬(P ∨ Q) ∨ (R ∧ S)

Step 2: Push negation inward

(¬P ∧ ¬Q) ∨ (R ∧ S)

Step 3: Distribute OR over AND

((¬P ∧ ¬Q) ∨ R) ∧ ((¬P ∧ ¬Q) ∨ S)
≡ (¬P ∨ R) ∧ (¬Q ∨ R) ∧ (¬P ∨ S) ∧ (¬Q ∨ S)

Result: Four clauses in CNF.

Python implementation:

# Example: Convert (P ∨ Q) → (R ∧ S)
formula = {
    'op': '→',
    'left': {'op': '∨', 'left': {'var': 'P'}, 'right': {'var': 'Q'}},
    'right': {'op': '∧', 'left': {'var': 'R'}, 'right': {'var': 'S'}}
}

cnf_formula = to_cnf(formula)
# Result: (¬P ∨ R) ∧ (¬Q ∨ R) ∧ (¬P ∨ S) ∧ (¬Q ∨ S)

Tseitin Transformation

The Tseitin transformation is a more efficient way to convert formulas to CNF by introducing fresh variables for subformulas. This avoids the exponential blowup that can occur with naive distribution.

Why Tseitin?

Consider converting (A₁ ∧ B₁) ∨ (A₂ ∧ B₂) ∨ ... ∨ (Aₙ ∧ Bₙ):

Naive approach: Results in 2ⁿ clauses (exponential)
Tseitin approach: Results in O(n) clauses (linear)

Tseitin Encoding Rules

For each subformula, introduce a new variable and add equivalence clauses:

For conjunction (X ↔ (A ∧ B)):

(X → (A ∧ B)) ∧ ((A ∧ B) → X)
≡ (¬X ∨ A) ∧ (¬X ∨ B) ∧ (¬A ∨ ¬B ∨ X)

For disjunction (X ↔ (A ∨ B)):

(X → (A ∨ B)) ∧ ((A ∨ B) → X)
≡ (¬X ∨ A ∨ B) ∧ (¬A ∨ X) ∧ (¬B ∨ X)

For negation (X ↔ ¬A):

(¬X ∨ ¬A) ∧ (A ∨ X)

Example: Tseitin Transformation

Convert: (P ∧ Q) ∨ (R ∧ S)

Step 1: Introduce variables

Let X₁ ≡ P ∧ Q
Let X₂ ≡ R ∧ S
Let X₃ ≡ X₁ ∨ X₂ (the root)

Step 2: Add equivalence clauses

For X₁ ↔ (P ∧ Q):

(¬X₁ ∨ P) ∧ (¬X₁ ∨ Q) ∧ (¬P ∨ ¬Q ∨ X₁)

For X₂ ↔ (R ∧ S):

(¬X₂ ∨ R) ∧ (¬X₂ ∨ S) ∧ (¬R ∨ ¬S ∨ X₂)

For X₃ ↔ (X₁ ∨ X₂):

(¬X₃ ∨ X₁ ∨ X₂) ∧ (¬X₁ ∨ X₃) ∧ (¬X₂ ∨ X₃)

Step 3: Assert root variable

X₃

Final CNF (9 clauses total):

X₃ ∧
(¬X₁ ∨ P) ∧ (¬X₁ ∨ Q) ∧ (¬P ∨ ¬Q ∨ X₁) ∧
(¬X₂ ∨ R) ∧ (¬X₂ ∨ S) ∧ (¬R ∨ ¬S ∨ X₂) ∧
(¬X₃ ∨ X₁ ∨ X₂) ∧ (¬X₁ ∨ X₃) ∧ (¬X₂ ∨ X₃)

Python implementation:

class TseitinTransformer:
    def __init__(self):
        self.var_counter = 0
        self.clauses = []
    
    def new_var(self):
        self.var_counter += 1
        return f"X{self.var_counter}"
    
    def transform(self, formula):
        """Transform formula to CNF using Tseitin."""
        root_var = self.tseitin_encode(formula)
        # Assert root variable
        self.clauses.append([root_var])
        return self.clauses
    
    def tseitin_encode(self, formula):
        """Recursively encode formula."""
        if 'var' in formula:
            return formula['var']
        
        op = formula['op']
        
        if op == '∧':
            left_var = self.tseitin_encode(formula['left'])
            right_var = self.tseitin_encode(formula['right'])
            new_var = self.new_var()
            
            # X ↔ (A ∧ B)
            self.clauses.append([f"¬{new_var}", left_var])
            self.clauses.append([f"¬{new_var}", right_var])
            self.clauses.append([f"¬{left_var}", f"¬{right_var}", new_var])
            
            return new_var
        
        elif op == '∨':
            left_var = self.tseitin_encode(formula['left'])
            right_var = self.tseitin_encode(formula['right'])
            new_var = self.new_var()
            
            # X ↔ (A ∨ B)
            self.clauses.append([f"¬{new_var}", left_var, right_var])
            self.clauses.append([f"¬{left_var}", new_var])
            self.clauses.append([f"¬{right_var}", new_var])
            
            return new_var
        
        elif op == '¬':
            arg_var = self.tseitin_encode(formula['arg'])
            new_var = self.new_var()
            
            # X ↔ ¬A
            self.clauses.append([f"¬{new_var}", f"¬{arg_var}"])
            self.clauses.append([arg_var, new_var])
            
            return new_var

# Example usage
transformer = TseitinTransformer()
formula = {
    'op': '∨',
    'left': {'op': '∧', 'left': {'var': 'P'}, 'right': {'var': 'Q'}},
    'right': {'op': '∧', 'left': {'var': 'R'}, 'right': {'var': 'S'}}
}
cnf_clauses = transformer.transform(formula)

Advantages

Aspect	Naive CNF	Tseitin
Size	Exponential (worst case)	Linear (always)
Variables	Original only	Additional introduced
Preserves	Logical equivalence	Equisatisfiability
Use case	Small formulas	Large formulas

Note: Tseitin preserves satisfiability but not equivalence. The new formula is satisfiable if and only if the original is, but they may have different models due to the auxiliary variables.

CNF in SAT Solvers

DIMACS Format

Modern SAT solvers like CDCL (Conflict-Driven Clause Learning) work on CNF. The standard input format is DIMACS:

c This is a comment
c Variables: 1=P, 2=Q, 3=R
p cnf 3 5
-1 -2 0
1 -2 0
-1 2 0
1 -3 0
-1 3 0

Format explanation:

p cnf 3 5: Problem line (3 variables, 5 clauses)
Each line is a clause: literals separated by spaces, terminated by 0
Positive number = positive literal, negative = negated literal
-1 -2 0 represents (¬P ∨ ¬Q)

This represents:

(¬P ∨ ¬Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬P ∨ R)

Python DIMACS writer:

def write_dimacs(cnf_clauses, num_vars, filename):
    """Write CNF to DIMACS format."""
    with open(filename, 'w') as f:
        f.write(f"p cnf {num_vars} {len(cnf_clauses)}\n")
        for clause in cnf_clauses:
            clause_str = ' '.join(str(lit) for lit in clause) + ' 0\n'
            f.write(clause_str)

# Example
clauses = [[-1, -2], [1, -2], [-1, 2], [1, -3], [-1, 3]]
write_dimacs(clauses, 3, 'formula.cnf')

Python DIMACS reader:

def read_dimacs(filename):
    """Read CNF from DIMACS format."""
    clauses = []
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('c') or line.startswith('p'):
                continue
            clause = [int(x) for x in line.strip().split() if x != '0']
            if clause:
                clauses.append(clause)
    return clauses

# Example
cnf = read_dimacs('formula.cnf')
print(cnf)  # [[-1, -2], [1, -2], [-1, 2], [1, -3], [-1, 3]]

Unit Propagation

In CNF, unit clauses force variable assignments:

(P) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R)

Propagation steps:

From (P), we derive P = true
Substitute into (¬P ∨ Q): becomes (false ∨ Q) = (Q)
From (Q), we derive Q = true
Substitute into (¬Q ∨ R): becomes (false ∨ R) = (R)
From (R), we derive R = true

Python implementation:

def unit_propagate(clauses, assignment=None):
    """Perform unit propagation on CNF clauses."""
    if assignment is None:
        assignment = {}
    
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            # Filter out satisfied literals
            unassigned = [lit for lit in clause 
                         if abs(lit) not in assignment]
            
            # Check if clause is unit
            if len(unassigned) == 1:
                lit = unassigned[0]
                var = abs(lit)
                value = lit > 0
                
                if var not in assignment:
                    assignment[var] = value
                    changed = True
                    print(f"Unit propagation: {var} = {value}")
    
    return assignment

# Example
clauses = [[1], [-1, 2], [-2, 3]]
result = unit_propagate(clauses)
print(result)  # {1: True, 2: True, 3: True}

Pure Literal Elimination

A pure literal appears with only one polarity in all clauses. Pure literals can be assigned to satisfy all clauses containing them.

(P ∨ Q) ∧ (P ∨ ¬R) ∧ (Q ∨ R)

Here, P is pure (always positive), so we can set P = true.

Python implementation:

def find_pure_literals(clauses):
    """Find pure literals in CNF."""
    positive = set()
    negative = set()
    
    for clause in clauses:
        for lit in clause:
            if lit > 0:
                positive.add(lit)
            else:
                negative.add(-lit)
    
    # Pure literals appear in only one polarity
    pure = (positive - negative) | (negative - positive)
    return pure

# Example
clauses = [[1, 2], [1, -3], [2, 3]]
pure = find_pure_literals(clauses)
print(pure)  # {1} (P is pure positive)

Horn Clauses

A Horn clause contains at most one positive literal. Horn clauses are important because they can be solved in polynomial time.

Types of Horn Clauses

Definite clause (exactly one positive literal):

¬P₁ ∨ ¬P₂ ∨ ... ∨ ¬Pₙ ∨ Q

Interpreted as: “if P₁ and P₂ and … and Pₙ then Q”

Goal clause (no positive literals):

¬P₁ ∨ ¬P₂ ∨ ... ∨ ¬Pₙ

Interpreted as: “not (P₁ and P₂ and … and Pₙ)”

Fact (one positive literal, no negative):

Interpreted as: “Q is true”

Example: Horn Formula

(¬A ∨ ¬B ∨ C) ∧ (¬C ∨ D) ∧ (A) ∧ (B)

Interpretation:

If A and B then C
If C then D
A is true
B is true

Forward chaining solution:

A = true (fact)
B = true (fact)
C = true (from A ∧ B → C)
D = true (from C → D)

Python Horn clause solver:

def solve_horn(clauses):
    """Solve Horn formula using forward chaining."""
    assignment = {}
    facts = []
    rules = []
    
    # Separate facts from rules
    for clause in clauses:
        positive = [lit for lit in clause if lit > 0]
        negative = [abs(lit) for lit in clause if lit < 0]
        
        if len(positive) == 1 and len(negative) == 0:
            # Fact
            facts.append(positive[0])
        elif len(positive) == 1:
            # Rule: negative literals → positive literal
            rules.append((negative, positive[0]))
    
    # Initialize with facts
    for fact in facts:
        assignment[fact] = True
    
    # Forward chaining
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in assignment:
                # Check if all premises are satisfied
                if all(p in assignment and assignment[p] for p in premises):
                    assignment[conclusion] = True
                    changed = True
                    print(f"Derived: {conclusion} = True")
    
    return assignment

# Example: (¬1 ∨ ¬2 ∨ 3) ∧ (¬3 ∨ 4) ∧ (1) ∧ (2)
clauses = [[-1, -2, 3], [-3, 4], [1], [2]]
result = solve_horn(clauses)
print(result)  # {1: True, 2: True, 3: True, 4: True}

Why Horn Clauses Matter

Property	General CNF	Horn CNF
Satisfiability	NP-complete	P (polynomial)
Algorithm	DPLL/CDCL	Forward chaining
Use cases	General SAT	Logic programming, expert systems
Example systems	SAT solvers	Prolog, Datalog

Practical Applications

1. Circuit Verification

Hardware circuits can be encoded as CNF formulas to verify equivalence or find bugs.

Example: XOR gate verification

def xor_to_cnf(a, b, out):
    """Encode XOR gate as CNF: out = a ⊕ b"""
    # out ↔ (a ⊕ b)
    # out ↔ ((a ∨ b) ∧ ¬(a ∧ b))
    # out ↔ ((a ∨ b) ∧ (¬a ∨ ¬b))
    
    clauses = [
        # out → ((a ∨ b) ∧ (¬a ∨ ¬b))
        [-out, a, b],
        [-out, -a, -b],
        # ((a ∨ b) ∧ (¬a ∨ ¬b)) → out
        [-a, -b, out],
        [a, b, out]
    ]
    return clauses

# Verify two XOR implementations are equivalent
xor1_clauses = xor_to_cnf(1, 2, 3)  # out1 = a ⊕ b
xor2_clauses = xor_to_cnf(1, 2, 4)  # out2 = a ⊕ b
equivalence = [[-3, 4], [3, -4]]    # out1 ↔ out2

all_clauses = xor1_clauses + xor2_clauses + equivalence
# If UNSAT, circuits are equivalent

2. Planning Problems

AI planning problems can be encoded as CNF to find action sequences.

Example: Block world planning

def encode_block_world(initial, goal, steps):
    """Encode block world planning as CNF."""
    clauses = []
    
    # Variables: on(block, location, time)
    # Actions: move(block, from, to, time)
    
    # Initial state
    for fact in initial:
        clauses.append([fact])
    
    # Goal state (at final time step)
    for fact in goal:
        clauses.append([fact])
    
    # Frame axioms: if block doesn't move, position stays same
    # Action preconditions and effects
    # Mutual exclusion: block can't be in two places
    
    return clauses

# Example encoding
initial = [1, 2]  # on(A, table, 0), on(B, table, 0)
goal = [3]        # on(A, B, 2)
clauses = encode_block_world(initial, goal, steps=2)

3. Cryptography

SAT solvers can attack cryptographic problems by encoding them as CNF.

Example: Hash collision search

def encode_hash_collision(hash_function, target_hash):
    """Encode hash collision problem as CNF."""
    clauses = []
    
    # Variables for input bits
    input_vars = list(range(1, 257))  # 256-bit input
    
    # Encode hash function as circuit
    hash_output = encode_circuit(hash_function, input_vars)
    
    # Constrain output to match target
    for i, bit in enumerate(target_hash):
        if bit == 1:
            clauses.append([hash_output[i]])
        else:
            clauses.append([-hash_output[i]])
    
    return clauses

4. Sudoku Solver

Sudoku can be elegantly encoded as CNF.

Example: Sudoku encoding

def encode_sudoku(grid):
    """Encode 9x9 Sudoku as CNF."""
    clauses = []
    
    # Variable: x_ijk means cell (i,j) has value k
    # Variable numbering: 100*i + 10*j + k
    
    def var(i, j, k):
        return 100*i + 10*j + k
    
    # Each cell has at least one value
    for i in range(1, 10):
        for j in range(1, 10):
            clauses.append([var(i, j, k) for k in range(1, 10)])
    
    # Each cell has at most one value
    for i in range(1, 10):
        for j in range(1, 10):
            for k1 in range(1, 10):
                for k2 in range(k1+1, 10):
                    clauses.append([-var(i, j, k1), -var(i, j, k2)])
    
    # Each row has all values
    for i in range(1, 10):
        for k in range(1, 10):
            clauses.append([var(i, j, k) for j in range(1, 10)])
    
    # Each column has all values
    for j in range(1, 10):
        for k in range(1, 10):
            clauses.append([var(i, j, k) for i in range(1, 10)])
    
    # Each 3x3 box has all values
    for box_i in range(3):
        for box_j in range(3):
            for k in range(1, 10):
                clause = []
                for i in range(1, 4):
                    for j in range(1, 4):
                        clause.append(var(3*box_i + i, 3*box_j + j, k))
                clauses.append(clause)
    
    # Add given values
    for i in range(1, 10):
        for j in range(1, 10):
            if grid[i-1][j-1] != 0:
                k = grid[i-1][j-1]
                clauses.append([var(i, j, k)])
    
    return clauses

# Example usage
grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    # ... rest of grid
]
sudoku_cnf = encode_sudoku(grid)

5. Graph Coloring

The graph k-coloring problem: assign k colors to vertices such that no adjacent vertices share a color.

Example: 3-coloring encoding

def encode_graph_coloring(edges, num_vertices, num_colors):
    """Encode graph k-coloring as CNF."""
    clauses = []
    
    # Variable: x_vc means vertex v has color c
    def var(v, c):
        return v * num_colors + c + 1
    
    # Each vertex has at least one color
    for v in range(num_vertices):
        clauses.append([var(v, c) for c in range(num_colors)])
    
    # Each vertex has at most one color
    for v in range(num_vertices):
        for c1 in range(num_colors):
            for c2 in range(c1+1, num_colors):
                clauses.append([-var(v, c1), -var(v, c2)])
    
    # Adjacent vertices have different colors
    for v1, v2 in edges:
        for c in range(num_colors):
            clauses.append([-var(v1, c), -var(v2, c)])
    
    return clauses

# Example: Triangle graph with 3 colors
edges = [(0, 1), (1, 2), (2, 0)]
cnf = encode_graph_coloring(edges, 3, 3)

Common Pitfalls and Best Practices

Pitfall 1: Exponential Blowup

Bad: Naively distributing OR over AND

# (A₁ ∧ B₁) ∨ (A₂ ∧ B₂) ∨ ... ∨ (Aₙ ∧ Bₙ)
# Results in 2ⁿ clauses!

Good: Use Tseitin transformation

# Introduces O(n) auxiliary variables
# Results in O(n) clauses
transformer = TseitinTransformer()
cnf = transformer.transform(formula)

Pitfall 2: Not Simplifying Tautologies

Bad: Keep tautological clauses

clauses = [[1, -1, 2], [3, 4]]  # First clause is always true

Good: Remove tautologies

def remove_tautologies(clauses):
    """Remove clauses that are always true."""
    result = []
    for clause in clauses:
        literals = set(clause)
        # Check if both P and ¬P exist
        is_tautology = any(-lit in literals for lit in literals)
        if not is_tautology:
            result.append(clause)
    return result

Pitfall 3: Duplicate Literals

Bad: Keep duplicate literals in clauses

clause = [1, 2, 1, 3]  # P ∨ Q ∨ P ∨ R

Good: Remove duplicates

def simplify_clause(clause):
    """Remove duplicate literals."""
    return list(set(clause))

Best Practice: Incremental CNF Construction

Build CNF incrementally for better performance:

class IncrementalCNF:
    def __init__(self):
        self.clauses = []
        self.var_counter = 0
    
    def new_var(self):
        self.var_counter += 1
        return self.var_counter
    
    def add_clause(self, clause):
        """Add clause with simplification."""
        clause = list(set(clause))  # Remove duplicates
        
        # Check for tautology
        if any(-lit in clause for lit in clause):
            return  # Don't add tautology
        
        self.clauses.append(clause)
    
    def add_and(self, a, b):
        """Add clauses for c ↔ (a ∧ b)."""
        c = self.new_var()
        self.add_clause([-c, a])
        self.add_clause([-c, b])
        self.add_clause([-a, -b, c])
        return c
    
    def add_or(self, a, b):
        """Add clauses for c ↔ (a ∨ b)."""
        c = self.new_var()
        self.add_clause([-c, a, b])
        self.add_clause([-a, c])
        self.add_clause([-b, c])
        return c

Summary

Concept	Definition
Literal	`P` or `¬P`
Clause	Disjunction of literals: `P ∨ Q ∨ ¬R`
CNF	Conjunction of clauses
Empty clause	⊥ (always false)
Unit clause	Single literal
Horn clause	At most one positive literal
Tautology	Clause with both `P` and `¬P`

Key Points:

CNF standardizes formula structure for algorithmic processing
Conversion can be done naively (potentially exponential) or via Tseitin transformation (linear)
SAT solvers operate on CNF formulas using DIMACS format
Horn clauses are a tractable subset solvable in polynomial time
Unit propagation and pure literal elimination are key simplification techniques
Many real-world problems (Sudoku, graph coloring, planning) can be encoded as CNF

Clausal Normal Form (CNF): From Formulas to Clauses

Introduction

Definitions

Literal

Clause

Clausal Normal Form (CNF)

Why CNF Matters

Converting to CNF

Step-by-Step Algorithm

Examples

Example 1: Simple Formula

Example 2: With AND

Example 3: Complex Formula

Example 4: Biconditional

Example 5: With Distribution

Example 6: Nested Formula

Example 7: Three-Variable Formula

Tseitin Transformation

Why Tseitin?

Tseitin Encoding Rules

Example: Tseitin Transformation

Advantages

CNF in SAT Solvers

DIMACS Format

Unit Propagation

Pure Literal Elimination

Horn Clauses

Types of Horn Clauses

Example: Horn Formula

Why Horn Clauses Matter

Practical Applications

1. Circuit Verification

2. Planning Problems

3. Cryptography

4. Sudoku Solver

5. Graph Coloring

Common Pitfalls and Best Practices

Pitfall 1: Exponential Blowup

Pitfall 2: Not Simplifying Tautologies

Pitfall 3: Duplicate Literals

Best Practice: Incremental CNF Construction

Summary

Further Reading

Comments