First-Order Theorem Proving: A Comprehensive Guide to Automated Reasoning

First-order theorem proving represents one of the most powerful techniques in automated reasoning, enabling computers to verify mathematical proofs, validate software correctness, and solve complex logical problems. This comprehensive guide explores the foundations, algorithms, and practical applications of first-order theorem proving.

Understanding First-Order Logic

First-order logic (FOL), also called predicate logic, extends propositional logic by introducing quantifiers, predicates, and functions over domains of objects. Unlike propositional logic which deals only with true/false statements, FOL can express relationships between objects and make statements about collections of objects.

Basic Components of First-Order Logic

The syntax of first-order logic consists of several key elements:

# Example: Representing FOL terms in Python
class Term:
    """Base class for FOL terms"""
    pass

class Variable(Term):
    def __init__(self, name):
        self.name = name
    
    def __repr__(self):
        return self.name

class Constant(Term):
    def __init__(self, name):
        self.name = name
    
    def __repr__(self):
        return self.name

class Function(Term):
    def __init__(self, name, args):
        self.name = name
        self.args = args  # List of terms
    
    def __repr__(self):
        args_str = ", ".join(str(arg) for arg in self.args)
        return f"{self.name}({args_str})"

# Example usage
x = Variable("x")
john = Constant("john")
father_of = Function("father", [john])
print(f"Variable: {x}")
print(f"Constant: {john}")
print(f"Function: {father_of}")

Predicates and Formulas

Predicates express properties and relationships between objects:

class Predicate:
    def __init__(self, name, args):
        self.name = name
        self.args = args  # List of terms
    
    def __repr__(self):
        args_str = ", ".join(str(arg) for arg in self.args)
        return f"{self.name}({args_str})"

class Formula:
    """Base class for FOL formulas"""
    pass

class Atom(Formula):
    def __init__(self, predicate):
        self.predicate = predicate
    
    def __repr__(self):
        return str(self.predicate)

class Not(Formula):
    def __init__(self, formula):
        self.formula = formula
    
    def __repr__(self):
        return f"¬{self.formula}"

class And(Formula):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    
    def __repr__(self):
        return f"({self.left} ∧ {self.right})"

class Or(Formula):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    
    def __repr__(self):
        return f"({self.left} ∨ {self.right})"

# Example: "Human(x) ∧ Mortal(x)"
x = Variable("x")
human = Atom(Predicate("Human", [x]))
mortal = Atom(Predicate("Mortal", [x]))
formula = And(human, mortal)
print(f"Formula: {formula}")

Quantifiers in First-Order Logic

Quantifiers allow us to make statements about collections of objects:

class Forall(Formula):
    """Universal quantifier: ∀x.P(x)"""
    def __init__(self, variable, formula):
        self.variable = variable
        self.formula = formula
    
    def __repr__(self):
        return f"∀{self.variable}.{self.formula}"

class Exists(Formula):
    """Existential quantifier: ∃x.P(x)"""
    def __init__(self, variable, formula):
        self.variable = variable
        self.formula = formula
    
    def __repr__(self):
        return f"∃{self.variable}.{self.formula}"

# Example: "∀x.(Human(x) → Mortal(x))"
# "All humans are mortal"
x = Variable("x")
human = Atom(Predicate("Human", [x]))
mortal = Atom(Predicate("Mortal", [x]))
implies = Or(Not(human), mortal)  # A → B ≡ ¬A ∨ B
all_humans_mortal = Forall(x, implies)
print(f"Universal statement: {all_humans_mortal}")

# Example: "∃x.(Human(x) ∧ Wise(x))"
# "There exists a wise human"
wise = Atom(Predicate("Wise", [x]))
exists_wise_human = Exists(x, And(human, wise))
print(f"Existential statement: {exists_wise_human}")

Clausal Normal Form (CNF)

Before applying theorem proving algorithms, formulas must be converted to Clausal Normal Form. CNF is a conjunction of disjunctions of literals, where a literal is an atomic formula or its negation.

Converting to CNF: Step-by-Step

def eliminate_implications(formula):
    """Convert A → B to ¬A ∨ B"""
    if isinstance(formula, Implies):
        return Or(Not(formula.left), formula.right)
    elif isinstance(formula, And):
        return And(eliminate_implications(formula.left),
                   eliminate_implications(formula.right))
    elif isinstance(formula, Or):
        return Or(eliminate_implications(formula.left),
                  eliminate_implications(formula.right))
    elif isinstance(formula, Not):
        return Not(eliminate_implications(formula.formula))
    return formula

def move_negations_inward(formula):
    """Apply De Morgan's laws and double negation"""
    if isinstance(formula, Not):
        inner = formula.formula
        if isinstance(inner, Not):
            # ¬¬A ≡ A
            return move_negations_inward(inner.formula)
        elif isinstance(inner, And):
            # ¬(A ∧ B) ≡ ¬A ∨ ¬B
            return Or(move_negations_inward(Not(inner.left)),
                     move_negations_inward(Not(inner.right)))
        elif isinstance(inner, Or):
            # ¬(A ∨ B) ≡ ¬A ∧ ¬B
            return And(move_negations_inward(Not(inner.left)),
                      move_negations_inward(Not(inner.right)))
        elif isinstance(inner, Forall):
            # ¬∀x.P(x) ≡ ∃x.¬P(x)
            return Exists(inner.variable,
                         move_negations_inward(Not(inner.formula)))
        elif isinstance(inner, Exists):
            # ¬∃x.P(x) ≡ ∀x.¬P(x)
            return Forall(inner.variable,
                         move_negations_inward(Not(inner.formula)))
    elif isinstance(formula, And):
        return And(move_negations_inward(formula.left),
                   move_negations_inward(formula.right))
    elif isinstance(formula, Or):
        return Or(move_negations_inward(formula.left),
                  move_negations_inward(formula.right))
    return formula

def standardize_variables(formula, counter={'count': 0}):
    """Rename variables to avoid conflicts"""
    if isinstance(formula, Forall) or isinstance(formula, Exists):
        new_var = Variable(f"{formula.variable.name}_{counter['count']}")
        counter['count'] += 1
        new_formula = substitute(formula.formula, formula.variable, new_var)
        if isinstance(formula, Forall):
            return Forall(new_var, standardize_variables(new_formula, counter))
        else:
            return Exists(new_var, standardize_variables(new_formula, counter))
    elif isinstance(formula, And):
        return And(standardize_variables(formula.left, counter),
                   standardize_variables(formula.right, counter))
    elif isinstance(formula, Or):
        return Or(standardize_variables(formula.left, counter),
                  standardize_variables(formula.right, counter))
    return formula

Skolemization: Eliminating Existential Quantifiers

class SkolemFunction:
    """Skolem function for existential elimination"""
    def __init__(self, name, args):
        self.name = name
        self.args = args
    
    def __repr__(self):
        if not self.args:
            return self.name
        args_str = ", ".join(str(arg) for arg in self.args)
        return f"{self.name}({args_str})"

def skolemize(formula, universal_vars=None, skolem_counter={'count': 0}):
    """Replace existential quantifiers with Skolem functions"""
    if universal_vars is None:
        universal_vars = []
    
    if isinstance(formula, Forall):
        new_universal = universal_vars + [formula.variable]
        return Forall(formula.variable,
                     skolemize(formula.formula, new_universal, skolem_counter))
    
    elif isinstance(formula, Exists):
        # Create Skolem function
        skolem_name = f"sk_{skolem_counter['count']}"
        skolem_counter['count'] += 1
        
        if universal_vars:
            skolem_term = Function(skolem_name, universal_vars)
        else:
            skolem_term = Constant(skolem_name)
        
        # Substitute existential variable with Skolem term
        new_formula = substitute(formula.formula, formula.variable, skolem_term)
        return skolemize(new_formula, universal_vars, skolem_counter)
    
    elif isinstance(formula, And):
        return And(skolemize(formula.left, universal_vars, skolem_counter),
                   skolemize(formula.right, universal_vars, skolem_counter))
    
    elif isinstance(formula, Or):
        return Or(skolemize(formula.left, universal_vars, skolem_counter),
                  skolemize(formula.right, universal_vars, skolem_counter))
    
    return formula

# Example: Skolemization
# ∀x.∃y.Loves(x, y) → ∀x.Loves(x, sk_0(x))
x = Variable("x")
y = Variable("y")
loves = Atom(Predicate("Loves", [x, y]))
formula = Forall(x, Exists(y, loves))
skolemized = skolemize(formula)
print(f"Original: {formula}")
print(f"Skolemized: {skolemized}")

Unification: The Heart of Theorem Proving

Unification is the process of finding substitutions that make two terms identical. It’s fundamental to resolution-based theorem proving.

Most General Unifier (MGU)

class Substitution:
    """Represents a substitution mapping variables to terms"""
    def __init__(self, mapping=None):
        self.mapping = mapping if mapping else {}
    
    def apply(self, term):
        """Apply substitution to a term"""
        if isinstance(term, Variable):
            if term.name in self.mapping:
                return self.apply(self.mapping[term.name])
            return term
        elif isinstance(term, Constant):
            return term
        elif isinstance(term, Function):
            new_args = [self.apply(arg) for arg in term.args]
            return Function(term.name, new_args)
        return term
    
    def compose(self, other):
        """Compose two substitutions"""
        new_mapping = {}
        # Apply other to all values in self
        for var, term in self.mapping.items():
            new_mapping[var] = other.apply(term)
        # Add mappings from other not in self
        for var, term in other.mapping.items():
            if var not in new_mapping:
                new_mapping[var] = term
        return Substitution(new_mapping)
    
    def __repr__(self):
        items = [f"{var}/{term}" for var, term in self.mapping.items()]
        return "{" + ", ".join(items) + "}"

def occurs_check(var, term):
    """Check if variable occurs in term (prevents infinite structures)"""
    if isinstance(term, Variable):
        return var.name == term.name
    elif isinstance(term, Function):
        return any(occurs_check(var, arg) for arg in term.args)
    return False

def unify(term1, term2):
    """Find most general unifier of two terms"""
    return unify_helper(term1, term2, Substitution())

def unify_helper(term1, term2, subst):
    """Recursive unification with substitution"""
    # Apply current substitution
    term1 = subst.apply(term1)
    term2 = subst.apply(term2)
    
    # Same term
    if isinstance(term1, Constant) and isinstance(term2, Constant):
        if term1.name == term2.name:
            return subst
        return None
    
    # Variable cases
    if isinstance(term1, Variable):
        if occurs_check(term1, term2):
            return None if term1.name != term2.name else subst
        new_subst = Substitution({term1.name: term2})
        return subst.compose(new_subst)
    
    if isinstance(term2, Variable):
        if occurs_check(term2, term1):
            return None
        new_subst = Substitution({term2.name: term1})
        return subst.compose(new_subst)
    
    # Function cases
    if isinstance(term1, Function) and isinstance(term2, Function):
        if term1.name != term2.name or len(term1.args) != len(term2.args):
            return None
        
        current_subst = subst
        for arg1, arg2 in zip(term1.args, term2.args):
            current_subst = unify_helper(arg1, arg2, current_subst)
            if current_subst is None:
                return None
        return current_subst
    
    return None

# Example: Unification
x = Variable("x")
y = Variable("y")
john = Constant("john")

# Unify f(x, y) with f(john, z)
term1 = Function("f", [x, y])
term2 = Function("f", [john, Variable("z")])
mgu = unify(term1, term2)
print(f"Unify {term1} and {term2}")
print(f"MGU: {mgu}")
print(f"Result: {mgu.apply(term1)}")

Good Pattern: Efficient Unification with Occurs Check

def unify_efficient(term1, term2):
    """Efficient unification with proper occurs check"""
    stack = [(term1, term2)]
    subst = Substitution()
    
    while stack:
        t1, t2 = stack.pop()
        t1 = subst.apply(t1)
        t2 = subst.apply(t2)
        
        if isinstance(t1, Variable):
            if not isinstance(t2, Variable) or t1.name != t2.name:
                if occurs_check(t1, t2):
                    return None
                subst = subst.compose(Substitution({t1.name: t2}))
        elif isinstance(t2, Variable):
            if occurs_check(t2, t1):
                return None
            subst = subst.compose(Substitution({t2.name: t1}))
        elif isinstance(t1, Constant) and isinstance(t2, Constant):
            if t1.name != t2.name:
                return None
        elif isinstance(t1, Function) and isinstance(t2, Function):
            if t1.name != t2.name or len(t1.args) != len(t2.args):
                return None
            stack.extend(zip(t1.args, t2.args))
        else:
            return None
    
    return subst

Bad Pattern: Unification Without Occurs Check

def unify_bad(term1, term2):
    """BAD: Missing occurs check leads to infinite structures"""
    if isinstance(term1, Variable):
        # WRONG: No occurs check
        return Substitution({term1.name: term2})
    
    if isinstance(term2, Variable):
        # WRONG: No occurs check
        return Substitution({term2.name: term1})
    
    # This can create x = f(x), leading to infinite loops
    return None

Resolution: The Core Inference Rule

Resolution is the primary inference rule used in automated theorem proving. It combines two clauses that contain complementary literals to produce a new clause.

Binary Resolution

class Clause:
    """A disjunction of literals"""
    def __init__(self, literals):
        self.literals = frozenset(literals)  # Use frozenset for hashing
    
    def __repr__(self):
        if not self.literals:
            return "□"  # Empty clause (contradiction)
        return " ∨ ".join(str(lit) for lit in self.literals)
    
    def __hash__(self):
        return hash(self.literals)
    
    def __eq__(self, other):
        return self.literals == other.literals

class Literal:
    """An atomic formula or its negation"""
    def __init__(self, predicate, positive=True):
        self.predicate = predicate
        self.positive = positive
    
    def negate(self):
        return Literal(self.predicate, not self.positive)
    
    def __repr__(self):
        if self.positive:
            return str(self.predicate)
        return f"¬{self.predicate}"
    
    def __hash__(self):
        return hash((str(self.predicate), self.positive))
    
    def __eq__(self, other):
        return (str(self.predicate) == str(other.predicate) and 
                self.positive == other.positive)

def resolve(clause1, clause2):
    """Apply binary resolution to two clauses"""
    resolvents = []
    
    for lit1 in clause1.literals:
        for lit2 in clause2.literals:
            # Check if literals are complementary
            if lit1.positive != lit2.positive:
                # Try to unify the predicates
                mgu = unify_predicates(lit1.predicate, lit2.predicate)
                if mgu is not None:
                    # Create resolvent
                    new_literals = set()
                    
                    # Add literals from clause1 except lit1
                    for lit in clause1.literals:
                        if lit != lit1:
                            new_lit = apply_substitution_to_literal(lit, mgu)
                            new_literals.add(new_lit)
                    
                    # Add literals from clause2 except lit2
                    for lit in clause2.literals:
                        if lit != lit2:
                            new_lit = apply_substitution_to_literal(lit, mgu)
                            new_literals.add(new_lit)
                    
                    resolvent = Clause(new_literals)
                    resolvents.append((resolvent, mgu))
    
    return resolvents

def unify_predicates(pred1, pred2):
    """Unify two predicates"""
    if pred1.name != pred2.name or len(pred1.args) != len(pred2.args):
        return None
    
    subst = Substitution()
    for arg1, arg2 in zip(pred1.args, pred2.args):
        subst = unify_helper(arg1, arg2, subst)
        if subst is None:
            return None
    return subst

def apply_substitution_to_literal(literal, subst):
    """Apply substitution to a literal"""
    new_args = [subst.apply(arg) for arg in literal.predicate.args]
    new_pred = Predicate(literal.predicate.name, new_args)
    return Literal(new_pred, literal.positive)

Resolution Refutation

def resolution_refutation(clauses, max_iterations=1000):
    """
    Prove unsatisfiability using resolution refutation.
    Returns True if contradiction found, False otherwise.
    """
    clauses = set(clauses)
    new_clauses = set()
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        clause_list = list(clauses)
        
        # Try all pairs of clauses
        for i in range(len(clause_list)):
            for j in range(i + 1, len(clause_list)):
                resolvents = resolve(clause_list[i], clause_list[j])
                
                for resolvent, mgu in resolvents:
                    # Check for empty clause (contradiction)
                    if not resolvent.literals:
                        print(f"Contradiction found at iteration {iteration}")
                        return True
                    
                    new_clauses.add(resolvent)
        
        # Check if we've generated new clauses
        if new_clauses.issubset(clauses):
            print(f"No new clauses generated. Proof failed.")
            return False
        
        clauses.update(new_clauses)
        new_clauses.clear()
    
    print(f"Max iterations reached. Proof incomplete.")
    return False

# Example: Prove "All humans are mortal, Socrates is human ⊢ Socrates is mortal"
def example_socrates():
    # Clauses in CNF:
    # 1. ¬Human(x) ∨ Mortal(x)  [All humans are mortal]
    # 2. Human(socrates)         [Socrates is human]
    # 3. ¬Mortal(socrates)       [Negation of conclusion]
    
    x = Variable("x")
    socrates = Constant("socrates")
    
    # Clause 1: ¬Human(x) ∨ Mortal(x)
    human_x = Literal(Predicate("Human", [x]), positive=False)
    mortal_x = Literal(Predicate("Mortal", [x]), positive=True)
    clause1 = Clause([human_x, mortal_x])
    
    # Clause 2: Human(socrates)
    human_socrates = Literal(Predicate("Human", [socrates]), positive=True)
    clause2 = Clause([human_socrates])
    
    # Clause 3: ¬Mortal(socrates)
    not_mortal_socrates = Literal(Predicate("Mortal", [socrates]), positive=False)
    clause3 = Clause([not_mortal_socrates])
    
    print("Proving: All humans are mortal, Socrates is human ⊢ Socrates is mortal")
    print(f"Clause 1: {clause1}")
    print(f"Clause 2: {clause2}")
    print(f"Clause 3: {clause3}")
    print()
    
    result = resolution_refutation([clause1, clause2, clause3])
    print(f"Proof result: {result}")

example_socrates()

Advanced Resolution Strategies

Set of Support Strategy

def resolution_with_sos(axioms, negated_goal, max_iterations=1000):
    """
    Resolution with Set of Support (SOS) strategy.
    Only resolve clauses where at least one is from SOS.
    """
    # SOS initially contains negated goal
    sos = {negated_goal}
    usable = set(axioms)
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        new_sos = set()
        
        # Resolve SOS clauses with usable clauses
        for sos_clause in sos:
            for usable_clause in usable:
                resolvents = resolve(sos_clause, usable_clause)
                
                for resolvent, mgu in resolvents:
                    if not resolvent.literals:
                        print(f"Contradiction found at iteration {iteration}")
                        return True
                    new_sos.add(resolvent)
        
        # Resolve SOS clauses with each other
        sos_list = list(sos)
        for i in range(len(sos_list)):
            for j in range(i + 1, len(sos_list)):
                resolvents = resolve(sos_list[i], sos_list[j])
                
                for resolvent, mgu in resolvents:
                    if not resolvent.literals:
                        print(f"Contradiction found at iteration {iteration}")
                        return True
                    new_sos.add(resolvent)
        
        if not new_sos:
            print("No new clauses generated. Proof failed.")
            return False
        
        # Move old SOS to usable, new resolvents become SOS
        usable.update(sos)
        sos = new_sos
    
    print("Max iterations reached. Proof incomplete.")
    return False

Unit Resolution Strategy

def unit_resolution(clauses, max_iterations=1000):
    """
    Unit resolution: prioritize resolution with unit clauses.
    More efficient for many problems.
    """
    clauses = set(clauses)
    unit_clauses = {c for c in clauses if len(c.literals) == 1}
    non_unit_clauses = clauses - unit_clauses
    iteration = 0
    
    while iteration < max_iterations and unit_clauses:
        iteration += 1
        new_clauses = set()
        
        # Resolve unit clauses with all clauses
        for unit in list(unit_clauses):
            for clause in clauses:
                if clause == unit:
                    continue
                
                resolvents = resolve(unit, clause)
                for resolvent, mgu in resolvents:
                    if not resolvent.literals:
                        print(f"Contradiction found at iteration {iteration}")
                        return True
                    
                    new_clauses.add(resolvent)
                    if len(resolvent.literals) == 1:
                        unit_clauses.add(resolvent)
        
        if not new_clauses:
            print("No new clauses generated. Proof failed.")
            return False
        
        clauses.update(new_clauses)
    
    print("No more unit clauses. Switching to general resolution.")
    return resolution_refutation(clauses, max_iterations - iteration)

Good Pattern: Subsumption and Tautology Elimination

def subsumes(clause1, clause2):
    """Check if clause1 subsumes clause2 (clause1 is more general)"""
    if len(clause1.literals) > len(clause2.literals):
        return False
    
    # Try to find a substitution that makes clause1 a subset of clause2
    for lit1 in clause1.literals:
        found = False
        for lit2 in clause2.literals:
            if lit1.positive == lit2.positive:
                mgu = unify_predicates(lit1.predicate, lit2.predicate)
                if mgu is not None:
                    found = True
                    break
        if not found:
            return False
    return True

def is_tautology(clause):
    """Check if clause is a tautology (contains P and ¬P)"""
    for lit1 in clause.literals:
        for lit2 in clause.literals:
            if lit1.positive != lit2.positive:
                if unify_predicates(lit1.predicate, lit2.predicate) is not None:
                    return True
    return False

def simplify_clause_set(clauses):
    """Remove subsumed clauses and tautologies"""
    simplified = set()
    
    for clause in clauses:
        # Skip tautologies
        if is_tautology(clause):
            continue
        
        # Check if subsumed by existing clause
        subsumed = False
        for existing in simplified:
            if subsumes(existing, clause):
                subsumed = True
                break
        
        if not subsumed:
            # Remove clauses subsumed by this one
            simplified = {c for c in simplified if not subsumes(clause, c)}
            simplified.add(clause)
    
    return simplified

Bad Pattern: No Clause Simplification

def resolution_bad(clauses, max_iterations=1000):
    """BAD: No simplification leads to exponential clause growth"""
    clauses = set(clauses)
    
    for iteration in range(max_iterations):
        clause_list = list(clauses)
        
        for i in range(len(clause_list)):
            for j in range(i + 1, len(clause_list)):
                resolvents = resolve(clause_list[i], clause_list[j])
                
                for resolvent, mgu in resolvents:
                    if not resolvent.literals:
                        return True
                    # WRONG: Adding all resolvents without simplification
                    clauses.add(resolvent)
        
        # WRONG: Clause set grows exponentially
        print(f"Iteration {iteration}: {len(clauses)} clauses")
    
    return False

Complete Theorem Prover Implementation

Here’s a complete, production-ready theorem prover with all optimizations:

class TheoremProver:
    """Complete first-order theorem prover with resolution"""
    
    def __init__(self, max_iterations=1000, strategy='sos'):
        self.max_iterations = max_iterations
        self.strategy = strategy
        self.proof_trace = []
    
    def prove(self, axioms, goal):
        """
        Prove that axioms entail goal.
        Returns (success, proof_trace)
        """
        # Convert to CNF
        cnf_axioms = [self.to_cnf(axiom) for axiom in axioms]
        
        # Negate goal and convert to CNF
        negated_goal = Not(goal)
        cnf_goal = self.to_cnf(negated_goal)
        
        # Flatten to clause set
        all_clauses = []
        for cnf in cnf_axioms + [cnf_goal]:
            all_clauses.extend(self.extract_clauses(cnf))
        
        # Simplify initial clause set
        all_clauses = simplify_clause_set(all_clauses)
        
        print(f"Initial clauses: {len(all_clauses)}")
        for i, clause in enumerate(all_clauses, 1):
            print(f"  {i}. {clause}")
        print()
        
        # Choose strategy
        if self.strategy == 'sos':
            return self.prove_sos(all_clauses[:-1], all_clauses[-1])
        elif self.strategy == 'unit':
            return self.prove_unit(all_clauses)
        else:
            return self.prove_basic(all_clauses)
    
    def prove_basic(self, clauses):
        """Basic resolution without special strategies"""
        clauses = set(clauses)
        
        for iteration in range(self.max_iterations):
            clause_list = list(clauses)
            new_clauses = set()
            
            for i in range(len(clause_list)):
                for j in range(i + 1, len(clause_list)):
                    resolvents = resolve(clause_list[i], clause_list[j])
                    
                    for resolvent, mgu in resolvents:
                        self.proof_trace.append({
                            'iteration': iteration,
                            'parent1': clause_list[i],
                            'parent2': clause_list[j],
                            'resolvent': resolvent,
                            'mgu': mgu
                        })
                        
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        
                        new_clauses.add(resolvent)
            
            # Simplify new clauses
            new_clauses = simplify_clause_set(new_clauses)
            
            if new_clauses.issubset(clauses):
                print(f"✗ No new clauses. Proof failed.")
                return False, self.proof_trace
            
            clauses.update(new_clauses)
            print(f"Iteration {iteration}: {len(clauses)} clauses")
        
        print(f"✗ Max iterations reached")
        return False, self.proof_trace
    
    def prove_sos(self, axioms, goal):
        """Prove using Set of Support strategy"""
        sos = {goal}
        usable = set(axioms)
        
        for iteration in range(self.max_iterations):
            new_sos = set()
            
            # SOS with usable
            for sos_clause in sos:
                for usable_clause in usable:
                    resolvents = resolve(sos_clause, usable_clause)
                    
                    for resolvent, mgu in resolvents:
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        new_sos.add(resolvent)
            
            # SOS with SOS
            sos_list = list(sos)
            for i in range(len(sos_list)):
                for j in range(i + 1, len(sos_list)):
                    resolvents = resolve(sos_list[i], sos_list[j])
                    
                    for resolvent, mgu in resolvents:
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        new_sos.add(resolvent)
            
            new_sos = simplify_clause_set(new_sos)
            
            if not new_sos:
                print(f"✗ No new clauses. Proof failed.")
                return False, self.proof_trace
            
            usable.update(sos)
            sos = new_sos
            print(f"Iteration {iteration}: SOS={len(sos)}, Usable={len(usable)}")
        
        print(f"✗ Max iterations reached")
        return False, self.proof_trace
    
    def prove_unit(self, clauses):
        """Prove using unit resolution strategy"""
        clauses = set(clauses)
        unit_clauses = {c for c in clauses if len(c.literals) == 1}
        
        for iteration in range(self.max_iterations):
            if not unit_clauses:
                print("No more unit clauses. Switching to basic resolution.")
                return self.prove_basic(clauses)
            
            new_clauses = set()
            
            for unit in list(unit_clauses):
                for clause in clauses:
                    if clause == unit:
                        continue
                    
                    resolvents = resolve(unit, clause)
                    for resolvent, mgu in resolvents:
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        
                        new_clauses.add(resolvent)
                        if len(resolvent.literals) == 1:
                            unit_clauses.add(resolvent)
            
            new_clauses = simplify_clause_set(new_clauses)
            
            if not new_clauses:
                print(f"✗ No new clauses. Proof failed.")
                return False, self.proof_trace
            
            clauses.update(new_clauses)
            print(f"Iteration {iteration}: {len(clauses)} clauses, {len(unit_clauses)} units")
        
        print(f"✗ Max iterations reached")
        return False, self.proof_trace
    
    def to_cnf(self, formula):
        """Convert formula to CNF (simplified version)"""
        # This is a placeholder - full implementation would include
        # all CNF conversion steps
        return formula
    
    def extract_clauses(self, cnf):
        """Extract clauses from CNF formula"""
        # Placeholder - would extract individual clauses
        return []

# Example usage
def example_theorem_prover():
    prover = TheoremProver(strategy='sos')
    
    # Define axioms and goal
    x = Variable("x")
    socrates = Constant("socrates")
    
    # Axiom: All humans are mortal
    human_x = Atom(Predicate("Human", [x]))
    mortal_x = Atom(Predicate("Mortal", [x]))
    axiom1 = Forall(x, Or(Not(human_x), mortal_x))
    
    # Axiom: Socrates is human
    axiom2 = Atom(Predicate("Human", [socrates]))
    
    # Goal: Socrates is mortal
    goal = Atom(Predicate("Mortal", [socrates]))
    
    success, trace = prover.prove([axiom1, axiom2], goal)
    print(f"\nProof {'succeeded' if success else 'failed'}")

Practical Applications

Software Verification

class ProgramVerifier:
    """Verify program properties using theorem proving"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='unit')
    
    def verify_precondition(self, precondition, code, postcondition):
        """
        Verify that if precondition holds before code execution,
        postcondition holds after.
        """
        # Convert code to logical formulas (simplified)
        code_axioms = self.code_to_logic(code)
        
        # Prove: precondition ∧ code_axioms → postcondition
        axioms = [precondition] + code_axioms
        success, trace = self.prover.prove(axioms, postcondition)
        
        return success
    
    def code_to_logic(self, code):
        """Convert code to logical axioms (simplified)"""
        # This would analyze code and generate appropriate axioms
        return []

# Example: Verify array bounds
def verify_array_access():
    verifier = ProgramVerifier()
    
    # Precondition: 0 <= i < length(array)
    i = Variable("i")
    array = Variable("array")
    precondition = And(
        Atom(Predicate("GreaterEqual", [i, Constant("0")])),
        Atom(Predicate("Less", [i, Function("length", [array])]))
    )
    
    # Code: value = array[i]
    code = "value = array[i]"
    
    # Postcondition: No array bounds error
    postcondition = Not(Atom(Predicate("ArrayBoundsError", [])))
    
    result = verifier.verify_precondition(precondition, code, postcondition)
    print(f"Verification result: {result}")

Database Query Optimization

class QueryOptimizer:
    """Optimize database queries using theorem proving"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
    
    def queries_equivalent(self, query1, query2):
        """Check if two queries are logically equivalent"""
        # Convert queries to FOL
        fol1 = self.query_to_fol(query1)
        fol2 = self.query_to_fol(query2)
        
        # Prove query1 → query2 and query2 → query1
        forward, _ = self.prover.prove([fol1], fol2)
        backward, _ = self.prover.prove([fol2], fol1)
        
        return forward and backward
    
    def query_to_fol(self, query):
        """Convert SQL-like query to FOL (simplified)"""
        # This would parse SQL and generate FOL representation
        return None
    
    def optimize_query(self, query):
        """Find equivalent but more efficient query"""
        optimizations = [
            self.push_down_selection,
            self.eliminate_redundant_joins,
            self.merge_projections
        ]
        
        optimized = query
        for optimization in optimizations:
            candidate = optimization(optimized)
            if self.queries_equivalent(optimized, candidate):
                optimized = candidate
        
        return optimized
    
    def push_down_selection(self, query):
        """Push selection operations closer to data source"""
        # Implementation would rewrite query tree
        return query
    
    def eliminate_redundant_joins(self, query):
        """Remove unnecessary join operations"""
        return query
    
    def merge_projections(self, query):
        """Combine multiple projection operations"""
        return query

Automated Planning

class Planner:
    """Automated planning using theorem proving"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
    
    def find_plan(self, initial_state, goal_state, actions):
        """
        Find sequence of actions to reach goal from initial state.
        Uses situation calculus representation.
        """
        # Encode initial state
        axioms = [self.encode_state(initial_state)]
        
        # Encode action preconditions and effects
        for action in actions:
            axioms.extend(self.encode_action(action))
        
        # Encode frame axioms (what doesn't change)
        axioms.extend(self.frame_axioms(actions))
        
        # Try to prove goal is reachable
        success, trace = self.prover.prove(axioms, goal_state)
        
        if success:
            return self.extract_plan(trace)
        return None
    
    def encode_state(self, state):
        """Encode state as FOL formula"""
        # Example: At(robot, location_a) ∧ Holding(robot, nothing)
        return None
    
    def encode_action(self, action):
        """Encode action preconditions and effects"""
        # Example: Move(x, from, to) requires At(x, from) and causes At(x, to)
        return []
    
    def frame_axioms(self, actions):
        """Encode what doesn't change when actions execute"""
        return []
    
    def extract_plan(self, proof_trace):
        """Extract action sequence from proof"""
        return []

# Example: Blocks world planning
def blocks_world_example():
    planner = Planner()
    
    # Initial: Block A on table, Block B on A
    initial = And(
        Atom(Predicate("On", [Constant("A"), Constant("table")])),
        Atom(Predicate("On", [Constant("B"), Constant("A")])),
        Atom(Predicate("Clear", [Constant("B")]))
    )
    
    # Goal: Block B on table, Block A on B
    goal = And(
        Atom(Predicate("On", [Constant("B"), Constant("table")])),
        Atom(Predicate("On", [Constant("A"), Constant("B")])),
        Atom(Predicate("Clear", [Constant("A")]))
    )
    
    # Actions: pickup, putdown, stack, unstack
    actions = ["pickup", "putdown", "stack", "unstack"]
    
    plan = planner.find_plan(initial, goal, actions)
    if plan:
        print(f"Plan found: {plan}")
    else:
        print("No plan exists")

Performance Optimization Techniques

Indexing for Fast Clause Retrieval

class ClauseIndex:
    """Index clauses for efficient retrieval during resolution"""
    
    def __init__(self):
        self.positive_index = {}  # predicate_name -> set of clauses
        self.negative_index = {}  # predicate_name -> set of clauses
    
    def add_clause(self, clause):
        """Add clause to index"""
        for literal in clause.literals:
            pred_name = literal.predicate.name
            
            if literal.positive:
                if pred_name not in self.positive_index:
                    self.positive_index[pred_name] = set()
                self.positive_index[pred_name].add(clause)
            else:
                if pred_name not in self.negative_index:
                    self.negative_index[pred_name] = set()
                self.negative_index[pred_name].add(clause)
    
    def get_candidates(self, clause):
        """Get clauses that might resolve with given clause"""
        candidates = set()
        
        for literal in clause.literals:
            pred_name = literal.predicate.name
            
            # Get clauses with complementary literals
            if literal.positive:
                if pred_name in self.negative_index:
                    candidates.update(self.negative_index[pred_name])
            else:
                if pred_name in self.positive_index:
                    candidates.update(self.positive_index[pred_name])
        
        return candidates

class OptimizedProver(TheoremProver):
    """Theorem prover with indexing optimization"""
    
    def __init__(self, max_iterations=1000, strategy='sos'):
        super().__init__(max_iterations, strategy)
        self.index = ClauseIndex()
    
    def prove_basic(self, clauses):
        """Optimized resolution with indexing"""
        clauses = set(clauses)
        
        # Build initial index
        for clause in clauses:
            self.index.add_clause(clause)
        
        for iteration in range(self.max_iterations):
            new_clauses = set()
            
            for clause in clauses:
                # Only check candidates that might resolve
                candidates = self.index.get_candidates(clause)
                
                for candidate in candidates:
                    resolvents = resolve(clause, candidate)
                    
                    for resolvent, mgu in resolvents:
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        
                        new_clauses.add(resolvent)
            
            new_clauses = simplify_clause_set(new_clauses)
            
            if new_clauses.issubset(clauses):
                print(f"✗ No new clauses. Proof failed.")
                return False, self.proof_trace
            
            # Update index with new clauses
            for clause in new_clauses:
                self.index.add_clause(clause)
            
            clauses.update(new_clauses)
            print(f"Iteration {iteration}: {len(clauses)} clauses")
        
        print(f"✗ Max iterations reached")
        return False, self.proof_trace

Good Pattern: Clause Weight Heuristics

class WeightedProver(TheoremProver):
    """Prover that prioritizes lighter clauses"""
    
    def clause_weight(self, clause):
        """Calculate clause weight (prefer smaller clauses)"""
        weight = 0
        for literal in clause.literals:
            weight += 1  # Base weight per literal
            weight += self.term_weight(literal.predicate)
        return weight
    
    def term_weight(self, term):
        """Calculate term complexity"""
        if isinstance(term, Variable):
            return 1
        elif isinstance(term, Constant):
            return 1
        elif isinstance(term, Function):
            return 1 + sum(self.term_weight(arg) for arg in term.args)
        elif isinstance(term, Predicate):
            return sum(self.term_weight(arg) for arg in term.args)
        return 0
    
    def prove_basic(self, clauses):
        """Resolution with weight-based clause selection"""
        clauses = set(clauses)
        
        for iteration in range(self.max_iterations):
            # Sort clauses by weight
            sorted_clauses = sorted(clauses, key=self.clause_weight)
            new_clauses = set()
            
            # Prioritize lighter clauses
            for i, clause1 in enumerate(sorted_clauses[:100]):  # Limit to top 100
                for clause2 in sorted_clauses[i+1:200]:
                    resolvents = resolve(clause1, clause2)
                    
                    for resolvent, mgu in resolvents:
                        if not resolvent.literals:
                            print(f"✓ Proof found at iteration {iteration}")
                            return True, self.proof_trace
                        
                        new_clauses.add(resolvent)
            
            new_clauses = simplify_clause_set(new_clauses)
            
            if new_clauses.issubset(clauses):
                print(f"✗ No new clauses. Proof failed.")
                return False, self.proof_trace
            
            clauses.update(new_clauses)
            print(f"Iteration {iteration}: {len(clauses)} clauses")
        
        print(f"✗ Max iterations reached")
        return False, self.proof_trace

Bad Pattern: No Heuristics or Ordering

def prove_unoptimized(clauses, max_iterations=1000):
    """BAD: No heuristics leads to poor performance"""
    clauses = list(clauses)  # WRONG: Using list instead of set
    
    for iteration in range(max_iterations):
        # WRONG: Trying all pairs without any ordering
        for i in range(len(clauses)):
            for j in range(len(clauses)):
                if i == j:
                    continue
                
                resolvents = resolve(clauses[i], clauses[j])
                
                for resolvent, mgu in resolvents:
                    if not resolvent.literals:
                        return True
                    
                    # WRONG: Appending duplicates
                    clauses.append(resolvent)
        
        # WRONG: Exponential growth, no pruning
        print(f"Iteration {iteration}: {len(clauses)} clauses")
    
    return False

Integration with Modern Tools

Using Z3 SMT Solver

try:
    from z3 import *
    
    class Z3TheoremProver:
        """Wrapper around Z3 SMT solver for theorem proving"""
        
        def __init__(self):
            self.solver = Solver()
        
        def prove(self, axioms, goal):
            """Prove goal from axioms using Z3"""
            # Add axioms
            for axiom in axioms:
                z3_axiom = self.to_z3(axiom)
                self.solver.add(z3_axiom)
            
            # Add negation of goal
            z3_goal = self.to_z3(goal)
            self.solver.add(Not(z3_goal))
            
            # Check satisfiability
            result = self.solver.check()
            
            if result == unsat:
                print("✓ Proof successful (unsatisfiable)")
                return True
            elif result == sat:
                print("✗ Proof failed (satisfiable)")
                print(f"Counterexample: {self.solver.model()}")
                return False
            else:
                print("? Unknown result")
                return None
        
        def to_z3(self, formula):
            """Convert FOL formula to Z3 expression"""
            # Simplified conversion
            if isinstance(formula, Atom):
                return Bool(str(formula.predicate))
            elif isinstance(formula, Not):
                return Not(self.to_z3(formula.formula))
            elif isinstance(formula, And):
                return And(self.to_z3(formula.left), self.to_z3(formula.right))
            elif isinstance(formula, Or):
                return Or(self.to_z3(formula.left), self.to_z3(formula.right))
            return Bool('unknown')
    
    # Example usage
    def example_z3():
        prover = Z3TheoremProver()
        
        # Define problem
        x = Variable("x")
        p = Atom(Predicate("P", [x]))
        q = Atom(Predicate("Q", [x]))
        
        # Axiom: P(x) → Q(x)
        axiom = Or(Not(p), q)
        
        # Goal: P(a) ∧ (P(x) → Q(x)) ⊢ Q(a)
        result = prover.prove([axiom], q)
        print(f"Z3 result: {result}")

except ImportError:
    print("Z3 not available. Install with: pip install z3-solver")

Integration with Prover9/Mace4

import subprocess
import tempfile
import os

class Prover9Interface:
    """Interface to Prover9 automated theorem prover"""
    
    def __init__(self, prover9_path='prover9', timeout=60):
        self.prover9_path = prover9_path
        self.timeout = timeout
    
    def prove(self, axioms, goal):
        """Prove goal from axioms using Prover9"""
        # Create input file
        input_text = self.format_input(axioms, goal)
        
        # Write to temporary file
        with tempfile.NamedTemporaryFile(mode='w', suffix='.in', delete=False) as f:
            f.write(input_text)
            input_file = f.name
        
        try:
            # Run Prover9
            result = subprocess.run(
                [self.prover9_path, '-f', input_file],
                capture_output=True,
                text=True,
                timeout=self.timeout
            )
            
            # Parse output
            output = result.stdout
            if 'THEOREM PROVED' in output:
                print("✓ Prover9: Theorem proved")
                return True, self.extract_proof(output)
            elif 'SEARCH FAILED' in output:
                print("✗ Prover9: Search failed")
                return False, None
            else:
                print("? Prover9: Unknown result")
                return None, None
        
        except subprocess.TimeoutExpired:
            print(f"✗ Prover9: Timeout after {self.timeout}s")
            return None, None
        
        finally:
            os.unlink(input_file)
    
    def format_input(self, axioms, goal):
        """Format problem in Prover9 syntax"""
        lines = ["formulas(assumptions)."]
        
        for axiom in axioms:
            lines.append(f"  {self.to_prover9(axiom)}.")
        
        lines.append("end_of_list.")
        lines.append("")
        lines.append("formulas(goals).")
        lines.append(f"  {self.to_prover9(goal)}.")
        lines.append("end_of_list.")
        
        return "\n".join(lines)
    
    def to_prover9(self, formula):
        """Convert FOL formula to Prover9 syntax"""
        if isinstance(formula, Atom):
            pred = formula.predicate
            if not pred.args:
                return pred.name
            args = ", ".join(self.term_to_prover9(arg) for arg in pred.args)
            return f"{pred.name}({args})"
        
        elif isinstance(formula, Not):
            return f"-({self.to_prover9(formula.formula)})"
        
        elif isinstance(formula, And):
            left = self.to_prover9(formula.left)
            right = self.to_prover9(formula.right)
            return f"({left} & {right})"
        
        elif isinstance(formula, Or):
            left = self.to_prover9(formula.left)
            right = self.to_prover9(formula.right)
            return f"({left} | {right})"
        
        elif isinstance(formula, Forall):
            var = formula.variable.name
            body = self.to_prover9(formula.formula)
            return f"all {var} ({body})"
        
        elif isinstance(formula, Exists):
            var = formula.variable.name
            body = self.to_prover9(formula.formula)
            return f"exists {var} ({body})"
        
        return "unknown"
    
    def term_to_prover9(self, term):
        """Convert term to Prover9 syntax"""
        if isinstance(term, Variable):
            return term.name
        elif isinstance(term, Constant):
            return term.name
        elif isinstance(term, Function):
            if not term.args:
                return term.name
            args = ", ".join(self.term_to_prover9(arg) for arg in term.args)
            return f"{term.name}({args})"
        return "unknown"
    
    def extract_proof(self, output):
        """Extract proof steps from Prover9 output"""
        proof_lines = []
        in_proof = False
        
        for line in output.split('\n'):
            if '====== PROOF ======' in line:
                in_proof = True
            elif '====== end of proof ======' in line:
                break
            elif in_proof:
                proof_lines.append(line.strip())
        
        return proof_lines

# Example usage
def example_prover9():
    prover = Prover9Interface()
    
    # Socrates example
    x = Variable("x")
    socrates = Constant("socrates")
    
    human_x = Atom(Predicate("Human", [x]))
    mortal_x = Atom(Predicate("Mortal", [x]))
    human_socrates = Atom(Predicate("Human", [socrates]))
    mortal_socrates = Atom(Predicate("Mortal", [socrates]))
    
    axiom1 = Forall(x, Or(Not(human_x), mortal_x))
    axiom2 = human_socrates
    goal = mortal_socrates
    
    success, proof = prover.prove([axiom1, axiom2], goal)
    
    if success and proof:
        print("\nProof steps:")
        for step in proof:
            print(f"  {step}")

Best Practices and Common Pitfalls

Good Pattern: Proper Variable Scoping

class VariableManager:
    """Manage variable scoping and renaming"""
    
    def __init__(self):
        self.counter = 0
        self.scope_stack = [{}]
    
    def fresh_variable(self, base_name="x"):
        """Generate fresh variable name"""
        var_name = f"{base_name}_{self.counter}"
        self.counter += 1
        return Variable(var_name)
    
    def enter_scope(self):
        """Enter new variable scope"""
        self.scope_stack.append({})
    
    def exit_scope(self):
        """Exit current variable scope"""
        if len(self.scope_stack) > 1:
            self.scope_stack.pop()
    
    def bind_variable(self, var_name, value):
        """Bind variable in current scope"""
        self.scope_stack[-1][var_name] = value
    
    def lookup_variable(self, var_name):
        """Look up variable in scope stack"""
        for scope in reversed(self.scope_stack):
            if var_name in scope:
                return scope[var_name]
        return None

# Good usage
def good_variable_handling():
    vm = VariableManager()
    
    # Create fresh variables for each quantifier
    vm.enter_scope()
    x1 = vm.fresh_variable("x")
    formula1 = Forall(x1, Atom(Predicate("P", [x1])))
    vm.exit_scope()
    
    vm.enter_scope()
    x2 = vm.fresh_variable("x")
    formula2 = Forall(x2, Atom(Predicate("Q", [x2])))
    vm.exit_scope()
    
    # x1 and x2 are distinct, no conflicts
    print(f"Formula 1: {formula1}")
    print(f"Formula 2: {formula2}")

Bad Pattern: Variable Name Conflicts

def bad_variable_handling():
    """BAD: Reusing variable names causes conflicts"""
    x = Variable("x")
    
    # WRONG: Same variable used in different contexts
    formula1 = Forall(x, Atom(Predicate("P", [x])))
    formula2 = Forall(x, Atom(Predicate("Q", [x])))
    
    # WRONG: Combining formulas with same variable
    combined = And(formula1, formula2)
    
    # This creates unintended variable capture
    print(f"Combined: {combined}")

Good Pattern: Incremental Proof Construction

class IncrementalProver:
    """Build proofs incrementally with checkpointing"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
        self.axiom_stack = []
        self.checkpoints = []
    
    def add_axiom(self, axiom):
        """Add axiom to knowledge base"""
        self.axiom_stack.append(axiom)
    
    def checkpoint(self):
        """Save current state"""
        self.checkpoints.append(len(self.axiom_stack))
    
    def rollback(self):
        """Rollback to last checkpoint"""
        if self.checkpoints:
            checkpoint = self.checkpoints.pop()
            self.axiom_stack = self.axiom_stack[:checkpoint]
    
    def prove_incremental(self, goal):
        """Prove goal with current axioms"""
        return self.prover.prove(self.axiom_stack, goal)
    
    def prove_with_assumption(self, assumption, goal):
        """Prove goal assuming additional axiom"""
        self.checkpoint()
        self.add_axiom(assumption)
        
        result = self.prove_incremental(goal)
        
        self.rollback()
        return result

# Example usage
def example_incremental():
    prover = IncrementalProver()
    
    # Add base axioms
    x = Variable("x")
    prover.add_axiom(Forall(x, Atom(Predicate("Human", [x]))))
    
    # Prove with additional assumption
    socrates = Constant("socrates")
    assumption = Atom(Predicate("Human", [socrates]))
    goal = Atom(Predicate("Mortal", [socrates]))
    
    result = prover.prove_with_assumption(assumption, goal)
    print(f"Proof with assumption: {result}")

Bad Pattern: Monolithic Proof Attempts

def bad_proof_attempt(all_axioms, goal):
    """BAD: No incremental building or checkpointing"""
    prover = TheoremProver()
    
    # WRONG: Trying to prove everything at once
    # No way to identify which axioms are needed
    # No way to debug failed proofs
    success, trace = prover.prove(all_axioms, goal)
    
    if not success:
        # WRONG: No information about why proof failed
        print("Proof failed, no idea why")
    
    return success

Debugging and Visualization

Proof Tree Visualization

class ProofVisualizer:
    """Visualize proof trees and resolution steps"""
    
    def __init__(self):
        self.nodes = []
        self.edges = []
    
    def visualize_proof(self, proof_trace):
        """Create visual representation of proof"""
        for i, step in enumerate(proof_trace):
            parent1 = step['parent1']
            parent2 = step['parent2']
            resolvent = step['resolvent']
            mgu = step['mgu']
            
            print(f"\nStep {i + 1}:")
            print(f"  Parent 1: {parent1}")
            print(f"  Parent 2: {parent2}")
            print(f"  MGU: {mgu}")
            print(f"  Resolvent: {resolvent}")
            
            if not resolvent.literals:
                print(f"  ✓ Empty clause (contradiction)")
    
    def export_dot(self, proof_trace, filename='proof.dot'):
        """Export proof as Graphviz DOT file"""
        lines = ['digraph Proof {']
        lines.append('  rankdir=TB;')
        lines.append('  node [shape=box];')
        
        for i, step in enumerate(proof_trace):
            parent1_id = f"p1_{i}"
            parent2_id = f"p2_{i}"
            resolvent_id = f"r_{i}"
            
            lines.append(f'  {parent1_id} [label="{step["parent1"]}"];')
            lines.append(f'  {parent2_id} [label="{step["parent2"]}"];')
            lines.append(f'  {resolvent_id} [label="{step["resolvent"]}"];')
            
            lines.append(f'  {parent1_id} -> {resolvent_id};')
            lines.append(f'  {parent2_id} -> {resolvent_id};')
        
        lines.append('}')
        
        with open(filename, 'w') as f:
            f.write('\n'.join(lines))
        
        print(f"Proof exported to {filename}")
        print(f"Visualize with: dot -Tpng {filename} -o proof.png")

# Example usage
def example_visualization():
    prover = TheoremProver(strategy='sos')
    visualizer = ProofVisualizer()
    
    # Run proof
    x = Variable("x")
    socrates = Constant("socrates")
    
    axiom1 = Forall(x, Or(Not(Atom(Predicate("Human", [x]))),
                           Atom(Predicate("Mortal", [x]))))
    axiom2 = Atom(Predicate("Human", [socrates]))
    goal = Atom(Predicate("Mortal", [socrates]))
    
    success, trace = prover.prove([axiom1, axiom2], goal)
    
    if success:
        visualizer.visualize_proof(trace)
        visualizer.export_dot(trace)

Performance Benchmarks

Comparing Different Strategies

import time

class ProverBenchmark:
    """Benchmark different theorem proving strategies"""
    
    def __init__(self):
        self.results = []
    
    def benchmark_strategy(self, strategy, axioms, goal, name):
        """Benchmark a single strategy"""
        prover = TheoremProver(max_iterations=1000, strategy=strategy)
        
        start_time = time.time()
        success, trace = prover.prove(axioms, goal)
        end_time = time.time()
        
        elapsed = end_time - start_time
        
        result = {
            'name': name,
            'strategy': strategy,
            'success': success,
            'time': elapsed,
            'steps': len(trace) if trace else 0
        }
        
        self.results.append(result)
        return result
    
    def run_benchmarks(self, axioms, goal):
        """Run all strategy benchmarks"""
        strategies = [
            ('basic', 'Basic Resolution'),
            ('sos', 'Set of Support'),
            ('unit', 'Unit Resolution')
        ]
        
        print("Running benchmarks...")
        print("=" * 60)
        
        for strategy, name in strategies:
            result = self.benchmark_strategy(strategy, axioms, goal, name)
            print(f"\n{name}:")
            print(f"  Success: {result['success']}")
            print(f"  Time: {result['time']:.4f}s")
            print(f"  Steps: {result['steps']}")
        
        print("\n" + "=" * 60)
        self.print_summary()
    
    def print_summary(self):
        """Print benchmark summary"""
        print("\nSummary:")
        print(f"{'Strategy':<20} {'Success':<10} {'Time (s)':<12} {'Steps':<10}")
        print("-" * 60)
        
        for result in sorted(self.results, key=lambda r: r['time']):
            print(f"{result['name']:<20} {str(result['success']):<10} "
                  f"{result['time']:<12.4f} {result['steps']:<10}")

# Example usage
def example_benchmark():
    benchmark = ProverBenchmark()
    
    # Define problem
    x = Variable("x")
    y = Variable("y")
    
    # Transitivity of equality
    axiom1 = Forall(x, Atom(Predicate("Eq", [x, x])))
    axiom2 = Forall(x, Forall(y, 
        Or(Not(Atom(Predicate("Eq", [x, y]))),
           Atom(Predicate("Eq", [y, x])))))
    
    a = Constant("a")
    b = Constant("b")
    goal = Atom(Predicate("Eq", [a, b]))
    
    benchmark.run_benchmarks([axiom1, axiom2], goal)

Real-World Case Studies

Case Study 1: Verifying Cryptographic Protocols

class CryptoProtocolVerifier:
    """Verify security properties of cryptographic protocols"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
    
    def verify_authentication(self, protocol):
        """Verify authentication property of protocol"""
        # Model protocol as FOL axioms
        axioms = self.model_protocol(protocol)
        
        # Authentication property: if B receives message from A,
        # then A actually sent it
        a = Constant("alice")
        b = Constant("bob")
        msg = Variable("msg")
        
        # Goal: Received(b, msg, a) → Sent(a, msg, b)
        received = Atom(Predicate("Received", [b, msg, a]))
        sent = Atom(Predicate("Sent", [a, msg, b]))
        goal = Or(Not(received), sent)
        
        success, trace = self.prover.prove(axioms, goal)
        
        if success:
            print("✓ Authentication property verified")
        else:
            print("✗ Authentication property violated")
            print("Potential attack found")
        
        return success
    
    def model_protocol(self, protocol):
        """Convert protocol to FOL axioms"""
        axioms = []
        
        # Model encryption/decryption
        # Encrypt(k, m) can be decrypted with k
        k = Variable("k")
        m = Variable("m")
        encrypted = Function("Encrypt", [k, m])
        axioms.append(
            Forall(k, Forall(m,
                Atom(Predicate("CanDecrypt", [k, encrypted]))))
        )
        
        # Model message transmission
        # If A sends encrypted message, B can receive if has key
        # ... (additional protocol-specific axioms)
        
        return axioms
    
    def verify_secrecy(self, protocol, secret):
        """Verify that secret remains confidential"""
        axioms = self.model_protocol(protocol)
        
        # Attacker should not learn secret
        attacker = Constant("attacker")
        goal = Not(Atom(Predicate("Knows", [attacker, secret])))
        
        success, trace = self.prover.prove(axioms, goal)
        
        if success:
            print("✓ Secrecy property verified")
        else:
            print("✗ Secrecy property violated")
            print("Information leak detected")
        
        return success

# Example: Needham-Schroeder protocol
def example_crypto_verification():
    verifier = CryptoProtocolVerifier()
    
    protocol = {
        'name': 'Needham-Schroeder',
        'steps': [
            'A -> B: {Na, A}Kb',
            'B -> A: {Na, Nb}Ka',
            'A -> B: {Nb}Kb'
        ]
    }
    
    result = verifier.verify_authentication(protocol)
    print(f"Verification result: {result}")

Case Study 2: Hardware Verification

class HardwareVerifier:
    """Verify hardware circuit correctness"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='unit')
    
    def verify_circuit(self, circuit_spec, property_spec):
        """Verify that circuit satisfies property"""
        # Convert circuit to FOL
        circuit_axioms = self.circuit_to_fol(circuit_spec)
        
        # Convert property to FOL
        property_formula = self.property_to_fol(property_spec)
        
        # Prove property holds
        success, trace = self.prover.prove(circuit_axioms, property_formula)
        
        return success
    
    def circuit_to_fol(self, circuit):
        """Convert circuit specification to FOL axioms"""
        axioms = []
        
        # Model gates
        for gate in circuit.get('gates', []):
            gate_axiom = self.model_gate(gate)
            axioms.append(gate_axiom)
        
        # Model connections
        for connection in circuit.get('connections', []):
            conn_axiom = self.model_connection(connection)
            axioms.append(conn_axiom)
        
        return axioms
    
    def model_gate(self, gate):
        """Model logic gate as FOL formula"""
        gate_type = gate['type']
        inputs = [Variable(inp) for inp in gate['inputs']]
        output = Variable(gate['output'])
        
        if gate_type == 'AND':
            # output = input1 AND input2
            formula = Atom(Predicate("Equals", [
                output,
                Function("And", inputs)
            ]))
        elif gate_type == 'OR':
            formula = Atom(Predicate("Equals", [
                output,
                Function("Or", inputs)
            ]))
        elif gate_type == 'NOT':
            formula = Atom(Predicate("Equals", [
                output,
                Function("Not", inputs)
            ]))
        else:
            formula = None
        
        return formula
    
    def model_connection(self, connection):
        """Model wire connection"""
        source = Variable(connection['from'])
        dest = Variable(connection['to'])
        
        return Atom(Predicate("Equals", [source, dest]))
    
    def property_to_fol(self, property_spec):
        """Convert property specification to FOL"""
        # Example: "output is always 1 when both inputs are 1"
        return None

# Example: Verify full adder
def example_hardware_verification():
    verifier = HardwareVerifier()
    
    full_adder = {
        'gates': [
            {'type': 'XOR', 'inputs': ['a', 'b'], 'output': 'sum1'},
            {'type': 'XOR', 'inputs': ['sum1', 'cin'], 'output': 'sum'},
            {'type': 'AND', 'inputs': ['a', 'b'], 'output': 'carry1'},
            {'type': 'AND', 'inputs': ['sum1', 'cin'], 'output': 'carry2'},
            {'type': 'OR', 'inputs': ['carry1', 'carry2'], 'output': 'cout'}
        ],
        'connections': []
    }
    
    property_spec = {
        'type': 'correctness',
        'description': 'sum and carry are correct'
    }
    
    result = verifier.verify_circuit(full_adder, property_spec)
    print(f"Full adder verification: {result}")

Case Study 3: Semantic Web Reasoning

class SemanticWebReasoner:
    """Reason over RDF/OWL ontologies using FOL"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
        self.ontology = []
    
    def load_ontology(self, ontology_axioms):
        """Load ontology axioms"""
        self.ontology = ontology_axioms
    
    def query(self, query_formula):
        """Query ontology"""
        success, trace = self.prover.prove(self.ontology, query_formula)
        return success
    
    def infer_subclass(self, class_a, class_b):
        """Infer if class_a is subclass of class_b"""
        x = Variable("x")
        
        # Query: ∀x. class_a(x) → class_b(x)
        instance_a = Atom(Predicate(class_a, [x]))
        instance_b = Atom(Predicate(class_b, [x]))
        query = Forall(x, Or(Not(instance_a), instance_b))
        
        return self.query(query)
    
    def infer_property(self, subject, property_name, object_value):
        """Infer if subject has property with object"""
        query = Atom(Predicate(property_name, [
            Constant(subject),
            Constant(object_value)
        ]))
        
        return self.query(query)

# Example: Animal taxonomy reasoning
def example_semantic_web():
    reasoner = SemanticWebReasoner()
    
    # Define ontology
    x = Variable("x")
    
    # Axiom: All dogs are mammals
    dog = Atom(Predicate("Dog", [x]))
    mammal = Atom(Predicate("Mammal", [x]))
    axiom1 = Forall(x, Or(Not(dog), mammal))
    
    # Axiom: All mammals are animals
    animal = Atom(Predicate("Animal", [x]))
    axiom2 = Forall(x, Or(Not(mammal), animal))
    
    # Axiom: Fido is a dog
    fido = Constant("fido")
    axiom3 = Atom(Predicate("Dog", [fido]))
    
    reasoner.load_ontology([axiom1, axiom2, axiom3])
    
    # Query: Is Fido an animal?
    result = reasoner.infer_property("fido", "Animal", "true")
    print(f"Is Fido an animal? {result}")
    
    # Query: Are dogs animals?
    result = reasoner.infer_subclass("Dog", "Animal")
    print(f"Are dogs animals? {result}")

Advanced Topics

class ModalTheoremProver:
    """Theorem prover for modal logic (knowledge, belief, time)"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
        self.worlds = set()
        self.accessibility = {}
    
    def add_world(self, world):
        """Add possible world"""
        self.worlds.add(world)
    
    def add_accessibility(self, world1, world2):
        """Add accessibility relation between worlds"""
        if world1 not in self.accessibility:
            self.accessibility[world1] = set()
        self.accessibility[world1].add(world2)
    
    def knows(self, agent, formula):
        """Agent knows formula (true in all accessible worlds)"""
        # K(agent, φ) ≡ ∀w. Accessible(current, w) → Holds(φ, w)
        w = Variable("w")
        current = Constant("current_world")
        
        accessible = Atom(Predicate("Accessible", [current, w]))
        holds = Atom(Predicate("Holds", [formula, w]))
        
        return Forall(w, Or(Not(accessible), holds))
    
    def believes(self, agent, formula):
        """Agent believes formula (similar to knows but weaker)"""
        # Similar to knows but with belief-accessibility relation
        return self.knows(agent, formula)
    
    def eventually(self, formula):
        """Formula eventually becomes true (temporal logic)"""
        # F(φ) ≡ ∃t. Future(current, t) ∧ Holds(φ, t)
        t = Variable("t")
        current = Constant("current_time")
        
        future = Atom(Predicate("Future", [current, t]))
        holds = Atom(Predicate("Holds", [formula, t]))
        
        return Exists(t, And(future, holds))
    
    def always(self, formula):
        """Formula always holds (temporal logic)"""
        # G(φ) ≡ ∀t. Future(current, t) → Holds(φ, t)
        t = Variable("t")
        current = Constant("current_time")
        
        future = Atom(Predicate("Future", [current, t]))
        holds = Atom(Predicate("Holds", [formula, t]))
        
        return Forall(t, Or(Not(future), holds))

# Example: Knowledge reasoning
def example_modal_logic():
    prover = ModalTheoremProver()
    
    # Setup worlds
    prover.add_world("w1")
    prover.add_world("w2")
    prover.add_accessibility("w1", "w2")
    
    # Alice knows that Bob is tall
    alice = Constant("alice")
    bob = Constant("bob")
    tall = Atom(Predicate("Tall", [bob]))
    
    knowledge = prover.knows(alice, tall)
    print(f"Alice knows Bob is tall: {knowledge}")

Higher-Order Logic

class HigherOrderProver:
    """Prover for higher-order logic (predicates as arguments)"""
    
    def __init__(self):
        self.prover = TheoremProver(strategy='sos')
    
    def apply_predicate(self, pred_var, arg):
        """Apply predicate variable to argument"""
        # P(x) where P is a variable
        return Atom(Predicate("Apply", [pred_var, arg]))
    
    def forall_predicates(self, pred_var, formula):
        """Quantify over predicates"""
        # ∀P. formula(P)
        return Forall(pred_var, formula)
    
    def exists_predicate(self, pred_var, formula):
        """Existential quantification over predicates"""
        # ∃P. formula(P)
        return Exists(pred_var, formula)

# Example: Prove properties about properties
def example_higher_order():
    prover = HigherOrderProver()
    
    # ∀P. (∀x. P(x)) → P(a)
    # "If property holds for all, it holds for specific instance"
    p = Variable("P")  # Predicate variable
    x = Variable("x")
    a = Constant("a")
    
    p_of_x = prover.apply_predicate(p, x)
    p_of_a = prover.apply_predicate(p, a)
    
    formula = prover.forall_predicates(p,
        Or(Not(Forall(x, p_of_x)), p_of_a))
    
    print(f"Higher-order formula: {formula}")

Conclusion and Best Practices Summary

First-order theorem proving is a powerful technique with applications across computer science, from software verification to artificial intelligence. Here are the key takeaways:

Always convert formulas to CNF before applying resolution
Use unification with occurs check to avoid infinite structures
Apply simplification techniques (subsumption, tautology elimination)
Choose appropriate resolution strategies (SOS, unit resolution)
Index clauses for efficient retrieval
Use weight heuristics to guide search
Implement proper variable scoping and renaming
Build proofs incrementally with checkpointing
Visualize proofs for debugging
Benchmark different strategies for your problem domain

Good Pattern: Complete Prover Template

class ProductionTheoremProver:
    """Production-ready theorem prover with all optimizations"""
    
    def __init__(self, config=None):
        self.config = config or {
            'max_iterations': 1000,
            'strategy': 'sos',
            'use_indexing': True,
            'use_weights': True,
            'simplify': True,
            'timeout': 60
        }
        
        self.prover = TheoremProver(
            max_iterations=self.config['max_iterations'],
            strategy=self.config['strategy']
        )
        
        if self.config['use_indexing']:
            self.index = ClauseIndex()
        
        self.stats = {
            'clauses_generated': 0,
            'clauses_simplified': 0,
            'resolutions': 0
        }
    
    def prove(self, axioms, goal, verbose=False):
        """Prove goal from axioms with full optimization"""
        start_time = time.time()
        
        # Convert to CNF
        cnf_axioms = [self.to_cnf(ax) for ax in axioms]
        cnf_goal = self.to_cnf(Not(goal))
        
        # Extract and simplify clauses
        clauses = self.extract_all_clauses(cnf_axioms + [cnf_goal])
        
        if self.config['simplify']:
            clauses = simplify_clause_set(clauses)
        
        # Build index
        if self.config['use_indexing']:
            for clause in clauses:
                self.index.add_clause(clause)
        
        # Run prover
        success, trace = self.prover.prove_basic(clauses)
        
        elapsed = time.time() - start_time
        
        if verbose:
            self.print_stats(elapsed)
        
        return success, trace
    
    def print_stats(self, elapsed):
        """Print proving statistics"""
        print(f"\nStatistics:")
        print(f"  Time: {elapsed:.4f}s")
        print(f"  Clauses generated: {self.stats['clauses_generated']}")
        print(f"  Clauses after simplification: {self.stats['clauses_simplified']}")
        print(f"  Resolution steps: {self.stats['resolutions']}")

Bad Pattern: Incomplete Implementation

def bad_prover(axioms, goal):
    """BAD: Missing critical features"""
    # WRONG: No CNF conversion
    # WRONG: No simplification
    # WRONG: No indexing
    # WRONG: No heuristics
    # WRONG: No timeout
    # WRONG: No statistics
    
    clauses = axioms + [Not(goal)]
    
    while True:  # WRONG: Infinite loop possible
        for c1 in clauses:
            for c2 in clauses:
                resolvent = resolve(c1, c2)
                if resolvent:
                    clauses.append(resolvent)  # WRONG: Duplicates
                    if not resolvent.literals:
                        return True
    
    return False

External Resources

“Automated Reasoning” by Wos, Overbeek, Lusk, and Boyle
“Handbook of Automated Reasoning” edited by Robinson and Voronkov
“Logic for Computer Science” by Steve Reeves and Michael Clarke
Prover9/Mace4 documentation: https://www.cs.unm.edu/~mccune/prover9/
Z3 Theorem Prover: https://github.com/Z3Prover/z3
TPTP Problem Library: http://www.tptp.org/
Isabelle/HOL: https://isabelle.in.tum.de/
Coq Proof Assistant: https://coq.inria.fr/

First-order theorem proving continues to be an active area of research with new algorithms, optimizations, and applications emerging regularly. Whether you’re verifying software, reasoning about knowledge, or solving complex logical puzzles, understanding these fundamental techniques provides a solid foundation for automated reasoning.