Skip to main content
⚡ Calmops

Chomsky Hierarchy

Introduction

The Chomsky hierarchy is a fundamental classification system in formal language theory that organizes languages by their computational power. Developed by Noam Chomsky, it provides a framework for understanding the relationship between different types of grammars and automata.

In this article, we’ll explore the four levels of the Chomsky hierarchy and their properties.

Overview of the Chomsky Hierarchy

The Chomsky hierarchy consists of four levels, each more powerful than the previous:

Level 0: Recursively Enumerable Languages (Turing Machines)
  ↑
Level 1: Context-Sensitive Languages (Linear Bounded Automata)
  ↑
Level 2: Context-Free Languages (Pushdown Automata)
  ↑
Level 3: Regular Languages (Finite Automata)

Key Property: Each level is a proper subset of the level above it.

Level 3: Regular Languages

Definition

Regular languages are the simplest class in the hierarchy. They can be recognized by finite automata.

Grammar Form:

A → aB | a | ε
(Right-linear or left-linear)

Examples:

L₁ = {a*}                    (zero or more a's)
L₂ = {(ab)*}                 (zero or more ab's)
L₃ = {0, 1}*                 (all binary strings)

Recognizing Automaton

Finite Automaton (FA):

  • Finite number of states
  • Deterministic or non-deterministic
  • No memory (except current state)

Example:

Language: {a*b*}

States: q0, q1, q2
Start: q0
Accept: q0, q1, q2

Transitions:
q0 --a--> q0
q0 --b--> q1
q1 --b--> q1

Properties

Closure Properties:

  • Closed under union
  • Closed under concatenation
  • Closed under Kleene star
  • Closed under complement
  • Closed under intersection

Decidability:

  • Membership: Decidable (O(n))
  • Emptiness: Decidable
  • Finiteness: Decidable
  • Equivalence: Decidable

Level 2: Context-Free Languages

Definition

Context-free languages can be generated by context-free grammars.

Grammar Form:

A → α
where A is a non-terminal and α is any string of terminals and non-terminals

Examples:

L₁ = {aⁿbⁿ | n ≥ 0}          (equal a's and b's)
L₂ = {ww^R | w ∈ {a,b}*}     (palindromes)
L₃ = {(ⁿ)ⁿ | n ≥ 0}          (balanced parentheses)

Recognizing Automaton

Pushdown Automaton (PDA):

  • Finite number of states
  • Stack for memory
  • Can read and write to stack

Example:

Language: {aⁿbⁿ | n ≥ 0}

States: q0, q1, q2
Start: q0
Accept: q2

Transitions:
q0 --a, ε/A--> q0  (push A)
q0 --b, A/ε --> q1 (pop A)
q1 --b, A/ε --> q1 (pop A)
q1 --ε, ε/ε --> q2 (accept)

Properties

Closure Properties:

  • Closed under union
  • Closed under concatenation
  • Closed under Kleene star
  • NOT closed under intersection
  • NOT closed under complement

Decidability:

  • Membership: Decidable (O(n³))
  • Emptiness: Decidable
  • Finiteness: Decidable
  • Equivalence: Undecidable

Relationship to Regular Languages

Every regular language is context-free.

Proof:

Given regular grammar:
A → aB | a | ε

This is also a context-free grammar.
Therefore, every regular language is context-free.

Level 1: Context-Sensitive Languages

Definition

Context-sensitive languages can be generated by context-sensitive grammars.

Grammar Form:

αAβ → αγβ
where |γ| ≥ 1 (non-contracting)

Examples:

L₁ = {aⁿbⁿcⁿ | n ≥ 0}        (equal a's, b's, and c's)
L₂ = {ww | w ∈ {a,b}*}       (string concatenated with itself)

Recognizing Automaton

Linear Bounded Automaton (LBA):

  • Finite number of states
  • Tape bounded by input length
  • Can read and write to tape

Properties:

  • More powerful than PDA
  • Less powerful than Turing machine
  • Tape size is O(n) where n is input length

Properties

Closure Properties:

  • Closed under union
  • Closed under concatenation
  • Closed under Kleene star
  • Closed under complement
  • Closed under intersection

Decidability:

  • Membership: Decidable (O(n^k) for some k)
  • Emptiness: Undecidable
  • Finiteness: Undecidable
  • Equivalence: Undecidable

Relationship to Context-Free Languages

Not every context-sensitive language is context-free.

Counterexample:

L = {aⁿbⁿcⁿ | n ≥ 0}

This is context-sensitive but NOT context-free.
Proof: Context-free languages cannot count three independent sequences.

Level 0: Recursively Enumerable Languages

Definition

Recursively enumerable languages can be recognized by Turing machines.

Grammar Form:

Any production rule (including contracting rules)

Examples:

L₁ = All computable languages
L₂ = Halting problem (semi-decidable)

Recognizing Automaton

Turing Machine (TM):

  • Finite number of states
  • Infinite tape
  • Can read and write to tape
  • Can move left and right

Properties:

  • Most powerful automaton
  • Can compute any computable function
  • May not halt on invalid input

Properties

Closure Properties:

  • Closed under union
  • Closed under concatenation
  • Closed under Kleene star
  • NOT closed under complement

Decidability:

  • Membership: Semi-decidable (may not halt)
  • Emptiness: Undecidable
  • Finiteness: Undecidable
  • Equivalence: Undecidable

Relationship to Context-Sensitive Languages

Not every recursively enumerable language is context-sensitive.

Counterexample:

Halting problem: {M, w | Turing machine M halts on input w}

This is recursively enumerable but NOT context-sensitive.

Hierarchy Summary

Language Classes

Level Grammar Automaton Closure Decidable
3 Regular FA All Yes
2 CFG PDA Union, Concat, * Membership
1 CSG LBA All Membership
0 Unrestricted TM Union, Concat, * Semi-decidable

Proper Subset Relationships

Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursively Enumerable

Each level is a proper subset of the next (there exist languages in each level not in the previous).

Examples of Languages at Each Level

Regular Languages Only

L = {a*b*}
L = {(ab)*}
L = {0, 1}*

Context-Free but Not Regular

L = {aⁿbⁿ | n ≥ 0}
L = {ww^R | w ∈ {a,b}*}
L = {(ⁿ)ⁿ | n ≥ 0}

Context-Sensitive but Not Context-Free

L = {aⁿbⁿcⁿ | n ≥ 0}
L = {ww | w ∈ {a,b}*}

Recursively Enumerable but Not Context-Sensitive

L = {M, w | Turing machine M halts on input w}
L = {G, w | Grammar G generates string w}

Practical Implications

Choosing the Right Formalism

Use Regular Expressions/FA for:

  • Simple pattern matching
  • Lexical analysis
  • Input validation

Use Context-Free Grammars for:

  • Programming language syntax
  • Parsing
  • Structured data formats

Use Context-Sensitive Grammars for:

  • Natural language processing
  • Complex syntax rules
  • Semantic constraints

Use Turing Machines for:

  • Theoretical analysis
  • Computability theory
  • Undecidability proofs

Glossary

  • Chomsky hierarchy: Classification of languages by computational power
  • Regular language: Recognizable by finite automata
  • Context-free language: Generated by context-free grammar
  • Context-sensitive language: Generated by context-sensitive grammar
  • Recursively enumerable: Recognizable by Turing machine
  • Finite automaton: Machine with finite states, no memory
  • Pushdown automaton: Machine with stack memory
  • Linear bounded automaton: Machine with bounded tape
  • Turing machine: Machine with infinite tape
  • Proper subset: Subset that is not equal to the original set

Practice Problems

Problem 1: Language Classification

Classify each language:

  1. {ab}
  2. {aⁿbⁿ | n ≥ 0}
  3. {aⁿbⁿcⁿ | n ≥ 0}

Solution:

  1. Regular
  2. Context-free (not regular)
  3. Context-sensitive (not context-free)

Problem 2: Grammar Design

Design a grammar for {aⁿbⁿcⁿ | n ≥ 0}.

Solution:

S → aSBC | ε
CB → BC
bB → bb
cC → cc

Problem 3: Automaton Selection

For each language, select the simplest automaton:

  1. {a*}
  2. {aⁿbⁿ | n ≥ 0}
  3. {aⁿbⁿcⁿ | n ≥ 0}

Solution:

  1. Finite automaton
  2. Pushdown automaton
  3. Linear bounded automaton

Problem 4: Decidability

For each language, determine if membership is decidable:

  1. {ab}
  2. {aⁿbⁿ | n ≥ 0}
  3. {M, w | M halts on w}

Solution:

  1. Decidable (O(n))
  2. Decidable (O(n³))
  3. Undecidable (semi-decidable)

Online Platforms

Interactive Tools

  • “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
  • “Theory of Computation” by Sipser
  • “Elements of the Theory of Computation” by Lewis & Papadimitriou
  • “Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
  • “Computability and Complexity” by Papadimitriou

Academic Journals

Software Tools

Conclusion

The Chomsky hierarchy provides a fundamental framework for understanding:

  • The relationship between different language classes
  • The computational power of different automata
  • The decidability of language properties
  • The appropriate formalism for different problems

Understanding the Chomsky hierarchy is essential for formal language theory, compiler design, and computability theory.

In the next article, we’ll explore language recognition and acceptance, which builds on the concepts of automata and the Chomsky hierarchy.


Next Article: Language Recognition and Acceptance

Previous Article: Parsing and Syntax Analysis

Comments