Introduction
The Chomsky hierarchy is a fundamental classification system in formal language theory that organizes languages by their computational power. Developed by Noam Chomsky, it provides a framework for understanding the relationship between different types of grammars and automata.
In this article, we’ll explore the four levels of the Chomsky hierarchy and their properties.
Overview of the Chomsky Hierarchy
The Chomsky hierarchy consists of four levels, each more powerful than the previous:
Level 0: Recursively Enumerable Languages (Turing Machines)
↑
Level 1: Context-Sensitive Languages (Linear Bounded Automata)
↑
Level 2: Context-Free Languages (Pushdown Automata)
↑
Level 3: Regular Languages (Finite Automata)
Key Property: Each level is a proper subset of the level above it.
Level 3: Regular Languages
Definition
Regular languages are the simplest class in the hierarchy. They can be recognized by finite automata.
Grammar Form:
A → aB | a | ε
(Right-linear or left-linear)
Examples:
L₁ = {a*} (zero or more a's)
L₂ = {(ab)*} (zero or more ab's)
L₃ = {0, 1}* (all binary strings)
Recognizing Automaton
Finite Automaton (FA):
- Finite number of states
- Deterministic or non-deterministic
- No memory (except current state)
Example:
Language: {a*b*}
States: q0, q1, q2
Start: q0
Accept: q0, q1, q2
Transitions:
q0 --a--> q0
q0 --b--> q1
q1 --b--> q1
Properties
Closure Properties:
- Closed under union
- Closed under concatenation
- Closed under Kleene star
- Closed under complement
- Closed under intersection
Decidability:
- Membership: Decidable (O(n))
- Emptiness: Decidable
- Finiteness: Decidable
- Equivalence: Decidable
Level 2: Context-Free Languages
Definition
Context-free languages can be generated by context-free grammars.
Grammar Form:
A → α
where A is a non-terminal and α is any string of terminals and non-terminals
Examples:
L₁ = {aⁿbⁿ | n ≥ 0} (equal a's and b's)
L₂ = {ww^R | w ∈ {a,b}*} (palindromes)
L₃ = {(ⁿ)ⁿ | n ≥ 0} (balanced parentheses)
Recognizing Automaton
Pushdown Automaton (PDA):
- Finite number of states
- Stack for memory
- Can read and write to stack
Example:
Language: {aⁿbⁿ | n ≥ 0}
States: q0, q1, q2
Start: q0
Accept: q2
Transitions:
q0 --a, ε/A--> q0 (push A)
q0 --b, A/ε --> q1 (pop A)
q1 --b, A/ε --> q1 (pop A)
q1 --ε, ε/ε --> q2 (accept)
Properties
Closure Properties:
- Closed under union
- Closed under concatenation
- Closed under Kleene star
- NOT closed under intersection
- NOT closed under complement
Decidability:
- Membership: Decidable (O(n³))
- Emptiness: Decidable
- Finiteness: Decidable
- Equivalence: Undecidable
Relationship to Regular Languages
Every regular language is context-free.
Proof:
Given regular grammar:
A → aB | a | ε
This is also a context-free grammar.
Therefore, every regular language is context-free.
Level 1: Context-Sensitive Languages
Definition
Context-sensitive languages can be generated by context-sensitive grammars.
Grammar Form:
αAβ → αγβ
where |γ| ≥ 1 (non-contracting)
Examples:
L₁ = {aⁿbⁿcⁿ | n ≥ 0} (equal a's, b's, and c's)
L₂ = {ww | w ∈ {a,b}*} (string concatenated with itself)
Recognizing Automaton
Linear Bounded Automaton (LBA):
- Finite number of states
- Tape bounded by input length
- Can read and write to tape
Properties:
- More powerful than PDA
- Less powerful than Turing machine
- Tape size is O(n) where n is input length
Properties
Closure Properties:
- Closed under union
- Closed under concatenation
- Closed under Kleene star
- Closed under complement
- Closed under intersection
Decidability:
- Membership: Decidable (O(n^k) for some k)
- Emptiness: Undecidable
- Finiteness: Undecidable
- Equivalence: Undecidable
Relationship to Context-Free Languages
Not every context-sensitive language is context-free.
Counterexample:
L = {aⁿbⁿcⁿ | n ≥ 0}
This is context-sensitive but NOT context-free.
Proof: Context-free languages cannot count three independent sequences.
Level 0: Recursively Enumerable Languages
Definition
Recursively enumerable languages can be recognized by Turing machines.
Grammar Form:
Any production rule (including contracting rules)
Examples:
L₁ = All computable languages
L₂ = Halting problem (semi-decidable)
Recognizing Automaton
Turing Machine (TM):
- Finite number of states
- Infinite tape
- Can read and write to tape
- Can move left and right
Properties:
- Most powerful automaton
- Can compute any computable function
- May not halt on invalid input
Properties
Closure Properties:
- Closed under union
- Closed under concatenation
- Closed under Kleene star
- NOT closed under complement
Decidability:
- Membership: Semi-decidable (may not halt)
- Emptiness: Undecidable
- Finiteness: Undecidable
- Equivalence: Undecidable
Relationship to Context-Sensitive Languages
Not every recursively enumerable language is context-sensitive.
Counterexample:
Halting problem: {M, w | Turing machine M halts on input w}
This is recursively enumerable but NOT context-sensitive.
Hierarchy Summary
Language Classes
| Level | Grammar | Automaton | Closure | Decidable |
|---|---|---|---|---|
| 3 | Regular | FA | All | Yes |
| 2 | CFG | PDA | Union, Concat, * | Membership |
| 1 | CSG | LBA | All | Membership |
| 0 | Unrestricted | TM | Union, Concat, * | Semi-decidable |
Proper Subset Relationships
Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Recursively Enumerable
Each level is a proper subset of the next (there exist languages in each level not in the previous).
Examples of Languages at Each Level
Regular Languages Only
L = {a*b*}
L = {(ab)*}
L = {0, 1}*
Context-Free but Not Regular
L = {aⁿbⁿ | n ≥ 0}
L = {ww^R | w ∈ {a,b}*}
L = {(ⁿ)ⁿ | n ≥ 0}
Context-Sensitive but Not Context-Free
L = {aⁿbⁿcⁿ | n ≥ 0}
L = {ww | w ∈ {a,b}*}
Recursively Enumerable but Not Context-Sensitive
L = {M, w | Turing machine M halts on input w}
L = {G, w | Grammar G generates string w}
Practical Implications
Choosing the Right Formalism
Use Regular Expressions/FA for:
- Simple pattern matching
- Lexical analysis
- Input validation
Use Context-Free Grammars for:
- Programming language syntax
- Parsing
- Structured data formats
Use Context-Sensitive Grammars for:
- Natural language processing
- Complex syntax rules
- Semantic constraints
Use Turing Machines for:
- Theoretical analysis
- Computability theory
- Undecidability proofs
Glossary
- Chomsky hierarchy: Classification of languages by computational power
- Regular language: Recognizable by finite automata
- Context-free language: Generated by context-free grammar
- Context-sensitive language: Generated by context-sensitive grammar
- Recursively enumerable: Recognizable by Turing machine
- Finite automaton: Machine with finite states, no memory
- Pushdown automaton: Machine with stack memory
- Linear bounded automaton: Machine with bounded tape
- Turing machine: Machine with infinite tape
- Proper subset: Subset that is not equal to the original set
Practice Problems
Problem 1: Language Classification
Classify each language:
- {ab}
- {aⁿbⁿ | n ≥ 0}
- {aⁿbⁿcⁿ | n ≥ 0}
Solution:
- Regular
- Context-free (not regular)
- Context-sensitive (not context-free)
Problem 2: Grammar Design
Design a grammar for {aⁿbⁿcⁿ | n ≥ 0}.
Solution:
S → aSBC | ε
CB → BC
bB → bb
cC → cc
Problem 3: Automaton Selection
For each language, select the simplest automaton:
- {a*}
- {aⁿbⁿ | n ≥ 0}
- {aⁿbⁿcⁿ | n ≥ 0}
Solution:
- Finite automaton
- Pushdown automaton
- Linear bounded automaton
Problem 4: Decidability
For each language, determine if membership is decidable:
- {ab}
- {aⁿbⁿ | n ≥ 0}
- {M, w | M halts on w}
Solution:
- Decidable (O(n))
- Decidable (O(n³))
- Undecidable (semi-decidable)
Related Resources
Online Platforms
- Stanford Encyclopedia of Philosophy - Formal language theory
- Khan Academy Computer Science - CS fundamentals
- Coursera Formal Languages - Online courses
- MIT OpenCourseWare - University courses
- Brilliant.org Computer Science - Interactive lessons
Interactive Tools
- JFLAP - Automata simulator
- Automata Visualizer - Visualize automata
- Grammar Visualizer - Visualize grammars
- Turing Machine Simulator - Simulate Turing machines
- Language Classifier - Classify languages
Recommended Books
- “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
- “Theory of Computation” by Sipser
- “Elements of the Theory of Computation” by Lewis & Papadimitriou
- “Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
- “Computability and Complexity” by Papadimitriou
Academic Journals
- Journal of Computer and System Sciences
- Theoretical Computer Science
- ACM Transactions on Computational Logic
- Information and Computation
- Formal Aspects of Computing
Software Tools
- JFLAP - Automata simulator
- Graphviz - Graph visualization
- Turing Machine Simulator - Simulate Turing machines
- Automata Toolkit - Automata tools
- Language Tools - Language tools
Conclusion
The Chomsky hierarchy provides a fundamental framework for understanding:
- The relationship between different language classes
- The computational power of different automata
- The decidability of language properties
- The appropriate formalism for different problems
Understanding the Chomsky hierarchy is essential for formal language theory, compiler design, and computability theory.
In the next article, we’ll explore language recognition and acceptance, which builds on the concepts of automata and the Chomsky hierarchy.
Next Article: Language Recognition and Acceptance
Previous Article: Parsing and Syntax Analysis
Comments