Introduction
Context-free grammars (CFGs) are a fundamental tool for defining formal languages. They provide a way to specify the structure of strings in a language using production rules. CFGs are widely used in:
- Programming language design
- Parsing and compilation
- Natural language processing
- Formal language theory
- Syntax specification
In this article, we’ll explore context-free grammars, how they work, and how to use them to define languages.
Grammar Basics
Definition of Grammar
A grammar is a formal system for generating strings in a language. It consists of:
- Terminals: Symbols that appear in the strings (alphabet symbols)
- Non-terminals: Symbols that represent language constructs
- Production rules: Rules for replacing non-terminals with strings
- Start symbol: The non-terminal from which derivation begins
Notation:
G = (V, ฮฃ, R, S)
V = set of non-terminals
ฮฃ = set of terminals (alphabet)
R = set of production rules
S = start symbol
Production Rules
A production rule has the form:
A โ ฮฑ
Where:
- A is a non-terminal (left-hand side)
- ฮฑ is a string of terminals and non-terminals (right-hand side)
Examples:
S โ NP VP (sentence โ noun phrase verb phrase)
NP โ Det N (noun phrase โ determiner noun)
VP โ V NP (verb phrase โ verb noun phrase)
Det โ the | a (determiner โ "the" or "a")
N โ dog | cat (noun โ "dog" or "cat")
V โ chased | saw (verb โ "chased" or "saw")
Context-Free Grammars
Definition of CFG
A context-free grammar is a grammar where each production rule has exactly one non-terminal on the left-hand side.
Form:
A โ ฮฑ
where A is a non-terminal and ฮฑ is any string of terminals and non-terminals
Why “context-free”? The replacement of A doesn’t depend on the context (symbols around A). We can always replace A with ฮฑ regardless of what surrounds it.
Example CFG: Simple Arithmetic
E โ E + T | T
T โ T * F | F
F โ (E) | num
Terminals: +, *, (, ), num
Non-terminals: E, T, F
Start symbol: E
This grammar generates arithmetic expressions like:
- num
- num + num
- num * num
- (num + num) * num
Derivations
Definition of Derivation
A derivation is a sequence of production rule applications that transforms the start symbol into a string.
Notation:
S โ ฮฑโ โ ฮฑโ โ ... โ ฮฑโ
Leftmost Derivation
In a leftmost derivation, we always replace the leftmost non-terminal.
Example:
Grammar:
S โ NP VP
NP โ Det N
VP โ V NP
Det โ the
N โ dog | cat
V โ chased
Leftmost derivation of "the dog chased the cat":
S โ NP VP
โ Det N VP
โ the N VP
โ the dog VP
โ the dog V NP
โ the dog chased NP
โ the dog chased Det N
โ the dog chased the N
โ the dog chased the cat
Rightmost Derivation
In a rightmost derivation, we always replace the rightmost non-terminal.
Example:
Same grammar as above.
Rightmost derivation of "the dog chased the cat":
S โ NP VP
โ NP V NP
โ NP V Det N
โ NP V Det cat
โ NP V the cat
โ NP chased the cat
โ Det N chased the cat
โ Det dog chased the cat
โ the dog chased the cat
Parse Trees
Definition of Parse Tree
A parse tree is a tree representation of a derivation. Each:
- Internal node is a non-terminal
- Leaf node is a terminal
- Edge represents a production rule application
Example Parse Tree
Grammar:
S โ NP VP
NP โ Det N
VP โ V NP
Det โ the
N โ dog | cat
V โ chased
Parse tree for "the dog chased the cat":
S
/ \
NP VP
/ \ / \
Det N V NP
| | | / \
the dog chased Det N
| |
the cat
Ambiguity
A grammar is ambiguous if a string can have multiple parse trees.
Example:
Grammar:
E โ E + E | E * E | num
String: num + num * num
Parse tree 1 (left-associative):
E
/ \
E *
/ \ \
E + E
| | |
num num num
(Evaluates to: (num + num) * num)
Parse tree 2 (right-associative):
E
/ \
E +
| / \
num E *
| / \
num E E
| |
num num
(Evaluates to: num + (num * num))
Language Generated by a Grammar
Definition
The language generated by a grammar G, denoted L(G), is the set of all strings that can be derived from the start symbol.
Notation:
L(G) = {w | S โ* w and w contains only terminals}
Example
Grammar:
S โ aSb | ฮต
Language:
L(G) = {aโฟbโฟ | n โฅ 0} = {ฮต, ab, aabb, aaabbb, ...}
Derivations:
S โ ฮต (generates ฮต)
S โ aSb โ ab (generates ab)
S โ aSb โ aaSbb โ aabb (generates aabb)
Grammar Design
Designing a Grammar for a Language
Step 1: Understand the language structure Step 2: Identify the main components Step 3: Write production rules for each component Step 4: Test the grammar with examples
Example 1: Balanced Parentheses
Language: All strings of balanced parentheses
Grammar:
S โ (S) | SS | ฮต
Derivations:
S โ ฮต (empty)
S โ (S) โ () (one pair)
S โ (S) โ ((S)) โ (()) (nested)
S โ SS โ (S)(S) โ ()() (sequential)
Example 2: Arithmetic Expressions
Language: Arithmetic expressions with +, *, (, )
Grammar:
E โ E + T | T
T โ T * F | F
F โ (E) | num
Derivations:
E โ T โ F โ num (single number)
E โ E + T โ T + T โ F + F โ num + num
E โ E + T โ T + T โ T * F + T โ F * F + F โ num * num + num
Example 3: Simple Programming Language
Language: Simple variable assignments
Grammar:
S โ id = E;
E โ E + E | E * E | (E) | num | id
Derivations:
S โ id = E; โ id = num;
S โ id = E; โ id = E + E; โ id = num + num;
S โ id = E; โ id = E * E; โ id = (E) * E; โ id = (num) * num;
Normal Forms
Chomsky Normal Form (CNF)
A grammar is in Chomsky Normal Form if every production rule is of the form:
A โ BC (two non-terminals)
A โ a (single terminal)
S โ ฮต (only for start symbol)
Advantages:
- Simplifies parsing algorithms
- Guarantees finite derivation length
- Useful for theoretical analysis
Example:
Original grammar:
S โ aSb | ฮต
CNF:
S โ AX | ฮต
X โ SB
A โ a
B โ b
Greibach Normal Form (GNF)
A grammar is in Greibach Normal Form if every production rule is of the form:
A โ aฮฑ
where a is a terminal and ฮฑ is a string of non-terminals (possibly empty)
Advantages:
- Guarantees one terminal per derivation step
- Useful for parsing algorithms
Closure Properties
Closure Under Union
If Lโ and Lโ are context-free languages, then Lโ โช Lโ is context-free.
Proof:
Given grammars Gโ and Gโ with start symbols Sโ and Sโ.
Create new grammar G with start symbol S:
S โ Sโ | Sโ
Then L(G) = L(Gโ) โช L(Gโ)
Closure Under Concatenation
If Lโ and Lโ are context-free languages, then Lโ ยท Lโ is context-free.
Proof:
Given grammars Gโ and Gโ with start symbols Sโ and Sโ.
Create new grammar G with start symbol S:
S โ SโSโ
Then L(G) = L(Gโ) ยท L(Gโ)
Closure Under Kleene Star
If L is a context-free language, then L* is context-free.
Proof:
Given grammar G with start symbol S.
Create new grammar G' with start symbol S':
S' โ SS' | ฮต
Then L(G') = L(G)*
Non-Closure Under Intersection
Context-free languages are NOT closed under intersection.
Counterexample:
Lโ = {aโฟbโฟcแต | n,m โฅ 0} (context-free)
Lโ = {aโฟbแตcแต | n,m โฅ 0} (context-free)
Lโ โฉ Lโ = {aโฟbโฟcโฟ | n โฅ 0} (NOT context-free)
Glossary
- Grammar: Formal system for generating strings
- Terminal: Symbol that appears in strings
- Non-terminal: Symbol representing language constructs
- Production rule: Rule for replacing non-terminals
- Start symbol: Non-terminal from which derivation begins
- Derivation: Sequence of production rule applications
- Parse tree: Tree representation of a derivation
- Ambiguous grammar: Grammar with multiple parse trees for some strings
- Chomsky Normal Form: Restricted grammar form
- Greibach Normal Form: Another restricted grammar form
Practice Problems
Problem 1: Grammar Design
Design a grammar for the language {aโฟbโฟ | n โฅ 0}.
Solution:
S โ aSb | ฮต
Problem 2: Derivation
Given the grammar:
S โ aSb | ฮต
Show a derivation for “aabb”.
Solution:
S โ aSb โ aaSbb โ aabb
Problem 3: Parse Tree
Draw a parse tree for “aabb” using the grammar from Problem 1.
Solution:
S
/|\
a S b
/|\
a S b
|
ฮต
Problem 4: Language Recognition
Given the grammar:
S โ (S) | SS | ฮต
Is “(())” in the language? Show a derivation.
Solution:
Yes, (()) is in the language.
S โ (S) โ ((S)) โ (())
Related Resources
Online Platforms
- Stanford Encyclopedia of Philosophy - Formal language theory
- Khan Academy Computer Science - CS fundamentals
- Coursera Formal Languages - Online courses
- MIT OpenCourseWare - University courses
- Brilliant.org Computer Science - Interactive lessons
Interactive Tools
- JFLAP - Grammar and automata simulator
- Grammar Visualizer - Visualize grammars
- Parse Tree Generator - Generate parse trees
- Derivation Tracer - Trace derivations
- Grammar Checker - Check grammar rules
Recommended Books
- “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
- “Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
- “Elements of the Theory of Computation” by Lewis & Papadimitriou
- “Compilers: Principles, Techniques, and Tools” by Aho, Lam, Sethi, Ullman
- “Theory of Computation” by Sipser
Academic Journals
- Journal of Computer and System Sciences
- Theoretical Computer Science
- ACM Transactions on Computational Logic
- Information and Computation
- Formal Aspects of Computing
Software Tools
- JFLAP - Automata and grammar simulator
- Graphviz - Graph visualization
- Lex/Yacc - Lexer/parser generators
- ANTLR - Parser generator
- Bison - Parser generator
Conclusion
Context-free grammars are a powerful tool for:
- Defining formal languages
- Specifying syntax of programming languages
- Parsing and compilation
- Natural language processing
- Formal language theory
Understanding CFGs is essential for studying parsing, compilation, and formal language theory.
In the next article, we’ll explore regular expressions and regular languages, which are simpler but still powerful.
Next Article: Regular Expressions and Regular Languages
Previous Article: Formal Languages: Alphabets, Strings, and Languages
Comments