Context-Free Grammars (CFG)

Introduction

Context-free grammars (CFGs) are a fundamental tool for defining formal languages. They provide a way to specify the structure of strings in a language using production rules. CFGs are widely used in:

Programming language design
Parsing and compilation
Natural language processing
Formal language theory
Syntax specification

In this article, we’ll explore context-free grammars, how they work, and how to use them to define languages.

Grammar Basics

Definition of Grammar

A grammar is a formal system for generating strings in a language. It consists of:

Terminals: Symbols that appear in the strings (alphabet symbols)
Non-terminals: Symbols that represent language constructs
Production rules: Rules for replacing non-terminals with strings
Start symbol: The non-terminal from which derivation begins

Notation:

G = (V, Σ, R, S)
V = set of non-terminals
Σ = set of terminals (alphabet)
R = set of production rules
S = start symbol

Production Rules

A production rule has the form:

A → α

Where:

A is a non-terminal (left-hand side)
α is a string of terminals and non-terminals (right-hand side)

Examples:

S → NP VP           (sentence → noun phrase verb phrase)
NP → Det N          (noun phrase → determiner noun)
VP → V NP           (verb phrase → verb noun phrase)
Det → the | a        (determiner → "the" or "a")
N → dog | cat        (noun → "dog" or "cat")
V → chased | saw     (verb → "chased" or "saw")
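
As a rough illustration, the rules above can be stored as plain data and expanded mechanically. The sketch below assumes a hypothetical encoding (a dict named GRAMMAR that maps each non-terminal to a list of alternatives); it is not a standard API, and it terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# Hypothetical encoding: non-terminal -> list of alternatives,
# where each alternative is a list of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"], ["saw"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol not in GRAMMAR:  # a terminal expands to itself
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))  # 32
print(sentences[0])    # the dog chased the dog
```

The count of 32 comes from 4 possible noun phrases times 2 verbs times 4 noun phrases.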

Context-Free Grammars

Definition of CFG

A context-free grammar is a grammar in which the left-hand side of every production rule is a single non-terminal, with no other symbols beside it.

Form:

A → α
where A is a non-terminal and α is any string of terminals and non-terminals

Why “context-free”? The replacement of A doesn’t depend on the context (symbols around A). We can always replace A with α regardless of what surrounds it.

Example CFG: Simple Arithmetic

E → E + T | T
T → T * F | F
F → (E) | num

Terminals: +, *, (, ), num
Non-terminals: E, T, F
Start symbol: E

This grammar generates arithmetic expressions like:

num
num + num
num * num
(num + num) * num
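
One way to test strings against this grammar is a small recursive-descent recognizer. The sketch below is an assumption-laden illustration: the function names (accepts, parse_E, parse_T, parse_F) are made up here, the input is assumed to be already split into tokens, and the left-recursive rules E → E + T and T → T * F are rewritten as loops, since a naive recursive function for them would never terminate.

```python
def accepts(tokens):
    """Return True if tokens form an expression of the E/T/F grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def parse_E():  # E → T (+ T)*, the loop form of E → E + T | T
        parse_T()
        while peek() == "+":
            eat("+")
            parse_T()

    def parse_T():  # T → F (* F)*, the loop form of T → T * F | F
        parse_F()
        while peek() == "*":
            eat("*")
            parse_F()

    def parse_F():  # F → (E) | num
        if peek() == "(":
            eat("(")
            parse_E()
            eat(")")
        else:
            eat("num")

    try:
        parse_E()
    except SyntaxError:
        return False
    return pos == len(tokens)  # all input must be consumed


print(accepts("( num + num ) * num".split()))  # True
print(accepts(["num", "+"]))                   # False
```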

Derivations

Definition of Derivation

A derivation is a sequence of production rule applications that transforms the start symbol into a string.

Notation:

S ⇒ α₁ ⇒ α₂ ⇒ ... ⇒ αₙ

Leftmost Derivation

In a leftmost derivation, we always replace the leftmost non-terminal.

Example:

Grammar:
S → NP VP
NP → Det N
VP → V NP
Det → the
N → dog | cat
V → chased

Leftmost derivation of "the dog chased the cat":
S ⇒ NP VP
  ⇒ Det N VP
  ⇒ the N VP
  ⇒ the dog VP
  ⇒ the dog V NP
  ⇒ the dog chased NP
  ⇒ the dog chased Det N
  ⇒ the dog chased the N
  ⇒ the dog chased the cat

Rightmost Derivation

In a rightmost derivation, we always replace the rightmost non-terminal.

Example:

Same grammar as above.

Rightmost derivation of "the dog chased the cat":
S ⇒ NP VP
  ⇒ NP V NP
  ⇒ NP V Det N
  ⇒ NP V Det cat
  ⇒ NP V the cat
  ⇒ NP chased the cat
  ⇒ Det N chased the cat
  ⇒ Det dog chased the cat
  ⇒ the dog chased the cat
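
Both derivation orders can be replayed with one small helper. This is only a sketch: the dict encoding and the derive function are hypothetical, and each entry in choices is the index of the alternative to apply at that step.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def derive(choices, leftmost=True):
    """Apply one alternative per step to the leftmost (or rightmost)
    non-terminal and return every sentential form along the way."""
    form = ["S"]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Index 1 selects "cat" when N is expanded; every other step uses index 0.
left = derive([0, 0, 0, 0, 0, 0, 0, 0, 1], leftmost=True)
right = derive([0, 0, 0, 1, 0, 0, 0, 0, 0], leftmost=False)
print(" ⇒ ".join(left))
print(" ⇒ ".join(right))
```

The two runs pass through different intermediate forms but end at the same string after the same number of steps.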

Parse Trees

Definition of Parse Tree

A parse tree is a tree representation of a derivation. In a parse tree:

The root is the start symbol
Each internal node is a non-terminal
Each leaf is a terminal (or ε)
An internal node together with its children represents one production rule application

Example Parse Tree

Grammar:
S → NP VP
NP → Det N
VP → V NP
Det → the
N → dog | cat
V → chased

Parse tree for "the dog chased the cat":

          S
       /     \
   NP           VP
  /  \        /    \
Det   N     V        NP
 |    |     |       /  \
the  dog  chased  Det   N
                   |    |
                  the  cat
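
The same tree can be written as nested data. In this sketch (the tuple encoding and the function names leaves and rules_used are assumptions made for illustration), reading the leaves left to right recovers the sentence, and each internal node corresponds to one production.

```python
# A node is (label, children); a leaf is a plain string.
tree = ("S", [
    ("NP", [("Det", ["the"]), ("N", ["dog"])]),
    ("VP", [
        ("V", ["chased"]),
        ("NP", [("Det", ["the"]), ("N", ["cat"])]),
    ]),
])


def leaves(node):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [word for child in children for word in leaves(child)]


def rules_used(node):
    """List the production applied at each internal node, in preorder."""
    if isinstance(node, str):
        return []
    label, children = node
    rhs = " ".join(c if isinstance(c, str) else c[0] for c in children)
    rules = [f"{label} → {rhs}"]
    for child in children:
        rules.extend(rules_used(child))
    return rules


print(" ".join(leaves(tree)))  # the dog chased the cat
print(rules_used(tree)[:3])    # ['S → NP VP', 'NP → Det N', 'Det → the']
```

Listing the rules in preorder gives the same sequence of productions as the leftmost derivation of the sentence.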

Ambiguity

A grammar is ambiguous if at least one string in its language has more than one parse tree (equivalently, more than one leftmost derivation).

Example:

Grammar:
E → E + E | E * E | num

String: num + num * num

Parse tree 1 (addition grouped first):
        E
      / | \
     E  *  E
   / | \   |
  E  +  E num
  |     |
 num   num
(Groups as: (num + num) * num)

Parse tree 2 (multiplication grouped first):
     E
   / | \
  E  +  E
  |   / | \
 num E  *  E
     |     |
    num   num
(Groups as: num + (num * num))
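
Ambiguity can be checked by counting parse trees. The sketch below is specific to the grammar E → E + E | E * E | num and assumes pre-split tokens; count_parse_trees is a hypothetical helper that tries every operator as the top-level rule and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under E → E + E | E * E | num."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "num" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):  # top-level rule E → E op E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("num + num * num".split()))        # 2
print(count_parse_trees("num + num + num + num".split()))  # 5
```

Any count above 1 shows that the grammar is ambiguous for that string.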

Language Generated by a Grammar

Definition

The language generated by a grammar G, denoted L(G), is the set of all terminal strings that can be derived from the start symbol.

Notation:

L(G) = {w | S ⇒* w and w contains only terminals}

Example

Grammar:

S → aSb | ε

Language:

L(G) = {aⁿbⁿ | n ≥ 0} = {ε, ab, aabb, aaabbb, ...}

Derivations:

S ⇒ ε                    (generates ε)
S ⇒ aSb ⇒ ab             (generates ab)
S ⇒ aSb ⇒ aaSbb ⇒ aabb  (generates aabb)
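
The two rules translate almost directly into code. In this sketch, generate and in_language are hypothetical helper names: the first replays a derivation, the second tests membership by undoing S → aSb one layer at a time.

```python
def generate(n):
    """Apply S → aSb n times, then S → ε, recording each sentential form."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aSb")
        steps.append(form)
    steps.append(form.replace("S", ""))
    return steps


def in_language(w):
    """ε is in L(G); otherwise w must be a...b around a string in L(G)."""
    if w == "":
        return True
    return w.startswith("a") and w.endswith("b") and in_language(w[1:-1])


print(generate(2))          # ['S', 'aSb', 'aaSbb', 'aabb']
print(in_language("aabb"))  # True
print(in_language("aab"))   # False
```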

Grammar Design

Designing a Grammar for a Language

Step 1: Understand the language structure
Step 2: Identify the main components
Step 3: Write production rules for each component
Step 4: Test the grammar with examples

Example 1: Balanced Parentheses

Language: All strings of balanced parentheses

Grammar:

S → (S) | SS | ε

Derivations:

S ⇒ ε                           (empty)
S ⇒ (S) ⇒ ()                    (one pair)
S ⇒ (S) ⇒ ((S)) ⇒ (())          (nested)
S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()()   (sequential)
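
Step 4 (testing the grammar) can be partly automated. The sketch below follows the three rules directly in a hypothetical function derives and compares it with an independent depth-counter check on every string of parentheses up to length 6. Agreement on short strings is evidence that the grammar is right, not a proof.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w):
    """True if S ⇒* w for S → (S) | SS | ε."""
    if w == "":  # S → ε
        return True
    if w[0] == "(" and w[-1] == ")" and derives(w[1:-1]):  # S → (S)
        return True
    # S → SS: only non-empty splits are tried, so the recursion terminates.
    return any(derives(w[:i]) and derives(w[i:]) for i in range(1, len(w)))


def balanced(w):
    """Independent check: depth never goes negative and ends at zero."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


strings = ["".join(p) for n in range(7) for p in product("()", repeat=n)]
mismatches = [w for w in strings if derives(w) != balanced(w)]
print(mismatches)  # []
```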

Example 2: Arithmetic Expressions

Language: Arithmetic expressions with +, *, (, )

Grammar:

E → E + T | T
T → T * F | F
F → (E) | num

Derivations:

E ⇒ T ⇒ F ⇒ num                 (single number)
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ num + T ⇒ num + F ⇒ num + num
E ⇒ E + T ⇒ T + T ⇒ T * F + T ⇒ F * F + T ⇒ num * F + T ⇒ num * num + T ⇒ num * num + F ⇒ num * num + num

Example 3: Simple Programming Language

Language: Simple variable assignments

Grammar:

S → id = E;
E → E + E | E * E | (E) | num | id

Derivations:

S ⇒ id = E; ⇒ id = num;
S ⇒ id = E; ⇒ id = E + E; ⇒ id = num + E; ⇒ id = num + num;
S ⇒ id = E; ⇒ id = E * E; ⇒ id = (E) * E; ⇒ id = (num) * E; ⇒ id = (num) * num;

Normal Forms

Chomsky Normal Form (CNF)

A grammar is in Chomsky Normal Form if every production rule is of the form:

A → BC    (two non-terminals)
A → a     (single terminal)
S → ε     (only for the start symbol, and only if it never appears on a right-hand side)

Advantages:

Simplifies parsing algorithms
Fixes the derivation length: a string of length n ≥ 1 is always derived in exactly 2n − 1 steps
Useful for theoretical analysis

Example:

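Original grammar:
S → aSb | ε

CNF (S₀ is a new start symbol, so the start symbol never appears on a right-hand side):
S₀ → AX | AB | ε
S → AX | AB
X → SB
A → a
B → b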
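
One reason CNF simplifies parsing is that it enables table-driven algorithms such as CYK, which this article does not otherwise cover. The sketch below runs CYK on a CNF grammar for {aⁿbⁿ} that uses a separate start symbol S0; the names BINARY, TERMINAL, and cyk are assumptions made for this example.

```python
# Binary rules A → BC and terminal rules A → a of a CNF grammar for {aⁿbⁿ}.
BINARY = {
    "S0": [("A", "X"), ("A", "B")],
    "S": [("A", "X"), ("A", "B")],
    "X": [("S", "B")],
}
TERMINAL = {"A": "a", "B": "b"}
START = "S0"
START_DERIVES_EMPTY = True  # encodes the rule S0 → ε


def cyk(w):
    """Return True if the CNF grammar above derives w."""
    n = len(w)
    if n == 0:
        return START_DERIVES_EMPTY
    # table[i][l] holds the non-terminals that derive w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {nt for nt, t in TERMINAL.items() if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in BINARY.items():
                    if any(b in left and c in right for b, c in bodies):
                        table[i][length].add(head)
    return START in table[0][n]


print(cyk("aabb"))  # True
print(cyk("aab"))   # False
```

The triple loop over length, start position, and split point gives the cubic running time usually quoted for CYK.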

Greibach Normal Form (GNF)

A grammar is in Greibach Normal Form if every production rule is of the form:

A → aα
where a is a terminal and α is a string of non-terminals (possibly empty)

Advantages:

Each derivation step produces exactly one terminal, so a string of length n is derived in exactly n steps
Useful for parsing algorithms

Closure Properties

Closure Under Union

If L₁ and L₂ are context-free languages, then L₁ ∪ L₂ is context-free.

Proof:

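Given grammars G₁ and G₂ with start symbols S₁ and S₂ (rename non-terminals if needed so the two grammars share none).
Create a new grammar G that keeps all rules of G₁ and G₂ and adds a new start symbol S with the rule:
S → S₁ | S₂
Then L(G) = L(G₁) ∪ L(G₂)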

Closure Under Concatenation

If L₁ and L₂ are context-free languages, then L₁ · L₂ is context-free.

Proof:

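Given grammars G₁ and G₂ with start symbols S₁ and S₂ (rename non-terminals if needed so the two grammars share none).
Create a new grammar G that keeps all rules of G₁ and G₂ and adds a new start symbol S with the rule:
S → S₁S₂
Then L(G) = L(G₁) · L(G₂)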

Closure Under Kleene Star

If L is a context-free language, then L* is context-free.

Proof:

Given grammar G with start symbol S.
Create a new grammar G' that keeps all rules of G and adds a new start symbol S' with the rule:
S' → SS' | ε
Then L(G') = L(G)*
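
The union, concatenation, and star constructions are simple enough to write as functions on grammar data. In this sketch the dict encoding and the names union, concat, and star are assumptions; an empty list stands for ε, and the assertions enforce that non-terminal names do not clash.

```python
def _combine(g1, g2, start, alternatives):
    # The constructions are only valid when names do not clash.
    assert not set(g1) & set(g2), "grammars must not share non-terminals"
    assert start not in g1 and start not in g2, "start symbol must be new"
    return {start: alternatives, **g1, **g2}


def union(g1, s1, g2, s2, start="S"):
    """Add S → S1 | S2."""
    return _combine(g1, g2, start, [[s1], [s2]])


def concat(g1, s1, g2, s2, start="S"):
    """Add S → S1 S2."""
    return _combine(g1, g2, start, [[s1, s2]])


def star(g, s, start="S'"):
    """Add S' → S S' | ε."""
    return _combine(g, {}, start, [[s, start], []])


G1 = {"S1": [["a", "S1", "b"], []]}  # generates aⁿbⁿ
G2 = {"S2": [["c", "S2"], []]}       # generates cᵐ

print(union(G1, "S1", G2, "S2")["S"])   # [['S1'], ['S2']]
print(concat(G1, "S1", G2, "S2")["S"])  # [['S1', 'S2']]
print(star(G1, "S1")["S'"])             # [['S1', "S'"], []]
```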

Non-Closure Under Intersection

Context-free languages are NOT closed under intersection.

Counterexample:

L₁ = {aⁿbⁿcᵐ | n,m ≥ 0}  (context-free)
L₂ = {aⁿbᵐcᵐ | n,m ≥ 0}  (context-free)
L₁ ∩ L₂ = {aⁿbⁿcⁿ | n ≥ 0}  (NOT context-free)
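
The set equality in the counterexample can be sanity-checked on bounded samples. The sketch below (with hypothetical helper names l1, l2, l3) only confirms what the intersection is for small exponents; the fact that {aⁿbⁿcⁿ} is not context-free is a separate result, usually proved with the pumping lemma for context-free languages.

```python
def l1(bound):
    """Strings aⁿbⁿcᵐ with n, m ≤ bound."""
    r = range(bound + 1)
    return {"a" * n + "b" * n + "c" * m for n in r for m in r}


def l2(bound):
    """Strings aⁿbᵐcᵐ with n, m ≤ bound."""
    r = range(bound + 1)
    return {"a" * n + "b" * m + "c" * m for n in r for m in r}


def l3(bound):
    """Strings aⁿbⁿcⁿ with n ≤ bound."""
    return {"a" * n + "b" * n + "c" * n for n in range(bound + 1)}


print(l1(4) & l2(4) == l3(4))  # True
print(sorted(l3(2), key=len))  # ['', 'abc', 'aabbcc']
```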

Glossary

Grammar: Formal system for generating strings
Terminal: Symbol that appears in strings
Non-terminal: Symbol representing language constructs
Production rule: Rule for replacing non-terminals
Start symbol: Non-terminal from which derivation begins
Derivation: Sequence of production rule applications
Parse tree: Tree representation of a derivation
Ambiguous grammar: Grammar with multiple parse trees for some strings
Chomsky Normal Form: Restricted grammar form
Greibach Normal Form: Another restricted grammar form

Practice Problems

Problem 1: Grammar Design

Design a grammar for the language {aⁿbⁿ | n ≥ 0}.

Solution:

S → aSb | ε

Problem 2: Derivation

Given the grammar:

S → aSb | ε

Show a derivation for “aabb”.

Solution:

S ⇒ aSb ⇒ aaSbb ⇒ aabb

Problem 3: Parse Tree

Draw a parse tree for “aabb” using the grammar from Problem 1.

Solution:

      S
     /|\
    a S b
     /|\
    a S b
      |
      ε

Problem 4: Language Recognition

Given the grammar:

S → (S) | SS | ε

Is “(())” in the language? Show a derivation.

Solution:

Yes, (()) is in the language.
S ⇒ (S) ⇒ ((S)) ⇒ (())

Related Resources

Online Platforms

Stanford Encyclopedia of Philosophy - Formal language theory
Khan Academy Computer Science - CS fundamentals
Coursera Formal Languages - Online courses
MIT OpenCourseWare - University courses
Brilliant.org Computer Science - Interactive lessons

Interactive Tools

JFLAP - Grammar and automata simulator
Grammar Visualizer - Visualize grammars
Parse Tree Generator - Generate parse trees
Derivation Tracer - Trace derivations
Grammar Checker - Check grammar rules

Recommended Books

“Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
“Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
“Elements of the Theory of Computation” by Lewis & Papadimitriou
“Compilers: Principles, Techniques, and Tools” by Aho, Lam, Sethi, Ullman
“Introduction to the Theory of Computation” by Sipser

Software Tools

JFLAP - Automata and grammar simulator
Graphviz - Graph visualization
Lex/Yacc - Lexer/parser generators
ANTLR - Parser generator
Bison - Parser generator

Conclusion

Context-free grammars are a powerful tool for:

Defining formal languages
Specifying syntax of programming languages
Parsing and compilation
Natural language processing
Formal language theory

Understanding CFGs is essential for studying parsing, compilation, and formal language theory.

In the next article, we’ll explore regular expressions and regular languages, which are simpler but still powerful.

