Skip to main content
โšก Calmops

Context-Free Grammars (CFG)

Introduction

Context-free grammars (CFGs) are a fundamental tool for defining formal languages. They provide a way to specify the structure of strings in a language using production rules. CFGs are widely used in:

  • Programming language design
  • Parsing and compilation
  • Natural language processing
  • Formal language theory
  • Syntax specification

In this article, we’ll explore context-free grammars, how they work, and how to use them to define languages.

Grammar Basics

Definition of Grammar

A grammar is a formal system for generating strings in a language. It consists of:

  1. Terminals: Symbols that appear in the strings (alphabet symbols)
  2. Non-terminals: Symbols that represent language constructs
  3. Production rules: Rules for replacing non-terminals with strings
  4. Start symbol: The non-terminal from which derivation begins

Notation:

G = (V, ฮฃ, R, S)
V = set of non-terminals
ฮฃ = set of terminals (alphabet)
R = set of production rules
S = start symbol

Production Rules

A production rule has the form:

A โ†’ ฮฑ

Where:

  • A is a non-terminal (left-hand side)
  • ฮฑ is a string of terminals and non-terminals (right-hand side)

Examples:

S โ†’ NP VP           (sentence โ†’ noun phrase verb phrase)
NP โ†’ Det N          (noun phrase โ†’ determiner noun)
VP โ†’ V NP           (verb phrase โ†’ verb noun phrase)
Det โ†’ the | a        (determiner โ†’ "the" or "a")
N โ†’ dog | cat        (noun โ†’ "dog" or "cat")
V โ†’ chased | saw     (verb โ†’ "chased" or "saw")

Context-Free Grammars

Definition of CFG

A context-free grammar is a grammar where each production rule has exactly one non-terminal on the left-hand side.

Form:

A โ†’ ฮฑ
where A is a non-terminal and ฮฑ is any string of terminals and non-terminals

Why “context-free”? The replacement of A doesn’t depend on the context (symbols around A). We can always replace A with ฮฑ regardless of what surrounds it.

Example CFG: Simple Arithmetic

E โ†’ E + T | T
T โ†’ T * F | F
F โ†’ (E) | num

Terminals: +, *, (, ), num
Non-terminals: E, T, F
Start symbol: E

This grammar generates arithmetic expressions like:

  • num
  • num + num
  • num * num
  • (num + num) * num

Derivations

Definition of Derivation

A derivation is a sequence of production rule applications that transforms the start symbol into a string.

Notation:

S โ‡’ ฮฑโ‚ โ‡’ ฮฑโ‚‚ โ‡’ ... โ‡’ ฮฑโ‚™

Leftmost Derivation

In a leftmost derivation, we always replace the leftmost non-terminal.

Example:

Grammar:
S โ†’ NP VP
NP โ†’ Det N
VP โ†’ V NP
Det โ†’ the
N โ†’ dog | cat
V โ†’ chased

Leftmost derivation of "the dog chased the cat":
S โ‡’ NP VP
  โ‡’ Det N VP
  โ‡’ the N VP
  โ‡’ the dog VP
  โ‡’ the dog V NP
  โ‡’ the dog chased NP
  โ‡’ the dog chased Det N
  โ‡’ the dog chased the N
  โ‡’ the dog chased the cat

Rightmost Derivation

In a rightmost derivation, we always replace the rightmost non-terminal.

Example:

Same grammar as above.

Rightmost derivation of "the dog chased the cat":
S โ‡’ NP VP
  โ‡’ NP V NP
  โ‡’ NP V Det N
  โ‡’ NP V Det cat
  โ‡’ NP V the cat
  โ‡’ NP chased the cat
  โ‡’ Det N chased the cat
  โ‡’ Det dog chased the cat
  โ‡’ the dog chased the cat

Parse Trees

Definition of Parse Tree

A parse tree is a tree representation of a derivation. Each:

  • Internal node is a non-terminal
  • Leaf node is a terminal
  • Edge represents a production rule application

Example Parse Tree

Grammar:
S โ†’ NP VP
NP โ†’ Det N
VP โ†’ V NP
Det โ†’ the
N โ†’ dog | cat
V โ†’ chased

Parse tree for "the dog chased the cat":

        S
       / \
      NP  VP
     / \  / \
   Det N V  NP
    |  |  |  / \
   the dog chased Det N
              |   |
             the  cat

Ambiguity

A grammar is ambiguous if a string can have multiple parse trees.

Example:

Grammar:
E โ†’ E + E | E * E | num

String: num + num * num

Parse tree 1 (left-associative):
      E
     / \
    E   *
   / \   \
  E   +   E
  |   |   |
 num num num
(Evaluates to: (num + num) * num)

Parse tree 2 (right-associative):
      E
     / \
    E   +
    |   / \
   num E   *
       |   / \
      num E   E
          |   |
         num num
(Evaluates to: num + (num * num))

Language Generated by a Grammar

Definition

The language generated by a grammar G, denoted L(G), is the set of all strings that can be derived from the start symbol.

Notation:

L(G) = {w | S โ‡’* w and w contains only terminals}

Example

Grammar:

S โ†’ aSb | ฮต

Language:

L(G) = {aโฟbโฟ | n โ‰ฅ 0} = {ฮต, ab, aabb, aaabbb, ...}

Derivations:

S โ‡’ ฮต                    (generates ฮต)
S โ‡’ aSb โ‡’ ab             (generates ab)
S โ‡’ aSb โ‡’ aaSbb โ‡’ aabb  (generates aabb)

Grammar Design

Designing a Grammar for a Language

Step 1: Understand the language structure Step 2: Identify the main components Step 3: Write production rules for each component Step 4: Test the grammar with examples

Example 1: Balanced Parentheses

Language: All strings of balanced parentheses

Grammar:

S โ†’ (S) | SS | ฮต

Derivations:

S โ‡’ ฮต                           (empty)
S โ‡’ (S) โ‡’ ()                    (one pair)
S โ‡’ (S) โ‡’ ((S)) โ‡’ (())          (nested)
S โ‡’ SS โ‡’ (S)(S) โ‡’ ()()          (sequential)

Example 2: Arithmetic Expressions

Language: Arithmetic expressions with +, *, (, )

Grammar:

E โ†’ E + T | T
T โ†’ T * F | F
F โ†’ (E) | num

Derivations:

E โ‡’ T โ‡’ F โ‡’ num                 (single number)
E โ‡’ E + T โ‡’ T + T โ‡’ F + F โ‡’ num + num
E โ‡’ E + T โ‡’ T + T โ‡’ T * F + T โ‡’ F * F + F โ‡’ num * num + num

Example 3: Simple Programming Language

Language: Simple variable assignments

Grammar:

S โ†’ id = E;
E โ†’ E + E | E * E | (E) | num | id

Derivations:

S โ‡’ id = E; โ‡’ id = num;
S โ‡’ id = E; โ‡’ id = E + E; โ‡’ id = num + num;
S โ‡’ id = E; โ‡’ id = E * E; โ‡’ id = (E) * E; โ‡’ id = (num) * num;

Normal Forms

Chomsky Normal Form (CNF)

A grammar is in Chomsky Normal Form if every production rule is of the form:

A โ†’ BC    (two non-terminals)
A โ†’ a     (single terminal)
S โ†’ ฮต     (only for start symbol)

Advantages:

  • Simplifies parsing algorithms
  • Guarantees finite derivation length
  • Useful for theoretical analysis

Example:

Original grammar:
S โ†’ aSb | ฮต

CNF:
S โ†’ AX | ฮต
X โ†’ SB
A โ†’ a
B โ†’ b

Greibach Normal Form (GNF)

A grammar is in Greibach Normal Form if every production rule is of the form:

A โ†’ aฮฑ
where a is a terminal and ฮฑ is a string of non-terminals (possibly empty)

Advantages:

  • Guarantees one terminal per derivation step
  • Useful for parsing algorithms

Closure Properties

Closure Under Union

If Lโ‚ and Lโ‚‚ are context-free languages, then Lโ‚ โˆช Lโ‚‚ is context-free.

Proof:

Given grammars Gโ‚ and Gโ‚‚ with start symbols Sโ‚ and Sโ‚‚.
Create new grammar G with start symbol S:
S โ†’ Sโ‚ | Sโ‚‚
Then L(G) = L(Gโ‚) โˆช L(Gโ‚‚)

Closure Under Concatenation

If Lโ‚ and Lโ‚‚ are context-free languages, then Lโ‚ ยท Lโ‚‚ is context-free.

Proof:

Given grammars Gโ‚ and Gโ‚‚ with start symbols Sโ‚ and Sโ‚‚.
Create new grammar G with start symbol S:
S โ†’ Sโ‚Sโ‚‚
Then L(G) = L(Gโ‚) ยท L(Gโ‚‚)

Closure Under Kleene Star

If L is a context-free language, then L* is context-free.

Proof:

Given grammar G with start symbol S.
Create new grammar G' with start symbol S':
S' โ†’ SS' | ฮต
Then L(G') = L(G)*

Non-Closure Under Intersection

Context-free languages are NOT closed under intersection.

Counterexample:

Lโ‚ = {aโฟbโฟcแต | n,m โ‰ฅ 0}  (context-free)
Lโ‚‚ = {aโฟbแตcแต | n,m โ‰ฅ 0}  (context-free)
Lโ‚ โˆฉ Lโ‚‚ = {aโฟbโฟcโฟ | n โ‰ฅ 0}  (NOT context-free)

Glossary

  • Grammar: Formal system for generating strings
  • Terminal: Symbol that appears in strings
  • Non-terminal: Symbol representing language constructs
  • Production rule: Rule for replacing non-terminals
  • Start symbol: Non-terminal from which derivation begins
  • Derivation: Sequence of production rule applications
  • Parse tree: Tree representation of a derivation
  • Ambiguous grammar: Grammar with multiple parse trees for some strings
  • Chomsky Normal Form: Restricted grammar form
  • Greibach Normal Form: Another restricted grammar form

Practice Problems

Problem 1: Grammar Design

Design a grammar for the language {aโฟbโฟ | n โ‰ฅ 0}.

Solution:

S โ†’ aSb | ฮต

Problem 2: Derivation

Given the grammar:

S โ†’ aSb | ฮต

Show a derivation for “aabb”.

Solution:

S โ‡’ aSb โ‡’ aaSbb โ‡’ aabb

Problem 3: Parse Tree

Draw a parse tree for “aabb” using the grammar from Problem 1.

Solution:

      S
     /|\
    a S b
     /|\
    a S b
      |
      ฮต

Problem 4: Language Recognition

Given the grammar:

S โ†’ (S) | SS | ฮต

Is “(())” in the language? Show a derivation.

Solution:

Yes, (()) is in the language.
S โ‡’ (S) โ‡’ ((S)) โ‡’ (())

Online Platforms

Interactive Tools

  • “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
  • “Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
  • “Elements of the Theory of Computation” by Lewis & Papadimitriou
  • “Compilers: Principles, Techniques, and Tools” by Aho, Lam, Sethi, Ullman
  • “Theory of Computation” by Sipser

Academic Journals

Software Tools

  • JFLAP - Automata and grammar simulator
  • Graphviz - Graph visualization
  • Lex/Yacc - Lexer/parser generators
  • ANTLR - Parser generator
  • Bison - Parser generator

Conclusion

Context-free grammars are a powerful tool for:

  • Defining formal languages
  • Specifying syntax of programming languages
  • Parsing and compilation
  • Natural language processing
  • Formal language theory

Understanding CFGs is essential for studying parsing, compilation, and formal language theory.

In the next article, we’ll explore regular expressions and regular languages, which are simpler but still powerful.


Next Article: Regular Expressions and Regular Languages

Previous Article: Formal Languages: Alphabets, Strings, and Languages

Comments