Skip to main content
⚡ Calmops

Formal Languages: Alphabets, Strings, and Languages

Introduction

Formal languages are the mathematical foundation for understanding computation, programming languages, and automated reasoning. They provide a precise way to define what strings of symbols are valid according to specific rules.

In this article, we’ll explore the fundamental concepts of formal languages: alphabets, strings, and how languages are formally defined and manipulated.

Alphabets and Symbols

Definition of Alphabet

An alphabet is a finite, non-empty set of symbols. We typically denote an alphabet as Σ (sigma).

Examples:

Σ₁ = {0, 1}                    (binary alphabet)
Σ₂ = {a, b, c, ..., z}         (lowercase letters)
Σ₃ = {0, 1, 2, ..., 9}         (digits)
Σ₄ = {(, ), +, -, *, /}        (arithmetic operators)
Σ₅ = {if, then, else, while}   (keywords)

Properties of Alphabets

Finite: An alphabet contains a finite number of symbols.

Valid: {a, b, c}
Invalid: {a, b, c, ...} (infinite)

Non-empty: An alphabet must contain at least one symbol.

Valid: {a}
Invalid: {} (empty)

Distinct symbols: Each symbol is unique within the alphabet.

Valid: {a, b, c}
Invalid: {a, b, a} (a appears twice)

Alphabet Notation

We use different notations for different alphabets:

  • Σ (sigma): Generic alphabet
  • Σ₁, Σ₂, …: Multiple alphabets
  • {a, b, c}: Explicit set notation
  • [0-9]: Range notation (digits 0 through 9)

Strings and Words

Definition of String

A string (or word) is a finite sequence of symbols from an alphabet.

Examples:

Alphabet: Σ = {a, b}
Strings: ε, a, b, aa, ab, ba, bb, aaa, aab, ...

Empty String

The empty string (denoted ε or λ) is the string with no symbols.

Properties:

Length of ε: |ε| = 0
ε concatenated with any string s: ε · s = s · ε = s

String Length

The length of a string is the number of symbols it contains.

Examples:

|a| = 1
|ab| = 2
|aaa| = 3
|ε| = 0

String Operations

Concatenation: Joining two strings together.

s₁ = "ab"
s₂ = "cd"
s₁ · s₂ = "abcd"

Repetition: Repeating a string n times.

s = "ab"
s² = "abab"
s³ = "ababab"
s⁰ = ε

Reversal: Reversing the order of symbols.

s = "abc"
sᴿ = "cba"

Substring: A contiguous sequence of symbols within a string.

s = "abcde"
Substrings: "a", "ab", "abc", "b", "bc", "bcd", "c", "cd", "cde", "d", "de", "e"

Prefix: A substring that starts at the beginning.

s = "abcde"
Prefixes: ε, "a", "ab", "abc", "abcd", "abcde"

Suffix: A substring that ends at the end.

s = "abcde"
Suffixes: ε, "e", "de", "cde", "bcde", "abcde"

Formal Languages

Definition of Language

A formal language is a set of strings over an alphabet. We denote a language as L.

Examples:

L₁ = {ε, a, aa, aaa, ...}           (all strings of a's)
L₂ = {ab, aabb, aaabbb, ...}        (equal a's and b's)
L₃ = {0, 1, 10, 11, 100, 101, ...}  (binary numbers)
L₄ = {}                              (empty language)

Language Notation

Set notation:

L = {s | s is a string over Σ and s satisfies property P}

Examples:

L₁ = {aⁿ | n ≥ 0}                    (all strings of a's)
L₂ = {aⁿbⁿ | n ≥ 0}                  (equal a's and b's)
L₃ = {w | w ∈ {0,1}* and w is a binary number}

Language Operations

Union: Combining two languages.

L₁ = {a, aa}
L₂ = {b, bb}
L₁ ∪ L₂ = {a, aa, b, bb}

Intersection: Strings in both languages.

L₁ = {a, aa, aaa}
L₂ = {aa, aaa, aaaa}
L₁ ∩ L₂ = {aa, aaa}

Complement: Strings not in the language.

Σ = {a, b}
L = {a, aa}
L̄ = {ε, b, ab, ba, bb, aab, aba, ...}

Concatenation: Joining strings from two languages.

L₁ = {a, aa}
L₂ = {b, bb}
L₁ · L₂ = {ab, abb, aab, aabb}

Kleene Star: Zero or more repetitions.

L = {a, b}
L* = {ε, a, b, aa, ab, ba, bb, aaa, ...}

Kleene Plus: One or more repetitions.

L = {a, b}
L⁺ = {a, b, aa, ab, ba, bb, aaa, ...}

Formal Language Classes

Finite Languages

A language with a finite number of strings.

Example:

L = {a, ab, abc}
|L| = 3

Infinite Languages

A language with an infinite number of strings.

Example:

L = {aⁿ | n ≥ 0} = {ε, a, aa, aaa, ...}
|L| = ∞

Regular Languages

Languages that can be recognized by finite automata.

Examples:

L₁ = {a, aa, aaa, ...}              (one or more a's)
L₂ = {(ab)*}                         (zero or more ab's)
L₃ = {0, 1}*                         (all binary strings)

Context-Free Languages

Languages that can be generated by context-free grammars.

Examples:

L₁ = {aⁿbⁿ | n ≥ 0}                  (equal a's and b's)
L₂ = {ww^R | w ∈ {a,b}*}             (palindromes)

Context-Sensitive Languages

Languages that can be generated by context-sensitive grammars.

Examples:

L₁ = {aⁿbⁿcⁿ | n ≥ 0}                (equal a's, b's, and c's)

Recursively Enumerable Languages

Languages that can be recognized by Turing machines.

Examples:

All computable languages

String Representation

Notation for String Sets

Kleene Star (*): Zero or more repetitions.

a* = {ε, a, aa, aaa, ...}
(ab)* = {ε, ab, abab, ababab, ...}

Kleene Plus (+): One or more repetitions.

a⁺ = {a, aa, aaa, ...}
(ab)⁺ = {ab, abab, ababab, ...}

Optional (?): Zero or one occurrence.

a? = {ε, a}
(ab)? = {ε, ab}

Exponent: Exactly n repetitions.

a³ = {aaa}
(ab)² = {abab}

Range: Between m and n repetitions.

a{2,4} = {aa, aaa, aaaa}

Formal Language Examples

Example 1: Binary Strings

Alphabet: Σ = {0, 1}

Language: All binary strings of length 3

L = {000, 001, 010, 011, 100, 101, 110, 111}
|L| = 8 = 2³

Example 2: Balanced Parentheses

Alphabet: Σ = {(, )}

Language: All balanced parentheses strings

L = {ε, (), (()), ()(), ((())), ...}

Example 3: Arithmetic Expressions

Alphabet: Σ = {0-9, +, -, *, /, (, )}

Language: All valid arithmetic expressions

L = {1, 1+2, (1+2)*3, ...}

Example 4: Programming Language

Alphabet: Keywords, identifiers, operators, etc.

Language: All valid programs in the language

L = {valid Python programs}

Closure Properties

Closure Under Union

If L₁ and L₂ are regular languages, then L₁ ∪ L₂ is regular.

Example:

L₁ = {a*}
L₂ = {b*}
L₁ ∪ L₂ = {a* ∪ b*} (regular)

Closure Under Concatenation

If L₁ and L₂ are regular languages, then L₁ · L₂ is regular.

Example:

L₁ = {a*}
L₂ = {b*}
L₁ · L₂ = {a*b*} (regular)

Closure Under Kleene Star

If L is a regular language, then L* is regular.

Example:

L = {ab}
L* = {(ab)*} (regular)

Glossary

  • Alphabet: Finite set of symbols
  • String: Finite sequence of symbols from an alphabet
  • Empty string (ε): String with no symbols
  • Language: Set of strings over an alphabet
  • Concatenation: Joining strings or languages
  • Kleene star (*): Zero or more repetitions
  • Kleene plus (+): One or more repetitions
  • Prefix: Substring starting at the beginning
  • Suffix: Substring ending at the end
  • Substring: Contiguous sequence within a string
  • Regular language: Language recognizable by finite automata
  • Context-free language: Language generated by context-free grammar
  • Closure property: Property preserved under operations

Practice Problems

Problem 1: Alphabet and Strings

Given Σ = {a, b}:

  1. List all strings of length 2
  2. How many strings of length 3 exist?
  3. What is the length of “aabba”?

Solution:

  1. {aa, ab, ba, bb}
  2. 2³ = 8 strings
  3. |aabba| = 5

Problem 2: Language Operations

Given:

  • L₁ = {a, aa}
  • L₂ = {b, bb}

Find:

  1. L₁ ∪ L₂
  2. L₁ · L₂
  3. L₁*

Solution:

  1. {a, aa, b, bb}
  2. {ab, abb, aab, aabb}
  3. {ε, a, aa, aaa, aaaa, …}

Problem 3: Language Definition

Define the language of all strings over {0, 1} that:

  1. Start with 0
  2. Have equal 0’s and 1’s
  3. Are palindromes

Solution:

  1. L = {0w | w ∈ {0,1}*}
  2. L = {w | w ∈ {0,1}* and #0’s = #1’s}
  3. L = {w | w ∈ {0,1}* and w = w^R}

Problem 4: Kleene Operations

Given L = {a, b}:

  1. What is L²?
  2. What is L⁰?
  3. What is L*?

Solution:

  1. {aa, ab, ba, bb}
  2. {ε}
  3. {ε, a, b, aa, ab, ba, bb, aaa, …}

Online Platforms

Interactive Tools

  • “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, Ullman
  • “Formal Languages and Their Relation to Automata” by Hopcroft & Ullman
  • “Elements of the Theory of Computation” by Lewis & Papadimitriou
  • “Compilers: Principles, Techniques, and Tools” by Aho, Lam, Sethi, Ullman
  • “Theory of Computation” by Sipser

Academic Journals

Software Tools

  • JFLAP - Automata and grammar simulator
  • Graphviz - Graph visualization
  • Lex/Yacc - Lexer/parser generators
  • ANTLR - Parser generator
  • Regex - Regular expression tools

Conclusion

Formal languages provide the mathematical foundation for understanding computation:

  • Alphabets define the symbols we work with
  • Strings are sequences of symbols
  • Languages are sets of strings
  • Operations allow us to combine and manipulate languages
  • Classes organize languages by their computational power

Understanding formal languages is essential for studying automata theory, parsing, compilation, and formal verification.

In the next article, we’ll explore context-free grammars, which allow us to define more complex languages.


Next Article: Context-Free Grammars

Comments