Skip to main content
โšก Calmops

SQL Fundamentals: Queries, Schema Design, Relationships & Normalization

Table of Contents

SQL Fundamentals: Queries, Schema Design, Relationships & Normalization

SQL (Structured Query Language) is the foundation of relational database management. Whether you’re building a web application, analyzing data, or managing a business system, understanding SQL fundamentals is essential. This comprehensive guide walks you through the core concepts you need to design effective databases and write efficient queries.

Typical database architecture in web applications

Understanding where SQL and databases fit in a system helps contextualize why these concepts matter:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Client Layer (Browser/Mobile App)                               โ”‚
โ”‚  - Renders UI, handles user interactions                        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚ HTTP/REST API Calls
                       โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Application Server (Node.js, Python, Java, etc.)               โ”‚
โ”‚  - Processes business logic                                     โ”‚
โ”‚  - Validates input                                             โ”‚
โ”‚  - Constructs SQL queries                                      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚ SQL Commands
                       โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Database Server (PostgreSQL, MySQL, SQLite, etc.)              โ”‚
โ”‚  - Stores data in tables                                       โ”‚
โ”‚  - Executes SQL queries                                        โ”‚
โ”‚  - Enforces constraints and relationships                      โ”‚
โ”‚  - Returns result sets                                         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚ Result Sets (JSON/serialized)
                       โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Application Server (formats response)                            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚ HTTP Response
                       โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Client Layer (displays data)                                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

In this flow, the database receives SQL queries from the application server and returns data. Good SQL skills mean faster queries, better data integrity, and more reliable applications.

Table of contents

  • Introduction & core concepts
    • What is SQL and relational databases?
    • Core terms & abbreviations
    • Typical database architecture
  • Part 1: SQL Queries โ€” Selecting, filtering, aggregating, and joining data
    • SELECT statements
    • WHERE clauses and filtering
    • ORDER BY sorting
    • LIMIT and OFFSET
    • Aggregation functions
    • GROUP BY and HAVING
    • JOINs (INNER, LEFT, RIGHT, FULL)
  • Part 2: Schema Design โ€” Creating tables, data types, and constraints
    • Creating tables
    • Common data types
    • Constraints (PRIMARY KEY, NOT NULL, UNIQUE, CHECK, DEFAULT, FOREIGN KEY)
    • Naming conventions
  • Part 3: Relationships โ€” Implementing one-to-one, one-to-many, and many-to-many
    • One-to-many relationships
    • One-to-one relationships
    • Many-to-many relationships with junction tables
    • Foreign key constraints and cascading actions
  • Part 4: Normalization โ€” Structuring data to eliminate redundancy
    • What is normalization?
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Denormalization trade-offs
    • Real-world e-commerce schema example
  • Best practices and conclusion
  • Common pitfalls & how to avoid them
  • SQL vs NoSQL: Pros and cons
  • Alternative databases & technologies
  • Complete SQL resources & learning materials
  • Further reading & resources

Introduction & core concepts

What is SQL and relational databases?

SQL (Structured Query Language) is a standardized programming language designed to manage and manipulate relational databases. It allows you to create, read, update, and delete dataโ€”often abbreviated as CRUD (Create, Read, Update, Delete).

A relational database stores data in tables (rows and columns), organized to minimize redundancy and ensure data integrity. Each table represents an entity (e.g., Users, Orders, Products), and relationships between tables enable you to model complex real-world systems. Unlike non-relational databases (NoSQL), relational databases enforce strict schemas and data types.

Key benefits of SQL databases:

  • Data integrity through constraints and relationships
  • Efficient querying across related data
  • ACID compliance (Atomicity, Consistency, Isolation, Durability)
  • Standardized language across different database systems
  • Proven scalability for structured data

Core terms & abbreviations

  • RDBMS โ€” Relational Database Management System (e.g., PostgreSQL, MySQL, Oracle, SQL Server)
  • SQL โ€” Structured Query Language: standard language for database operations
  • Table: A collection of rows (records) and columns (attributes). Represents an entity type.
  • Row/Record: A single entry in a table containing data about one instance of an entity.
  • Column/Field: An attribute of an entity with a specific data type and constraints. Analogous to a property.
  • Schema: The complete structure of a database, including table definitions, column types, constraints, and relationships.
  • Primary Key (PK): A unique identifier for each row in a table. No two rows can have the same primary key value.
  • Foreign Key (FK): A column that references the primary key of another table, creating a relationship and enforcing referential integrity.
  • Index: A data structure that speeds up query lookups on specific columns (trade-off: faster reads, slower writes).
  • Query: A request for data from the database using SQL statements (SELECT, INSERT, UPDATE, DELETE).
  • Constraint: A rule enforced on a column or table to ensure data validity (e.g., NOT NULL, UNIQUE, CHECK).
  • ACID โ€” Atomicity, Consistency, Isolation, Durability: guarantees that transactions are reliable.
  • CRUD โ€” Create, Read, Update, Delete: the four basic data operations.
  • DML โ€” Data Manipulation Language: SQL statements that operate on data (SELECT, INSERT, UPDATE, DELETE).
  • DDL โ€” Data Definition Language: SQL statements that define structure (CREATE, ALTER, DROP).
  • DCL โ€” Data Control Language: SQL statements that control access (GRANT, REVOKE).
  • Cardinality: The relationship type between tables (one-to-one, one-to-many, many-to-many).
  • Normalization: The process of organizing data to eliminate redundancy and dependency anomalies.
  • Denormalization: Intentionally adding redundant data to improve query performance (trade-off: harder to maintain consistency).

Part 1: SQL Queries โ€” Selecting, filtering, aggregating, and joining

SQL queries allow you to retrieve, filter, sort, and aggregate data. Let’s explore the key operations.

SELECT statements and basic queries

The SELECT statement retrieves data from a table. Here’s the basic syntax:

SELECT column1, column2, ...
FROM table_name;

Example: Retrieve all users’ names and email addresses:

SELECT first_name, last_name, email
FROM users;

To get all columns, use the asterisk (*):

SELECT *
FROM users;

Output (sample data):

first_name | last_name | email
-----------|-----------|------------------
Alice      | Johnson   | [email protected]
Bob        | Smith     | [email protected]
Carol      | Williams  | [email protected]

Filtering with WHERE clauses

The WHERE clause filters rows based on conditions. It’s essential for narrowing results to only the data you need.

Syntax:

SELECT column1, column2, ...
FROM table_name
WHERE condition;

What WHERE does: It evaluates each row and includes only those where the condition is TRUE. Rows where the condition is FALSE or NULL are excluded.

Example 1: Find all orders placed in the last 30 days:

SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

Example 2: Find all customers from a specific country:

SELECT customer_id, first_name, last_name, email, country
FROM customers
WHERE country = 'United States' AND active = TRUE;

Example 3: Find products within a price range:

SELECT product_id, product_name, price
FROM products
WHERE price BETWEEN 50 AND 200
ORDER BY price ASC;

Common comparison operators:

  • = โ€” equal
  • != or <> โ€” not equal
  • > โ€” greater than
  • < โ€” less than
  • >= โ€” greater than or equal
  • <= โ€” less than or equal
  • IN (value1, value2, ...) โ€” matches any value in a list
  • BETWEEN value1 AND value2 โ€” within a range (inclusive)
  • LIKE pattern โ€” pattern matching (use % for wildcards, _ for single character)
  • IS NULL or IS NOT NULL โ€” null checks
  • AND โ€” both conditions true
  • OR โ€” at least one condition true
  • NOT โ€” negates a condition

Pattern matching example:

-- Find customers whose email contains 'gmail'
SELECT * FROM customers WHERE email LIKE '%gmail%';

-- Find products starting with 'Wireless'
SELECT * FROM products WHERE product_name LIKE 'Wireless%';

-- Find phone numbers with specific pattern (XXX-XXX-XXXX)
SELECT * FROM users WHERE phone_number LIKE '___-___-____';

Complex WHERE with AND/OR:

-- Find high-value orders from recent customers
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE (total_amount > 1000 OR customer_id IN (1, 5, 10))
  AND order_date >= '2025-01-01'
  AND order_status != 'cancelled';

Example with multiple conditions:

SELECT *
FROM orders
WHERE customer_id = 5
  AND total_amount > 100
  AND order_status = 'completed';

Sorting with ORDER BY

The ORDER BY clause sorts results in ascending (default) or descending order.

Syntax:

SELECT *
FROM table_name
ORDER BY column1 ASC, column2 DESC;

Example: List products by price (highest first), then by name:

SELECT product_id, product_name, price, stock_quantity
FROM products
ORDER BY price DESC, product_name ASC;

Limiting results with LIMIT and OFFSET

To retrieve a subset of rows (useful for pagination):

SELECT *
FROM users
ORDER BY created_at DESC
LIMIT 10
OFFSET 20;

This skips the first 20 rows and returns the next 10 (rows 21-30).

Aggregation functions

Aggregation functions compute a single result from multiple rows: COUNT, SUM, AVG, MAX, MIN.

Example: Get summary statistics for orders:

SELECT
  COUNT(*) AS total_orders,
  SUM(total_amount) AS total_revenue,
  AVG(total_amount) AS average_order_value,
  MAX(total_amount) AS largest_order,
  MIN(total_amount) AS smallest_order
FROM orders
WHERE order_date >= '2025-01-01';

Output (sample):

total_orders | total_revenue | average_order_value | largest_order | smallest_order
-------------|---------------|---------------------|---------------|----------------
1250         | 125000.00     | 100.00              | 5000.00       | 10.50

Grouping with GROUP BY

The GROUP BY clause groups rows by one or more columns, commonly used with aggregation functions.

Syntax:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Example: Calculate total sales by product:

SELECT
  product_id,
  product_name,
  COUNT(*) AS units_sold,
  SUM(quantity) AS total_quantity,
  AVG(price) AS average_price
FROM order_items
GROUP BY product_id, product_name
ORDER BY units_sold DESC;

Filtering grouped results with HAVING:

To filter groups after aggregation, use the HAVING clause (not WHERE):

SELECT
  customer_id,
  COUNT(*) AS order_count,
  SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 1000
ORDER BY total_spent DESC;

This retrieves only customers who have spent more than $1000.

Joining tables

Joins combine rows from two or more tables based on a related column. Mastering joins is critical for working with relational data.

INNER JOIN

Returns only rows where the join condition is true (rows that exist in both tables).

Syntax:

SELECT columns
FROM table1
INNER JOIN table2
ON table1.foreign_key = table2.primary_key;

Example: Retrieve orders with customer information:

SELECT
  o.order_id,
  o.order_date,
  o.total_amount,
  c.first_name,
  c.last_name,
  c.email
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id;

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table, and matching rows from the right table. Non-matching rows show NULL for right table columns.

Example: List all customers and their orders (include customers with no orders):

SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  COUNT(o.order_id) AS order_count,
  COALESCE(SUM(o.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC;

RIGHT JOIN (RIGHT OUTER JOIN)

Returns all rows from the right table, and matching rows from the left table. Less commonly used than LEFT JOIN.

FULL OUTER JOIN

Returns all rows from both tables. Non-matching rows show NULL for missing values. (Not supported in all databases like MySQL; use UNION instead.)

Example (PostgreSQL, SQLite):

SELECT
  c.customer_id,
  c.first_name,
  o.order_id,
  o.order_date
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date;

Multiple joins

Join three or more tables by chaining joins:

SELECT
  o.order_id,
  o.order_date,
  c.first_name,
  c.last_name,
  oi.product_id,
  p.product_name,
  oi.quantity,
  oi.price
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
INNER JOIN products p
ON oi.product_id = p.product_id
ORDER BY o.order_id, oi.line_item_id;

Part 2: Schema Design โ€” Creating tables, data types, and constraints

A well-designed schema is the foundation of a healthy database. It ensures data integrity, efficiency, and maintainability.

Creating tables

The CREATE TABLE statement defines a table’s structure, including columns, data types, and constraints.

Syntax:

CREATE TABLE table_name (
  column1 data_type constraints,
  column2 data_type constraints,
  ...
);

Example: Create a customers table:

CREATE TABLE customers (
  customer_id INT PRIMARY KEY AUTO_INCREMENT,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL,
  phone VARCHAR(15),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Common data types

Choosing the correct data type is essential for efficiency and data integrity:

  • INT / BIGINT: Integer numbers (INT: up to ~2 billion; BIGINT: larger)
  • DECIMAL(precision, scale) / NUMERIC: Fixed-point numbers (e.g., DECIMAL(10,2) for currency: $99999.99)
  • FLOAT / DOUBLE: Floating-point numbers (less precise for financial data)
  • VARCHAR(length): Variable-length text (e.g., VARCHAR(100))
  • CHAR(length): Fixed-length text (use for codes like country or phone)
  • TEXT: Long text (no length limit in most databases)
  • DATE: Dates (YYYY-MM-DD)
  • TIME: Time (HH:MM:SS)
  • TIMESTAMP / DATETIME: Date and time combined
  • BOOLEAN: TRUE or FALSE

Rule of thumb: Use the smallest data type that fits your data. VARCHAR(50) is more efficient than VARCHAR(255) if 50 is sufficient.

Constraints

Constraints enforce data integrity and business rules:

PRIMARY KEY

Uniquely identifies each row. Each table should have exactly one primary key.

CREATE TABLE products (
  product_id INT PRIMARY KEY AUTO_INCREMENT,
  product_name VARCHAR(100) NOT NULL,
  price DECIMAL(10, 2) NOT NULL
);

NOT NULL

Ensures a column always has a value:

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  total_amount DECIMAL(10, 2) NOT NULL
);

UNIQUE

Ensures all values in a column are unique (no duplicates):

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(50) UNIQUE NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL
);

CHECK

Enforces a condition on column values:

CREATE TABLE products (
  product_id INT PRIMARY KEY,
  product_name VARCHAR(100) NOT NULL,
  price DECIMAL(10, 2) CHECK (price > 0),
  stock_quantity INT CHECK (stock_quantity >= 0)
);

DEFAULT

Sets a default value for a column if none is provided:

CREATE TABLE blog_posts (
  post_id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  content TEXT,
  status VARCHAR(20) DEFAULT 'draft',
  published_at TIMESTAMP DEFAULT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

FOREIGN KEY

Links a column to the primary key of another table (see Relationships section for details):

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Naming conventions

  • Table names: Plural, lowercase with underscores (e.g., customers, order_items)
  • Column names: Singular, lowercase with underscores (e.g., customer_id, created_at)
  • Primary keys: Typically {table_name}_id (e.g., customer_id, product_id)
  • Foreign keys: Same format as primary keys (e.g., customer_id in orders references customers)
  • Boolean columns: Prefix with is_ or has_ (e.g., is_active, has_discount)
  • Timestamp columns: Use created_at, updated_at, deleted_at (soft deletes)

Part 3: Relationships โ€” Implementing one-to-one, one-to-many, and many-to-many

Relationships define how data in different tables connects. Understanding and designing relationships correctly is crucial for relational databases.

One-to-many relationships

The most common relationship type. One row in a parent table relates to many rows in a child table.

Example: One customer has many orders.

CREATE TABLE customers (
  customer_id INT PRIMARY KEY AUTO_INCREMENT,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  total_amount DECIMAL(10, 2) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Query: Find all orders for a specific customer:

SELECT o.order_id, o.order_date, o.total_amount
FROM orders o
WHERE o.customer_id = 1;

Referential integrity: The database enforces that customer_id in orders must exist in customers. If you try to insert an order with a non-existent customer_id, the database rejects it.

One-to-one relationships

One row in a table relates to exactly one row in another table. Less common than one-to-many.

Example: One user has one profile.

CREATE TABLE users (
  user_id INT PRIMARY KEY AUTO_INCREMENT,
  username VARCHAR(50) UNIQUE NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE user_profiles (
  profile_id INT PRIMARY KEY AUTO_INCREMENT,
  user_id INT UNIQUE NOT NULL,
  bio TEXT,
  avatar_url VARCHAR(255),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(user_id)
);

The UNIQUE constraint on user_id ensures each user has at most one profile.

Query: Get a user and their profile:

SELECT u.username, u.email, p.bio, p.avatar_url
FROM users u
LEFT JOIN user_profiles p
ON u.user_id = p.user_id
WHERE u.user_id = 1;

Many-to-many relationships

Multiple rows in one table relate to multiple rows in another. Requires a junction table (also called a join table or pivot table) with foreign keys to both sides.

Example: Students enroll in many courses; courses have many students.

CREATE TABLE students (
  student_id INT PRIMARY KEY AUTO_INCREMENT,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE courses (
  course_id INT PRIMARY KEY AUTO_INCREMENT,
  course_name VARCHAR(100) NOT NULL,
  instructor VARCHAR(100) NOT NULL,
  credits INT NOT NULL
);

-- Junction table
CREATE TABLE enrollments (
  enrollment_id INT PRIMARY KEY AUTO_INCREMENT,
  student_id INT NOT NULL,
  course_id INT NOT NULL,
  enrollment_date DATE NOT NULL,
  grade VARCHAR(2),
  FOREIGN KEY (student_id) REFERENCES students(student_id),
  FOREIGN KEY (course_id) REFERENCES courses(course_id),
  UNIQUE (student_id, course_id) -- Prevent duplicate enrollments
);

Query: Find all courses a student is enrolled in:

SELECT c.course_id, c.course_name, c.instructor, e.enrollment_date, e.grade
FROM students s
INNER JOIN enrollments e
ON s.student_id = e.student_id
INNER JOIN courses c
ON e.course_id = c.course_id
WHERE s.student_id = 5
ORDER BY c.course_name;

Query: Find all students in a specific course:

SELECT s.student_id, s.first_name, s.last_name, s.email, e.enrollment_date, e.grade
FROM courses c
INNER JOIN enrollments e
ON c.course_id = e.course_id
INNER JOIN students s
ON e.student_id = s.student_id
WHERE c.course_id = 10
ORDER BY s.last_name, s.first_name;

Foreign key constraints and cascading actions

By default, foreign keys prevent deletion of parent records if child records exist. You can configure this behavior:

  • RESTRICT (default): Prevent deletion of a parent if children exist.
  • CASCADE: Automatically delete child records when the parent is deleted.
  • SET NULL: Set the foreign key to NULL when the parent is deleted (child is orphaned).

Example with cascading delete:

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id) ON DELETE CASCADE
);

If a customer is deleted, all their orders are automatically deleted too. Use carefully to avoid unintended data loss.


Part 4: Normalization โ€” Structuring data to eliminate redundancy

Normalization is a process of organizing data to eliminate redundancy and improve data integrity. It involves progressively stricter normal forms.

What is normalization?

Normalization reduces data duplication, prevents anomalies (update, insert, delete), and ensures efficient storage. The trade-off: normalized databases may require more complex queries (more joins).

First Normal Form (1NF)

Definition: All column values must be atomic (indivisible). No repeating groups.

Violation example: Storing multiple phone numbers in one column:

customer_id | name  | phone_numbers
------------|-------|---------------------------
1           | Alice | 555-1234, 555-5678

Fix (1NF): Create a separate phone_numbers table:

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE phone_numbers (
  phone_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  phone_number VARCHAR(15) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Now each phone number is a separate row (atomic values).

Second Normal Form (2NF)

Definition: Must satisfy 1NF AND all non-key columns depend entirely on the primary key (no partial dependencies).

Violation example: Storing employee and department info together with employee as the primary key:

employee_id | employee_name | department_id | department_name
------------|---------------|---------------|----------------
1           | Alice         | 10            | Engineering
2           | Bob           | 10            | Engineering
3           | Carol         | 20            | Sales

Here, department_name depends on department_id, not the full primary key. If multiple employees are in the same department, the department name is duplicated.

Fix (2NF): Separate into two tables:

CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(100) NOT NULL,
  department_id INT NOT NULL,
  FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

CREATE TABLE departments (
  department_id INT PRIMARY KEY,
  department_name VARCHAR(100) NOT NULL,
  location VARCHAR(100)
);

Now department_name is stored once per department, not duplicated for each employee.

Third Normal Form (3NF)

Definition: Must satisfy 2NF AND all non-key columns depend only on the primary key, not on other non-key columns (no transitive dependencies).

Violation example: Storing employee information including department salary ranges:

employee_id | employee_name | salary | department_id | min_salary | max_salary
------------|---------------|--------|---------------|------------|----------
1           | Alice         | 80000  | 10            | 70000      | 100000
2           | Bob           | 75000  | 10            | 70000      | 100000

Here, min_salary and max_salary depend on department_id, not the primary key employee_id. This is a transitive dependency (employee_id โ†’ department_id โ†’ salary_range).

Fix (3NF): Create a salary_ranges table:

CREATE TABLE salary_ranges (
  department_id INT PRIMARY KEY,
  min_salary DECIMAL(10, 2) NOT NULL,
  max_salary DECIMAL(10, 2) NOT NULL,
  FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(100) NOT NULL,
  salary DECIMAL(10, 2) NOT NULL,
  department_id INT NOT NULL,
  FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

Denormalization: when to break the rules

Normalization is ideal but sometimes comes at a performance cost (many joins). Denormalization is intentionally adding redundant data to improve query performance.

When to denormalize:

  • High-traffic read queries that require expensive joins.
  • Reporting or analytics databases where data is mostly read-only.
  • Real-time dashboards where query speed is critical.

Example of denormalization: Instead of joining orders and order_items every time, store the order total in the orders table (even though it can be calculated):

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  total_amount DECIMAL(10, 2) NOT NULL, -- Denormalized (sum of order_items)
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Trade-off: You save join time but must ensure total_amount stays in sync with order_items (usually via triggers or application logic).

Real-world example: E-Commerce Schema

Here’s a normalized e-commerce database:

CREATE TABLE customers (
  customer_id INT PRIMARY KEY AUTO_INCREMENT,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  status VARCHAR(20) DEFAULT 'pending',
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE products (
  product_id INT PRIMARY KEY AUTO_INCREMENT,
  product_name VARCHAR(100) NOT NULL,
  price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
  stock_quantity INT NOT NULL CHECK (stock_quantity >= 0),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
  order_item_id INT PRIMARY KEY AUTO_INCREMENT,
  order_id INT NOT NULL,
  product_id INT NOT NULL,
  quantity INT NOT NULL CHECK (quantity > 0),
  unit_price DECIMAL(10, 2) NOT NULL,
  FOREIGN KEY (order_id) REFERENCES orders(order_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

CREATE TABLE categories (
  category_id INT PRIMARY KEY AUTO_INCREMENT,
  category_name VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE product_categories (
  product_id INT NOT NULL,
  category_id INT NOT NULL,
  PRIMARY KEY (product_id, category_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id),
  FOREIGN KEY (category_id) REFERENCES categories(category_id)
);

Benefits of this schema:

  • No data duplication: product names and prices stored once.
  • Referential integrity: orders can’t reference non-existent customers.
  • Flexibility: products can belong to multiple categories via the junction table.

Best practices and conclusion

SQL best practices

  1. Use meaningful names: Table and column names should be self-documenting.
  2. Index strategically: Add indexes to frequently queried columns (foreign keys, date ranges) but avoid over-indexing.
  3. Validate at the database level: Use constraints (NOT NULL, UNIQUE, CHECK) to enforce business rules.
  4. Use transactions for related operations: Wrap multiple statements in a transaction to ensure atomicity.
  5. **Avoid SELECT ***: Specify only the columns you need to reduce data transfer.
  6. Use parameterized queries: Prevent SQL injection by using parameterized statements (prepared statements).
  7. Monitor query performance: Use EXPLAIN to analyze slow queries.
  8. Plan for growth: Design schemas with future scaling in mind.

Conclusion

SQL fundamentalsโ€”queries, schema design, relationships, and normalizationโ€”are the building blocks of effective database work. By mastering these concepts, you can:

  • Write efficient queries that retrieve exactly the data you need.
  • Design schemas that prevent data anomalies and maintain integrity.
  • Model complex real-world systems using relationships.
  • Normalize data to eliminate redundancy while balancing performance needs.

As you grow in your database career, you’ll refine these skills and learn advanced topics like indexing, query optimization, and distributed databases. But these fundamentals will always be your foundation. Start small, practice often, and build confidence one query at a time.

Next steps:

  • Practice writing queries against your own databases.
  • Design a small schema from scratch (e.g., a blog, e-commerce store, or task manager).
  • Learn your specific database system (PostgreSQL, MySQL, SQLite) and its extensions.
  • Explore advanced topics: indexing, query optimization, transactions, window functions.

Happy querying! ๐Ÿš€


Common pitfalls & how to avoid them โš ๏ธ

1. N+1 Query Problem

Pitfall: Querying data in a loop, causing one query for the parent and N additional queries for children.

Example (Bad):

// In application code (pseudocode)
const customers = SELECT * FROM customers;
for (let customer of customers) {
  customer.orders = SELECT * FROM orders WHERE customer_id = customer.id;
}

This executes 1 + N queries (expensive!).

Fix: Use a JOIN to fetch all data in one query.

SELECT c.*, o.*
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id;

Prevention: Always think about relationships before writing queries. Use JOINs to fetch related data efficiently.

2. SQL Injection Vulnerabilities

Pitfall: Building SQL strings with user input without sanitization.

Example (Bad):

const userId = req.params.id;
const query = `SELECT * FROM users WHERE id = ${userId};`; // DANGEROUS!
// If userId = "1; DROP TABLE users;--" the table gets deleted!

Fix: Use parameterized queries (prepared statements).

// Good approach: parameterized query
const query = `SELECT * FROM users WHERE id = ?`;
db.execute(query, [userId]); // Input is safely escaped

Prevention: ALWAYS use parameterized queries when including user input. Never concatenate strings into SQL.

3. Missing Indexes on Foreign Keys

Pitfall: Querying on columns without indexes, causing full table scans.

Example: Querying orders WHERE customer_id = 5 without an index on customer_id scans the entire orders table.

Fix: Add indexes to foreign keys and frequently queried columns.

CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_order_date ON orders(order_date);

Prevention: Identify hot queries (frequently run), check if indexes exist, and add them strategically.

4. Selecting More Data Than Needed

Pitfall: Using SELECT * instead of specific columns wastes bandwidth and memory.

-- Bad: retrieves all columns
SELECT * FROM users;

-- Good: fetch only needed columns
SELECT user_id, first_name, last_name, email FROM users;

Prevention: Always specify only the columns you need.

5. Ignoring NULL Handling

Pitfall: Not accounting for NULL values in queries, leading to unexpected results.

-- NULL != NULL, so this query won't include NULL values
SELECT * FROM users WHERE phone_number != '555-1234';

Fix: Explicitly check for NULL.

SELECT * FROM users
WHERE phone_number != '555-1234' OR phone_number IS NULL;

Prevention: Always test queries with NULL values. Use COALESCE to provide defaults.

6. Forgetting to Add Constraints

Pitfall: Allowing invalid data (negative quantities, duplicate emails, etc.) into the database.

Fix: Use constraints at the database level.

CREATE TABLE products (
  product_id INT PRIMARY KEY,
  price DECIMAL(10, 2) CHECK (price > 0), -- Prevents negative prices
  stock INT CHECK (stock >= 0)
);

CREATE TABLE users (
  email VARCHAR(100) UNIQUE NOT NULL -- Prevents duplicate/null emails
);

Prevention: Define constraints early in schema design. Validate at both application and database levels.

7. Denormalization Without Clear Purpose

Pitfall: Adding redundant data (like storing order_total when it can be calculated) without a performance reason.

Issue: Keeping data in sync becomes complex and error-prone.

Fix: Only denormalize when you’ve measured that a critical query is too slow. Add triggers or application logic to maintain consistency.

Prevention: Start normalized. Only denormalize if profiling shows a genuine bottleneck.

8. Not Understanding Transaction Isolation

Pitfall: Assuming transactions prevent all concurrent access issues.

Example: Two users simultaneously deducting from an account balance, both seeing the same starting value.

Fix: Use appropriate isolation levels and understand the trade-offs.

Prevention: Learn about transaction isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE) and choose appropriately for your use case.


SQL vs NoSQL: Pros and cons โš–๏ธ

SQL (Relational) databases

Pros:

  • Data integrity: Constraints, foreign keys, and transactions ensure reliable data.
  • Complex queries: JOINs allow querying across related data with precision.
  • ACID compliance: Atomicity, Consistency, Isolation, Durability guarantees reliability.
  • Standardized: SQL works across PostgreSQL, MySQL, SQL Server, Oracle, etc.
  • Mature & stable: Decades of refinement; proven at scale.
  • Good for structured data: Excels with well-defined schemas.

Cons:

  • Schema rigidity: Changing structure requires migrations; harder to evolve.
  • Vertical scaling focus: Scaling typically means bigger hardware; horizontal scaling is complex.
  • Over-normalization overhead: Excessive normalization can require many JOINs, slowing queries.
  • Not ideal for unstructured data: Storing documents, images, or deeply nested objects is awkward.

NoSQL databases (MongoDB, Cassandra, Redis, DynamoDB, etc.)

Pros:

  • Schema flexibility: Store documents with varying structures; evolve easily.
  • Horizontal scaling: Designed to scale across multiple machines.
  • Fast writes: Optimized for high-velocity data ingestion.
  • Developer-friendly: JSON documents map naturally to application objects.
  • Good for unstructured/semi-structured data: Storing documents, logs, events.

Cons:

  • No ACID guarantees (traditionally): Data consistency harder to enforce; recent versions (MongoDB 4.0+) added transactions.
  • Complex queries: Joining across collections is expensive or impossible; requires denormalization.
  • Data duplication: Redundancy common; harder to maintain consistency.
  • Less mature for transactions: Doesn’t provide the same reliability guarantees as SQL for complex operations.
  • Steeper learning curve: Different paradigms per database type (document, key-value, graph, etc.).

When to use SQL

  • Financial systems, booking systems, inventory management (data integrity critical)
  • Complex multi-table queries and reports
  • Structured, relational data with clear schemas
  • ACID transaction requirements

When to use NoSQL

  • Real-time analytics, IoT sensor data, user event logs (high write throughput)
  • Unstructured content (documents, media, JSON data)
  • Rapid schema evolution and prototyping
  • Horizontal scaling at massive scale (millions of concurrent users)

Modern approach: Many companies use both. SQL for core transactional data, NoSQL for specific use cases (caching, events, analytics).


Alternative databases & technologies

Time-series databases

For applications tracking metrics over time (stock prices, sensor data, server metrics).

  • Examples: InfluxDB, Prometheus, TimescaleDB
  • Advantage: Optimized for time-based queries and aggregations
  • Link: https://www.influxdata.com/

Graph databases

For highly interconnected data (social networks, recommendations, knowledge graphs).

  • Examples: Neo4j, ArangoDB
  • Advantage: Efficient traversal of relationships
  • Link: https://neo4j.com/

Document databases (NoSQL)

For flexible, schema-less data (documents, catalogs, user profiles).

  • Examples: MongoDB, CouchDB, Firebase Firestore
  • Advantage: Natural JSON representation; schema flexibility
  • Link: https://www.mongodb.com/

Search & analytics engines

For full-text search and analytics on large datasets.

  • Examples: Elasticsearch, Solr, Apache Druid
  • Advantage: Fast text search, analytics queries, dashboard support
  • Link: https://www.elastic.co/

Complete SQL resources & learning materials ๐Ÿ“š

Official documentation

Learning platforms & tutorials

Books

  • “Database Design for Mere Mortals” by Michael J. Hernandez โ€” Best for understanding database design principles
  • “SQL Performance Explained” by Markus Winand โ€” Deep dive into optimization (https://sql-performance-explained.com/)
  • “Learning SQL” by Alan Beaulieu โ€” Beginner-friendly comprehensive guide

Best practices & style guides

Tools & utilities

Advanced topics


Further reading & resources

Comments