SPEAK THE
LANGUAGE OF DATA

// Every application is built on data.

SQL IS THE BACKBONE OF MODERN APPLICATIONS.

Every website, every mobile app, every enterprise system— underneath it all is a database and SQL. Whether you're building a simple blog or a complex analytics platform, understanding SQL gives you the power to harness your data.

WHY SQL MATTERS

SQL (Structured Query Language) is the universal language for relational databases. It works with PostgreSQL, MySQL, Oracle, SQL Server, and countless others. Master SQL once, and you can work with any database in the world.

BECOME DATA FLUENT

Learn to write queries that extract exactly what you need. Understand joins, subqueries, window functions, and transactions. Design schemas that scale. Optimize performance. SQL is not just about getting data—it's about getting it right.

BEGIN YOUR JOURNEY →

// The Path to Data Mastery

12 lessons. Complete SQL control.

LESSON 01

Introduction to SQL

What is SQL? Setting up a database and understanding relational concepts.

Beginner
LESSON 02

Basic Queries

SELECT, WHERE, ORDER BY, and filtering data.

Beginner
LESSON 03

Data Modification

INSERT, UPDATE, DELETE, and managing data.

Beginner
LESSON 04

Table Design

CREATE TABLE, data types, constraints, and schema design.

Beginner
LESSON 05

Joins

INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and multiple tables.

Intermediate
LESSON 06

Aggregation

GROUP BY, HAVING, COUNT, SUM, AVG, MAX, MIN.

Intermediate
LESSON 07

Subqueries

Nested queries, correlated subqueries, and common table expressions.

Intermediate
LESSON 08

Indexes

Creating indexes, understanding execution plans, and query optimization.

Advanced
LESSON 09

Transactions

ACID properties, BEGIN, COMMIT, ROLLBACK, and isolation levels.

Advanced
LESSON 10

Window Functions

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and analytical queries.

Advanced
LESSON 11

Views & Functions

Creating views, stored procedures, functions, and triggers.

Advanced
LESSON 12

Production Patterns

Database design patterns, security, backup, and scaling strategies.

Advanced
← Back to Lessons

// Lesson 1: Introduction to SQL

What is SQL?

SQL (Structured Query Language) is a domain-specific language used to manage and manipulate relational databases. It's the standard language for database systems, allowing you to create, read, update, and delete data, as well as manage database structure.

SQL is declarative—you describe what you want, not how to get it. The database engine figures out the most efficient way to execute your query.

Relational Database Concepts

Tables

A table is a collection of related data entries (rows) with a consistent structure (columns):

+-------------+-------------+-------------+-------------+
| first_name  | last_name   | email       | created_at  |
+-------------+-------------+-------------+-------------+
| John        | Doe         | j@example.com| 2024-01-15 |
| Jane        | Smith       | jane@example| 2024-01-16 |
+-------------+-------------+-------------+-------------+

Columns and Data Types

Each column has a data type that defines what kind of data it can hold:

Primary Keys

A primary key uniquely identifies each row in a table:

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    name VARCHAR(100)
);

Foreign Keys

A foreign key creates a relationship between two tables:

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Setting Up PostgreSQL

# Install PostgreSQL (Ubuntu)
sudo apt update
sudo apt install postgresql postgresql-contrib

# Start the service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Connect to PostgreSQL
sudo -u postgres psql

# Create a database
CREATE DATABASE myapp;

# Connect to your database
\c myapp

Your First Query

-- Select all rows from a table
SELECT * FROM users;

-- Select specific columns
SELECT first_name, email FROM users;

-- Add a WHERE clause to filter
SELECT * FROM users WHERE id = 1;
Next: Basic Queries →
← Back to Lessons

// Lesson 2: Basic Queries

The SELECT statement is the foundation of SQL. It retrieves data from one or more tables based on specified criteria.

SELECT Fundamentals

-- Select all columns
SELECT * FROM employees;

-- Select specific columns
SELECT first_name, last_name, salary FROM employees;

-- Use aliases for readability
SELECT 
    first_name AS "First Name",
    last_name AS "Last Name",
    salary * 12 AS "Annual Salary"
FROM employees;

Filtering with WHERE

-- Basic equality
SELECT * FROM products WHERE category = 'Electronics';

-- Numeric comparisons
SELECT * FROM orders WHERE total > 100;
SELECT * FROM employees WHERE years_of_service >= 5;

-- Multiple conditions with AND
SELECT * FROM users 
WHERE status = 'active' 
AND created_at > '2024-01-01';

-- OR condition
SELECT * FROM products 
WHERE category = 'Electronics' 
OR category = 'Books';

-- IN operator
SELECT * FROM users 
WHERE country IN ('USA', 'Canada', 'Mexico');

-- BETWEEN operator (inclusive)
SELECT * FROM orders 
WHERE total BETWEEN 50 AND 100;

-- LIKE for pattern matching
SELECT * FROM users 
WHERE email LIKE '%@gmail.com';

-- Case-insensitive search (PostgreSQL)
SELECT * FROM users 
WHERE name ILIKE 'john%';

Sorting with ORDER BY

-- Ascending order (default)
SELECT * FROM employees ORDER BY salary;

-- Explicit ascending
SELECT * FROM employees ORDER BY salary ASC;

-- Descending order
SELECT * FROM employees ORDER BY salary DESC;

-- Multiple columns
SELECT * FROM employees 
ORDER BY department ASC, salary DESC;

-- Order by expression
SELECT * FROM products 
ORDER BY price * quantity DESC;

Limiting Results

-- Limit number of results
SELECT * FROM users LIMIT 10;

-- Skip first 5, get next 10 (OFFSET)
SELECT * FROM users LIMIT 10 OFFSET 5;

-- PostgreSQL syntax for pagination
SELECT * FROM users 
ORDER BY created_at DESC 
LIMIT 20 OFFSET 40;  -- Page 3, 20 per page

-- FETCH (SQL standard)
SELECT * FROM users 
ORDER BY name 
LIMIT 10;

DISTINCT

-- Get unique values
SELECT DISTINCT country FROM users;

-- Distinct combinations
SELECT DISTINCT category, subcategory 
FROM products;

Column Expressions

-- Arithmetic operations
SELECT 
    product_name,
    price,
    quantity,
    price * quantity AS total
FROM order_items;

-- String concatenation (PostgreSQL)
SELECT 
    first_name || ' ' || last_name AS full_name
FROM users;

-- String functions
SELECT 
    UPPER(name) AS name_upper,
    LOWER(email) AS email_lower,
    LENGTH(name) AS name_length
FROM users;

-- Date functions (PostgreSQL)
SELECT 
    created_at,
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    DATE_TRUNC('day', created_at) AS day
FROM users;
Next: Data Modification →
← Back to Lessons

// Lesson 3: Data Modification

SQL isn't just for reading data—you can also insert, update, and delete data. These operations are collectively known as DML (Data Manipulation Language).

INSERT - Adding Data

-- Insert a single row
INSERT INTO users (first_name, last_name, email)
VALUES ('John', 'Doe', 'john@example.com');

-- Insert multiple rows
INSERT INTO users (first_name, last_name, email) VALUES
    ('Alice', 'Smith', 'alice@example.com'),
    ('Bob', 'Johnson', 'bob@example.com'),
    ('Carol', 'Williams', 'carol@example.com');

-- Insert with default values
INSERT INTO products (name, price) 
VALUES ('Widget', 9.99);
-- created_at will use DEFAULT CURRENT_TIMESTAMP

-- Insert and return the inserted row
INSERT INTO users (first_name, last_name, email)
VALUES ('Dave', 'Brown', 'dave@example.com')
RETURNING id, email, created_at;

-- Insert from another table
INSERT INTO archived_users (id, name, email, archived_at)
SELECT id, name, email, CURRENT_TIMESTAMP
FROM users 
WHERE status = 'inactive';

UPDATE - Modifying Data

-- Update all rows
UPDATE users SET status = 'active';

-- Update with WHERE clause
UPDATE users 
SET email = 'newemail@example.com' 
WHERE id = 1;

-- Update multiple columns
UPDATE employees 
SET 
    salary = salary * 1.1,
    updated_at = CURRENT_TIMESTAMP
WHERE department = 'Engineering';

-- Update based on another table
UPDATE orders o
SET customer_name = u.name
FROM users u
WHERE o.user_id = u.id;

-- Update with subquery
UPDATE products p
SET price = (
    SELECT AVG(price) FROM products WHERE category = p.category
)
WHERE price > (
    SELECT AVG(price) FROM products WHERE category = p.category
);

DELETE - Removing Data

-- Delete with WHERE clause
DELETE FROM users WHERE id = 1;

-- Delete multiple rows
DELETE FROM sessions 
WHERE last_activity < NOW() - INTERVAL '30 days';

-- Delete all rows (careful!)
DELETE FROM log_entries;

-- Delete and return deleted rows
DELETE FROM orders 
WHERE status = 'cancelled'
RETURNING id, total, deleted_at;

UPSERT - Insert or Update

-- PostgreSQL UPSERT (INSERT ... ON CONFLICT)
INSERT INTO users (email, name, updated_at)
VALUES ('john@example.com', 'John Doe', CURRENT_TIMESTAMP)
ON CONFLICT (email) 
DO UPDATE SET 
    name = EXCLUDED.name,
    updated_at = EXCLUDED.updated_at;

-- MySQL UPSERT (ON DUPLICATE KEY UPDATE)
INSERT INTO users (email, name)
VALUES ('john@example.com', 'John Doe')
ON DUPLICATE KEY UPDATE 
    name = VALUES(name);

TRUNCATE - Fast Table Clearing

-- Truncate a table (faster than DELETE)
TRUNCATE TABLE logs;

-- Truncate multiple tables
TRUNCATE TABLE logs, access_logs, error_logs;

-- Restart identity (reset auto-increment)
TRUNCATE TABLE users RESTART IDENTITY;

-- CASCADE to truncate dependent tables
TRUNCATE TABLE orders CASCADE;
Next: Table Design →
← Back to Lessons

// Lesson 4: Table Design

Good database design is crucial for application performance and data integrity. Let's learn how to create tables that scale.

CREATE TABLE

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL UNIQUE,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    bio TEXT,
    avatar_url VARCHAR(500),
    is_active BOOLEAN DEFAULT TRUE,
    role VARCHAR(20) DEFAULT 'user',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Data Types Deep Dive

Numeric Types

-- Small integer (32,768 to 32,767)
smallint

-- Standard integer (-2.1B to 2.1B)
integer

-- Large integer (-9.2Q to 9.2Q)
bigint

-- Exact decimal (precision, scale)
DECIMAL(10,2)  -- 12345678.99
NUMERIC(10,2)

-- Floating point
REAL           -- 6 decimal digits
DOUBLE PRECISION -- 15 decimal digits

String Types

-- Fixed length
CHAR(10)       -- Always 10 chars, padded

-- Variable length with limit
VARCHAR(255)   -- Up to 255 chars

-- Unlimited length
TEXT           -- PostgreSQL, MySQL

-- JSON types (PostgreSQL)
JSON    -- Validated JSON
JSONB   -- Binary, indexed, faster

Date/Time Types

DATE        -- 2024-01-15
TIME        -- 14:30:00
TIMESTAMP   -- 2024-01-15 14:30:00
TIMESTAMPTZ -- 2024-01-15 14:30:00-05:00 (with timezone)
INTERVAL    -- 3 days, 2 hours

Constraints

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    
    -- NOT NULL - Required field
    name VARCHAR(200) NOT NULL,
    
    -- UNIQUE - No duplicates
    sku VARCHAR(50) UNIQUE,
    
    -- CHECK - Custom validation
    price DECIMAL(10,2) CHECK (price >= 0),
    quantity INTEGER CHECK (quantity >= 0),
    
    -- DEFAULT - Automatic value
    status VARCHAR(20) DEFAULT 'active',
    
    -- REFERENCES - Foreign key
    category_id INTEGER REFERENCES categories(id),
    
    -- Composite primary key (order_items table)
    order_id INTEGER,
    product_id INTEGER,
    PRIMARY KEY (order_id, product_id)
);

ALTER TABLE

-- Add a column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Add a constraint
ALTER TABLE users 
ADD CONSTRAINT unique_phone UNIQUE (phone);

-- Drop a column
ALTER TABLE users DROP COLUMN old_field;

-- Rename a table
ALTER TABLE users RENAME TO customers;

-- Rename a column
ALTER TABLE users RENAME COLUMN username TO handle;

Indexes

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (for multi-column queries)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index
CREATE UNIQUE INDEX idx_products_sku ON products(sku);

-- Partial index (only index active users)
CREATE INDEX idx_users_active_email ON users(email) 
WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
Next: Joins →
← Back to Lessons

// Lesson 5: Joins

Joins are how you combine data from multiple tables. They're the heart of relational database design.

Understanding Joins

-- Sample tables
-- users: id, name
-- orders: id, user_id, total

-- INNER JOIN: Only matching rows
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

Visual Representation

Table A         Table B
+----+----+      +----+----+
| id | A  |      | id | B  |
+----+----+      +----+----+
| 1  | a1 |      | 1  | b1 |
| 2  | a2 |      | 3  | b3 |
| 3  | a3 |      | 4  | b4 |
+----+----+      +----+----+

INNER JOIN (A.id = B.id):  {1, a1, b1}
LEFT JOIN (A.id = B.id):   {1, a1, b1}, {2, a2, NULL}, {3, a3, NULL}
RIGHT JOIN (A.id = B.id):  {1, a1, b1}, {NULL, NULL, b3}, {NULL, NULL, b4}
FULL JOIN (A.id = B.id):   {1, a1, b1}, {2, a2, NULL}, {3, a3, NULL}, {NULL, NULL, b3}, {NULL, NULL, b4}

INNER JOIN

-- Basic inner join
SELECT 
    o.id AS order_id,
    u.name AS customer_name,
    u.email,
    o.total,
    o.status
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed';

-- Multiple joins
SELECT 
    o.id,
    u.name,
    p.name AS product,
    oi.quantity,
    oi.price
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;

LEFT JOIN

-- Users with their orders (including users with no orders)
SELECT 
    u.name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

RIGHT JOIN

-- All categories, even if no products (rarely needed, usually flip tables)
SELECT 
    c.name AS category,
    p.name AS product
FROM products p
RIGHT JOIN categories c ON p.category_id = c.id;

FULL OUTER JOIN

-- All users and all products they've ordered (including unmatched)
SELECT 
    u.name AS user,
    p.name AS product
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id
FULL OUTER JOIN order_items oi ON o.id = oi.order_id
FULL OUTER JOIN products p ON oi.product_id = p.id;

Self Joins

-- Employees and their managers
SELECT 
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

-- Find employees in same department
SELECT 
    e1.name AS employee1,
    e2.name AS employee2,
    e1.department
FROM employees e1
JOIN employees e2 ON e1.department = e2.department AND e1.id < e2.id;

Join Syntax Variations

-- ANSI SQL syntax (preferred)
SELECT * FROM a JOIN b ON a.id = b.a_id

-- Implicit join (older syntax, avoid)
SELECT * FROM a, b WHERE a.id = b.a_id

-- NATURAL JOIN (use with caution - joins on same-named columns)
SELECT * FROM orders NATURAL JOIN order_items
Next: Aggregation →
← Back to Lessons

// Lesson 6: Aggregation

Aggregation functions transform sets of rows into single values. They're essential for analytics and reporting.

Basic Aggregation Functions

-- COUNT - Number of rows
SELECT COUNT(*) FROM orders;                    -- All rows
SELECT COUNT(DISTINCT user_id) FROM orders;   -- Unique users

-- SUM - Total of values
SELECT SUM(total) FROM orders;

-- AVG - Average
SELECT AVG(price) FROM products;

-- MIN/MAX - Smallest/Largest
SELECT MIN(created_at), MAX(created_at) FROM users;

GROUP BY

-- Group by single column
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM products
GROUP BY category;

-- Group by multiple columns
SELECT 
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY 
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
ORDER BY year, month;

HAVING

-- WHERE filters rows, HAVING filters groups
SELECT 
    user_id,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 5           -- Users with 5+ orders
AND SUM(total) > 1000;        -- And $1000+ spent

-- Common pattern: categories with more than 10 products
SELECT 
    category_id,
    COUNT(*) AS product_count
FROM products
GROUP BY category_id
HAVING COUNT(*) > 10;

Aggregation with CASE

-- Conditional aggregation
SELECT 
    COUNT(*) AS total_users,
    COUNT(CASE WHEN status = 'active' THEN 1 END) AS active_users,
    COUNT(CASE WHEN is_premium THEN 1 END) AS premium_users,
    AVG(CASE WHEN is_premium THEN 1.0 ELSE 0.0 END) AS premium_percentage
FROM users;

-- Pivot-like aggregation
SELECT 
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
    COUNT(CASE WHEN status = 'processing' THEN 1 END) AS processing,
    COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
    COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered
FROM orders;

String Aggregation

-- PostgreSQL STRING_AGG
SELECT 
    department,
    STRING_AGG(name, ', ' ORDER BY salary DESC) AS employees
FROM employees
GROUP BY department;

-- MySQL GROUP_CONCAT
SELECT 
    department,
    GROUP_CONCAT(name ORDER BY salary DESC SEPARATOR ', ') AS employees
FROM employees
GROUP BY department;

Common Aggregation Patterns

-- Running total
SELECT 
    created_at,
    amount,
    SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM transactions;

-- Percentage of total
SELECT 
    category,
    sales,
    sales * 100.0 / SUM(sales) OVER () AS percentage_of_total
FROM category_sales;

-- Rank within group
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
Next: Subqueries →
← Back to Lessons

// Lesson 7: Subqueries

Subqueries are queries nested inside other queries. They let you use the results of one query in another.

Scalar Subqueries

-- Subquery in SELECT
SELECT 
    name,
    price,
    (SELECT AVG(price) FROM products) AS avg_price
FROM products;

-- Subquery in WHERE
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- Subquery in FROM (derived table)
SELECT category, avg_price
FROM (
    SELECT 
        category,
        AVG(price) AS avg_price
    FROM products
    GROUP BY category
) AS category_avg
WHERE avg_price > 50;

IN with Subquery

-- Find users who have placed orders
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

-- Find users who have NOT placed orders
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders);

-- Find products ordered by premium users
SELECT * FROM products
WHERE id IN (
    SELECT product_id FROM order_items
    WHERE order_id IN (
        SELECT id FROM orders
        WHERE user_id IN (
            SELECT id FROM users WHERE is_premium = true
        )
    )
);

EXISTS / NOT EXISTS

-- Users who have placed at least one order
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Categories with no products
SELECT * FROM categories c
WHERE NOT EXISTS (
    SELECT 1 FROM products p WHERE p.category_id = c.id
);

Correlated Subqueries

-- Products with price above their category average
SELECT * FROM products p
WHERE price > (
    SELECT AVG(price) 
    FROM products 
    WHERE category_id = p.category_id
);

-- Each user's latest order
SELECT * FROM orders o
WHERE created_at = (
    SELECT MAX(created_at) 
    FROM orders 
    WHERE user_id = o.user_id
);

-- Employees earning more than their department average
SELECT * FROM employees e
WHERE salary > (
    SELECT AVG(salary) 
    FROM employees 
    WHERE department = e.department
);

Common Table Expressions (CTEs)

-- CTE (PostgreSQL, SQL Server, MySQL 8+)
WITH 
    active_orders AS (
        SELECT * FROM orders WHERE status = 'active'
    ),
    premium_users AS (
        SELECT id FROM users WHERE is_premium = true
    )
SELECT 
    u.name,
    COUNT(o.id) AS order_count
FROM premium_users u
LEFT JOIN active_orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- Recursive CTE (for hierarchical data)
WITH RECURSIVE org_chart AS (
    -- Base case: top-level employees
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees who report to someone in the org
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;

LATERAL Subqueries

-- PostgreSQL LATERAL - subquery can reference previous tables
SELECT 
    p.name,
    p.price,
    o.order_count
FROM products p
CROSS JOIN LATERAL (
    SELECT COUNT(*) AS order_count
    FROM order_items
    WHERE product_id = p.id
) AS o
ORDER BY o.order_count DESC;
Next: Indexes →
← Back to Lessons

// Lesson 8: Indexes

Indexes are the key to database performance. They allow the database to find data quickly without scanning every row.

How Indexes Work

Think of an index like a book index. Instead of reading every page to find a topic, you check the index and go directly to the right page. However, indexes have a cost—they take space and slow down writes (INSERT, UPDATE, DELETE).

Creating Indexes

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (left-to-right order matters!)
CREATE INDEX idx_orders_user_status 
ON orders(user_id, status);

-- Unique index
CREATE UNIQUE INDEX idx_products_sku 
ON products(sku);

-- Partial index
CREATE INDEX idx_orders_active 
ON orders(user_id) 
WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_lower_email 
ON users(LOWER(email));

Index Types

B-tree (default)

-- Most common, good for equality and range queries
CREATE INDEX idx_orders_date ON orders(created_at);
CREATE INDEX idx_products_price ON products(price);

Hash

-- Only for equality comparisons
CREATE INDEX idx_users_email_hash ON users USING HASH(email);

GIN (Generalized Inverted Index)

-- For JSON, arrays, full-text search
CREATE INDEX idx_products_tags ON products USING GIN(tags);
CREATE INDEX idx_products_attributes ON products USING GIN(attributes);

BRIN (Block Range Index)

-- For very large, naturally ordered data
CREATE INDEX idx_logs_time ON logs USING BRIN(created_at);

Understanding Query Plans

-- EXPLAIN shows the query plan
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- ANALYZE actually runs the query and shows timing
EXPLAIN ANALYZE 
SELECT * FROM users WHERE email = 'john@example.com';

-- Example output:
-- Index Scan using idx_users_email on users
-- (cost=0.43..8.45 rows=1 width=104)
-- (actual time=0.021..0.022 rows=1 loops=1)
-- Index Cond: (email = 'john@example.com'::text)

Index Design Guidelines

-- Good: Composite for specific query pattern
CREATE INDEX idx_orders_user_status 
ON orders(user_id, status);

-- For queries like:
-- WHERE user_id = 5 AND status = 'active'
-- WHERE user_id = 5
-- (can't use second column alone!)

Index Maintenance

-- Reindex (rebuild fragmented index)
REINDEX INDEX idx_users_email;

-- Check index usage (PostgreSQL)
SELECT 
    indexrelname,
    idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'users';

-- Remove unused indexes
DROP INDEX idx_users_unused;
Next: Transactions →
← Back to Lessons

// Lesson 9: Transactions

Transactions ensure data integrity when multiple operations must succeed or fail together. They're essential for financial transactions, inventory management, and any operation where consistency matters.

ACID Properties

Basic Transaction

-- Start transaction
BEGIN;

-- Multiple operations
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;

-- Commit (make permanent)
COMMIT;

-- Or rollback (undo everything)
ROLLBACK;

Savepoints

BEGIN;

INSERT INTO orders (user_id, total) VALUES (1, 100);
SAVEPOINT order_inserted;

-- Something fails, but we want to keep the order
ROLLBACK TO SAVEPOINT order_inserted;

-- Continue with other operations
INSERT INTO log (message) VALUES ('Order processing');
COMMIT;

Transaction Isolation Levels

-- Set isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Or at start of transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;

Read Phenomena

Isolation Levels

LevelDirty ReadNon-repeatablePhantom
READ UNCOMMITTEDPossiblePossiblePossible
READ COMMITTEDNotPossiblePossible
REPEATABLE READNotNotPossible
SERIALIZABLENotNotNot

Deadlocks

-- Two transactions waiting for each other
-- Transaction 1: UPDATE accounts WHERE id = 1
-- Transaction 2: UPDATE accounts WHERE id = 2
-- Transaction 1: UPDATE accounts WHERE id = 2 (waits)
-- Transaction 2: UPDATE accounts WHERE id = 1 (DEADLOCK)

-- Solution: Always access tables in same order
BEGIN;
UPDATE accounts WHERE id = 1;
UPDATE accounts WHERE id = 2;
COMMIT;

Optimistic vs Pessimistic Locking

-- Optimistic: Let it fail if conflict
UPDATE products 
SET quantity = quantity - 1, version = version + 1
WHERE id = 1 AND version = 5;

-- If version changed, another transaction modified it - retry or error

-- Pessimistic: Lock before modifying
BEGIN;
SELECT * FROM products WHERE id = 1 FOR UPDATE;
-- ... validate ...
UPDATE products SET quantity = quantity - 1 WHERE id = 1;
COMMIT;
Next: Window Functions →
← Back to Lessons

// Lesson 10: Window Functions

Window functions perform calculations across sets of rows related to the current row. Unlike aggregate functions, they don't collapse rows—you get results for each individual row.

Window Function Basics

-- Without window function (aggregate collapses rows)
SELECT department, AVG(salary) FROM employees GROUP BY department;

-- With window function (each row keeps its value)
SELECT 
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Ranking Functions

-- ROW_NUMBER: Unique sequential number
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
FROM employees;

-- RANK: Same value = same rank, with gaps
SELECT 
    name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

-- DENSE_RANK: Same value = same rank, no gaps
SELECT 
    name,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

Partitioning

-- Rank within department
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    RANK() OVER (ORDER BY salary DESC) AS overall_rank
FROM employees;

Lead and Lag

-- Previous row (LAG)
SELECT 
    created_at,
    price,
    LAG(price) OVER (ORDER BY created_at) AS prev_price,
    price - LAG(price) OVER (ORDER BY created_at) AS price_change
FROM stock_prices;

-- Next row (LEAD)
SELECT 
    name,
    salary,
    LEAD(salary) OVER (ORDER BY salary DESC) AS next_higher_salary
FROM employees;

-- Multiple rows back/forward
SELECT 
    created_at,
    LAG(created_at, 7) OVER (ORDER BY created_at) AS week_ago
FROM events;

Running Totals and Moving Averages

-- Running total
SELECT 
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- Moving average (last 7 days)
SELECT 
    date,
    price,
    AVG(price) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7
FROM stock_prices;

-- Frame specification
-- ROWS: Physical rows
-- RANGE: Logical rows with same value
-- GROUPS: Groups of same values

First and Last Values

-- First value in partition
SELECT 
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS lowest_in_dept,
    LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS highest_in_dept
FROM employees;

Percentiles

-- Percent rank within group
SELECT 
    salary,
    PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank
FROM employees;

-- Median (50th percentile)
SELECT 
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;
Next: Views & Functions →
← Back to Lessons

// Lesson 11: Views & Functions

Views and functions let you encapsulate complex logic, improve security, and create reusable database components.

Views

-- Simple view
CREATE VIEW active_users AS
SELECT * FROM users WHERE is_active = true;

-- Use view like a table
SELECT * FROM active_users;

-- Complex view with joins
CREATE VIEW order_summary AS
SELECT 
    o.id AS order_id,
    u.name AS customer_name,
    u.email,
    o.total,
    o.status,
    o.created_at,
    COUNT(oi.id) AS item_count
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, u.name, u.email, o.total, o.status, o.created_at;

-- Updatable view (simple conditions)
CREATE VIEW simple_users AS
SELECT id, name, email FROM users;

INSERT INTO simple_users (name, email) VALUES ('John', 'john@example.com');

Materialized Views

-- Materialized view (stores results physically)
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT 
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    SUM(total) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY 
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
ORDER BY year, month;

-- Refresh materialized view
REFRESH MATERIALIZED VIEW monthly_sales;

-- Auto-refresh on schedule (PostgreSQL)
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT ...
WITH DATA;

Stored Functions

-- PostgreSQL function
CREATE OR REPLACE FUNCTION get_user_orders(user_id INT)
RETURNS TABLE (
    order_id INT,
    total DECIMAL(10,2),
    created_at TIMESTAMP
) AS $$
BEGIN
    RETURN QUERY
    SELECT o.id, o.total, o.created_at
    FROM orders o
    WHERE o.user_id = get_user_orders.user_id
    ORDER BY o.created_at DESC;
END;
$$ LANGUAGE plpgsql;

-- Use function
SELECT * FROM get_user_orders(1);

-- MySQL stored function
DELIMITER //
CREATE FUNCTION get_user_email(user_id INT)
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE email VARCHAR(255);
    SELECT e.email INTO email 
    FROM users u 
    WHERE u.id = user_id;
    RETURN email;
END //
DELIMITER ;

Stored Procedures

-- PostgreSQL procedure
CREATE OR REPLACE PROCEDURE transfer_funds(
    from_user INT,
    to_user INT,
    amount DECIMAL(10,2)
) AS $$
BEGIN
    UPDATE accounts 
    SET balance = balance - amount 
    WHERE user_id = from_user;
    
    UPDATE accounts 
    SET balance = balance + amount 
    WHERE user_id = to_user;
    
    INSERT INTO transactions (from_user, to_user, amount)
    VALUES (from_user, to_user, amount);
END;
$$ LANGUAGE plpgsql;

-- Call procedure
CALL transfer_funds(1, 2, 100.00);

Triggers

-- Trigger to update timestamp
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at();

-- Audit trail trigger
CREATE OR REPLACE FUNCTION audit_user_changes()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO user_audit (action, old_data, new_data, changed_at)
    VALUES (
        CASE 
            WHEN TG_OP = 'INSERT' THEN 'INSERT'
            WHEN TG_OP = 'UPDATE' THEN 'UPDATE'
            WHEN TG_OP = 'DELETE' THEN 'DELETE'
        END,
        CASE WHEN TG_OP != 'INSERT' THEN OLD ELSE NULL END,
        CASE WHEN TG_OP != 'DELETE' THEN NEW ELSE NULL END,
        CURRENT_TIMESTAMP
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Next: Production Patterns →
← Back to Lessons

// Lesson 12: Production Patterns

Running SQL in production requires careful attention to security, backup, performance, and scaling.

Database Design Patterns

Normal Forms

Common Patterns

-- Polymorphic association (for different entity types)
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    commentable_type VARCHAR(50),  -- 'post', 'product', etc.
    commentable_id INTEGER,
    body TEXT
);

-- JSONB for semi-structured data (PostgreSQL)
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    type VARCHAR(50),
    payload JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index on JSON field
CREATE INDEX idx_events_type ON events((payload->>'type'));

Security

-- Use parameterized queries (prevents SQL injection)
-- In application code:
-- cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

-- Principle of least privilege
-- Create read-only user
CREATE USER reader WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE myapp TO reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reader;

-- Row-level security (PostgreSQL)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_orders ON orders
    FOR SELECT
    USING (user_id = current_user_id());

Backup and Recovery

-- PostgreSQL backup
pg_dump -U postgres myapp > backup.sql

-- Restore
psql -U postgres myapp < backup.sql

-- Point-in-time recovery requires WAL archiving
-- In postgresql.conf:
-- wal_level = replica
-- archive_mode = on
-- archive_command = 'cp %p /backup/wal/%f'

Connection Pooling

-- PgBouncer configuration
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Replication

Streaming Replication

-- On primary (postgresql.conf)
wal_level = replica
max_wal_senders = 3

-- On replica (recovery.conf)
primary_conninfo = 'host=primary port=5432 user=repl'
restore_command = 'cp /wal/%f %p'

Query Optimization Best Practices

The Road Ahead

Congratulations on completing this guide! You've learned:

Continue your journey with:

← Back to Lessons