Data Stores Under the Hood: MySQL
Most of us use MySQL daily without thinking too hard about what's happening under the hood. We've got a table, we write some SQL, we get results. But when you're building systems that need to scale—or when something goes wrong at 2 AM—understanding the internals makes a huge difference.
This is a look at MySQL's architecture, focusing on what matters for application developers.
The Engine
MySQL has pluggable storage engines. InnoDB is the default (and the one you should almost always use). It replaced MyISAM around MySQL 5.5 and brings ACID compliance, row-level locking, and crash recovery.
Other engines exist—Memory, MyISAM, Archive—but they're niche use cases. Stick with InnoDB unless you have a specific reason not to.
Storage Structure
InnoDB stores data in B+ trees. If you squint, it looks like this:
[50]
/ \
[20,30] [70,90]
/ | \ / | \
10 25 40 60 80 95
↓ ↓ ↓ ↓ ↓ ↓
[data] ... sequential linked list of leaf nodes ...
Key characteristics:
- Non-leaf nodes hold only keys (indexes), acting as a hierarchical navigation structure
- Leaf nodes store the actual data, plus a pointer to the next leaf (forming a linked list)
- This linked list makes range queries efficient—you can scan a sequence of pages without bouncing around the tree
- Primary key lookups are fast: log₂(n) depth, typically 3-4 levels for millions of rows
Row vs Column Orientation
MySQL is row-oriented. Each row contains all its columns together on disk. This makes reads of complete rows fast (good for OLTP) but analytical queries scanning millions of rows less efficient.
For analytical workloads, consider columnar stores like ClickHouse or MySQL's own HeatWave.
Indices
Indices are how MySQL finds data fast. Without them, every query would be a full table scan.
Clustered Index
The primary key becomes the clustered index. Data is physically stored in the order of this index. This means:
- Looking up by primary key is a single B+ tree traversal
- Insertions are not in insertion order if you use a random UUID primary key (causes page splits)
- Every InnoDB table has exactly one clustered index
CREATE TABLE users (
id BIGINT PRIMARY KEY, -- This becomes the clustered index
email VARCHAR(255),
created_at TIMESTAMP
);
Secondary Indices
Any additional indexes you create are secondary indices. They point back to the primary key, not the row directly.
CREATE INDEX idx_email ON users(email);
This means looking up by email requires two B+ tree traversals: one to find the primary key, one to fetch the row. For high-frequency queries, this matters.
Covering Indices
A covering index includes all columns needed by a query, so MySQL never touches the actual table.
-- If this query runs frequently:
SELECT email, created_at FROM users WHERE email = 'user@example.com';
-- This index would be a covering index:
CREATE INDEX idx_email_covering ON users(email, created_at);
ACID Transactions
InnoDB supports full ACID transactions:
| Property | What it means | InnoDB behavior |
|---|---|---|
| Atomicity | Transactions are all-or-nothing | Undo logs roll back partial changes |
| Consistency | DB moves from valid state to valid state | Foreign keys, constraints enforced |
| Isolation | Concurrent transactions don't interfere | MVCC with configurable isolation levels |
| Durability | Committed data survives crashes | Write-ahead logging (redo logs) |
Isolation Levels
MySQL's default is REPEATABLE READ, but you can configure per-session:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
| Level | Dirty reads | Non-repeatable reads | Phantom reads |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read (default) | Prevented | Prevented | Possible* |
| Serializable | Prevented | Prevented | Prevented |
*InnoDB's gap locking makes phantom reads rare even at REPEATABLE READ.
MVCC (Multi-Version Concurrency Control)
MVCC is what makes InnoDB's concurrent reads possible. Instead of locking rows for readers, it keeps older versions in an undo log.
When you start a transaction, you see a snapshot of the data at that point—even if other transactions are modifying rows concurrently. Writers don't block readers.
The magic happens in two hidden columns:
DB_TRX_ID— transaction ID that last modified the rowDB_ROLL_PTR— pointer to the undo log entry
Old versions get cleaned up by the purge thread when they're no longer needed for active transactions.
Replication
MySQL replicates using a leader-follower model. The leader writes to a binary log; followers read from it and apply the changes.
Statement-Based vs Row-Based
Two replication formats exist:
| Format | Pros | Cons |
|---|---|---|
| SBR (statement) | Smaller log size, deterministic functions work | Non-deterministic queries can desync |
| RBR (row) | Always consistent | Larger logs, harder to audit |
Most production setups use RBR or MIXED (MySQL picks per statement).
Replication Methods
Position-based (traditional):
Leader: writes to mysql-bin.000001
Follower: "start from position 1234 in that file"
GTID-based (modern, recommended):
Leader: assigns a globally unique ID (GTID) to each transaction
Follower: "replay transaction GTID-1234"
GTIDs make replication easier to manage—no need to track file positions, failover is cleaner.
Semi-Synchronous Replication
Default MySQL replication is asynchronous: the leader commits and doesn't wait for followers.
Semi-synchronous adds a wait:
-- Leader waits until at least one follower acknowledges
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;
This trades some performance for stronger durability guarantees.
Multi-Leader
MySQL doesn't natively support multi-leader well. Group Replication is the built-in option (MySQL 8.0+), or you use external tools like Tungsten Replicator.
For most cases: stick with single leader. Multi-leader conflict resolution is harder than it looks.
Failure Recovery
When MySQL crashes, recovery happens in stages:
- Redo log replay — Committed transactions get reapplied
- Undo log rollback — Incomplete transactions get rolled back
- Purge — Old row versions get cleaned up
This is why you set innodb_log_file_size appropriately and why SSDs help—redo log I/O is sequential and critical to recovery speed.
Schema Design Considerations
Auto-increment vs UUID Primary Keys
-- Auto-increment (recommended for most cases)
id BIGINT AUTO_INCREMENT PRIMARY KEY
-- UUID (watch out for random inserts)
id CHAR(36) PRIMARY KEY -- Causes B+ tree page splits
Foreign Keys
InnoDB enforces foreign keys. Love them or hate them, they ensure referential integrity. Some teams disable them for performance (PostgreSQL has similar debates).
Character Sets
-- Always use utf8mb4, not utf8
-- utf8 in MySQL is a 3-byte subset; utf8mb4 is true 4-byte UTF-8
CREATE TABLE t (...) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
What's Not There
A few things MySQL doesn't have that you might want:
| Feature | Workaround |
|---|---|
| No materialized views | Build at application layer or use triggers |
| No JSON-first design | JSON column type exists, but it's not MongoDB |
| Limited GIS features | PostGIS is better for serious geospatial |
| No native partitioning intelligence | Use sharding carefully |
Practical Takeaways
-
Index strategically — Every index has a write cost. Index columns you query, not columns you insert.
-
Use auto-increment primary keys unless you have a compelling reason not to.
-
Understand your isolation level — READ COMMITTED can avoid more deadlocks than REPEATABLE READ.
-
Use EXPLAIN — Before shipping a query to production:
EXPLAIN FORMAT=JSON SELECT ... FROM users WHERE email = 'test@example.com'; -
Watch your buffer pool — InnoDB caches everything in the buffer pool. If it doesn't fit, you hit disk constantly.
MySQL is mature, battle-tested, and has a massive ecosystem. Understanding its internals helps you use it well—and debug it when it misbehaves.
