Storage and Indexes
A reference for how storage layout and indexing shape query execution.
Short answer
Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.
Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.
Row store vs column store
Row store
Stores all fields of one row together.
Good for:
- point lookups
- updates of whole records
- transactional workloads
Weak for:
- scanning a few columns across many rows
Column store
Stores values of the same column together.
Good for:
- analytical scans
- compression
- vectorized execution
- reading only selected columns
Weak for:
- reconstructing many full records repeatedly
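The contrast between the two layouts can be sketched in a few lines of Python. This is a toy in-memory illustration, not how any particular engine stores data; the table, its column names, and the helper functions are all hypothetical.

```python
# Hypothetical three-column table; names and values are illustrative only.
rows = [
    {"id": 1, "name": "ada", "amount": 30},
    {"id": 2, "name": "bob", "amount": 45},
    {"id": 3, "name": "cy", "amount": 25},
]

# Row layout: one tuple per record, all fields stored together.
row_store = [(r["id"], r["name"], r["amount"]) for r in rows]

# Column layout: one list per column, values aligned by position.
column_store = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row_store(store):
    # Every full record is visited even though only one field is needed.
    return sum(record[2] for record in store)


def sum_amount_column_store(store):
    # Only the "amount" column is touched.
    return sum(store["amount"])


def fetch_record_column_store(store, position):
    # Rebuilding a full record costs one lookup per column.
    return tuple(store[col][position] for col in ("id", "name", "amount"))


print(sum_amount_row_store(row_store))
print(sum_amount_column_store(column_store))
print(fetch_record_column_store(column_store, 1))
```

Both sums give the same answer. The difference is how much data each layout has to walk through to produce it, and how many separate lookups are needed to reassemble one record.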
Why storage layout matters
The storage layout affects:
- I/O volume
- cache locality
- compression opportunities
- pushdown behavior (how much filtering and column selection can happen at the storage layer)
- operator implementation strategy
So storage is a first-order architecture decision, not just a persistence detail.
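To make one item from the list concrete, compression opportunities, here is a minimal run-length encoding sketch. It assumes a column whose equal values sit next to each other, which is more likely when values of one column are stored together; the column contents and function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    # Expand (value, run_length) pairs back into the original sequence.
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical low-cardinality column stored contiguously.
country_column = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]

encoded = run_length_encode(country_column)
print(encoded)
print(run_length_decode(encoded) == country_column)
```

Nine stored values shrink to three pairs here. In a row layout the same values would be interleaved with other fields, so runs like these would not be adjacent on disk.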
Common index types
B-tree
A classic ordered index, good for:
- point lookups
- range queries
- ordered scans
Hash index
Optimized for exact-match lookups.
Good for:
- equality predicates
Weak for:
- range queries
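The difference between an ordered index and a hash index can be sketched with the Python standard library. A sorted list searched with `bisect` stands in for the ordered leaf level of a B-tree, and a `dict` stands in for a hash index; real implementations are page-based and considerably more involved. The keys and row identifiers are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (key, row_id) pairs in arrival order.
entries = [(17, "r3"), (5, "r1"), (42, "r4"), (9, "r2"), (23, "r5")]

# Ordered index: entries kept sorted by key.
ordered = sorted(entries)
ordered_keys = [key for key, _ in ordered]


def range_lookup(low, high):
    # Binary-search both ends, then read the contiguous slice in between.
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return [row_id for _, row_id in ordered[start:end]]


# Hash index: direct key-to-row mapping with no ordering between keys.
hash_index = {key: row_id for key, row_id in entries}


def equality_lookup(key):
    return hash_index.get(key)


print(range_lookup(9, 23))
print(equality_lookup(42))
```

The hash structure has no cheap way to answer `range_lookup`: because keys are not stored in order, it would have to test every key against the bounds.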
LSM-based indexing
Buffers writes in memory and flushes them as sorted, immutable files that are merged later. Common in modern write-heavy systems.
Good for:
- high write throughput
- append-heavy workloads
Tradeoff:
- reads may have to check several sorted files, and background compaction is needed to keep that number small
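The write and read paths of a log-structured merge design can be sketched as follows. This is a deliberately tiny in-memory model: the class name `TinyLSM`, the `memtable_limit` parameter, and the use of dicts in place of sorted on-disk files are all assumptions made for illustration, and deletes, write-ahead logging, and leveled compaction are left out.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}          # mutable in-memory buffer for recent writes
        self.runs = []              # flushed immutable runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory buffer, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the buffer as a key-sorted run and start a fresh buffer.
        if self.memtable:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the buffer first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            if key in run:
                return run[key]
        return None

    def compact(self):
        # Merge all runs into one; applying oldest first lets newer values win.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # triggers a flush
db.put("a", 3)
db.put("c", 4)   # triggers a second flush
print(len(db.runs), db.get("a"))
db.compact()
print(len(db.runs), db.get("a"))
```

Before compaction a read for an old key may walk through every run. After compaction there is a single run to check, which is the read-side cost that compaction exists to control.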
Inverted index
Maps terms to documents or postings.
Good for:
- text search
- filtering over tokenized fields
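A minimal inverted index can be built with a dictionary from term to a set of document ids. The sketch below assumes whitespace tokenization and lowercasing only; the documents and function names are hypothetical, and real text indexes add stemming, positions, and compressed posting lists.

```python
from collections import defaultdict

# Hypothetical document collection keyed by document id.
docs = {
    1: "fast columnar scans",
    2: "fast point lookups",
    3: "ordered range scans",
}


def build_inverted_index(documents):
    # Map each term to the set of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    # Return documents containing every query term (AND semantics).
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_inverted_index(docs)
print(search_all_terms(index, "fast scans"))
print(search_all_terms(index, "scans"))
```

The query never touches the document text itself. It only intersects posting sets, which is why this structure suits text search and filtering over tokenized fields.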
Vector index
Supports approximate nearest-neighbor search over embeddings.
Good for:
- semantic search
- similarity retrieval
Tradeoff:
- results are often approximate rather than exact, trading some recall for speed
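The approximate-versus-exact tradeoff can be illustrated with a toy bucketing scheme. It is a simplified form of hyperplane-based locality-sensitive hashing: vectors are grouped by the sign of their projection onto a few hyperplanes, and a query only searches its own bucket. The hyperplanes are fixed here so the example is repeatable, whereas a real scheme would typically draw them at random. The vectors, names, and functions are all hypothetical, and production vector indexes generally use more sophisticated structures.

```python
import math
from collections import defaultdict

# Fixed hyperplanes for repeatability (an assumption made for this sketch).
HYPERPLANES = [(1.0, 0.0), (0.0, 1.0)]

# Hypothetical 2-D embeddings.
vectors = {
    "a": (1.0, -0.05),
    "b": (0.8, 0.6),
    "c": (-1.0, 0.2),
    "d": (0.1, 1.0),
}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def signature(v):
    # One boolean per hyperplane: which side of the plane the vector falls on.
    return tuple(sum(x * h for x, h in zip(v, plane)) >= 0 for plane in HYPERPLANES)


def build_buckets(items):
    buckets = defaultdict(list)
    for name, v in items.items():
        buckets[signature(v)].append(name)
    return buckets


def exact_nearest(items, query):
    # Brute force: compare the query against every stored vector.
    return max(items, key=lambda name: cosine(items[name], query))


def approximate_nearest(items, buckets, query):
    # Only vectors sharing the query's signature are considered.
    candidates = buckets.get(signature(query), [])
    if not candidates:
        return None
    return max(candidates, key=lambda name: cosine(items[name], query))


buckets = build_buckets(vectors)
query = (1.0, 0.1)
print(exact_nearest(vectors, query))
print(approximate_nearest(vectors, buckets, query))
```

For this query the exact search returns `a`, while the bucketed search returns `b`: the true nearest neighbor sits just across a hyperplane and is never examined. The bucketed search compares against fewer vectors, and that missed neighbor is the accuracy given up in exchange.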
What indexes buy
Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.
They are most valuable when:
- the predicate is selective
- the access pattern repeats often
- the predicate is written in a form the engine can match to the index directly
They are less valuable when:
- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload
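The effect of selectivity can be sketched with a rough cost comparison. The model below is a simplification made for illustration: it assumes a full scan reads every page sequentially, that an index lookup pays one random page read per matching row, and it ignores the cost of traversing the index itself. The cost constants and function names are hypothetical and not taken from any specific optimizer.

```python
import math


def full_scan_cost(total_rows, rows_per_page, sequential_page_cost=1.0):
    # A full scan reads every page once, sequentially.
    pages = math.ceil(total_rows / rows_per_page)
    return pages * sequential_page_cost


def index_scan_cost(total_rows, selectivity, random_page_cost=4.0):
    # Pessimistic assumption: each matching row lives on a different page.
    matching_rows = total_rows * selectivity
    return matching_rows * random_page_cost


def prefer_index(total_rows, rows_per_page, selectivity):
    return index_scan_cost(total_rows, selectivity) < full_scan_cost(
        total_rows, rows_per_page
    )


# Hypothetical table: one million rows, 100 rows per page.
print(prefer_index(1_000_000, 100, selectivity=0.001))
print(prefer_index(1_000_000, 100, selectivity=0.5))
```

Under these assumptions the index wins when 0.1% of rows match and loses when half of them do. The break-even point moves with the cost constants, but the shape of the tradeoff stays the same.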
Practical mental model
Tables define what data exists.
Storage layout defines how that data is physically organized.
Indexes define shortcuts through that organization.
That is the simplest useful framing.
Changelog
- April 1, 2026 -- First version created.