# Storage and Indexes

A reference for how storage layout and indexing shape query execution.

---

## Short answer

Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.

Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.

---

## Row store vs column store

### Row store

Stores all fields of one row together, so a single read returns the whole record.

Good for:

- point lookups
- updates of whole records
- transactional workloads

Weak for:

- scanning a few columns across many rows

### Column store

Stores values of the same column together, so a scan reads only the columns it needs.

Good for:

- analytical scans
- compression
- vectorized execution
- reading only selected columns

Weak for:

- reconstructing many full records repeatedly
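
The contrast between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine stores data: the table, its column names, and the helper functions are all hypothetical. The point lookup is a single access in the row layout but one access per column in the column layout, while the aggregate touches every record in the row layout and only one list in the column layout.

```python
# Hypothetical three-column table; names and values are illustrative only.
rows = [
    {"id": 1, "name": "ada", "amount": 30},
    {"id": 2, "name": "bob", "amount": 12},
    {"id": 3, "name": "cy", "amount": 58},
]

# Row layout: each record's fields sit together.
row_store = [(r["id"], r["name"], r["amount"]) for r in rows]

# Column layout: each column's values sit together.
column_store = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def lookup_row(store, position):
    """Point lookup in the row layout: one access returns the whole record."""
    return store[position]


def lookup_columns(store, position):
    """Same lookup in the column layout: one access per column rebuilds the record."""
    return tuple(column[position] for column in store.values())


def sum_amount_rows(store):
    """Aggregate in the row layout: every record is visited to read one field."""
    return sum(record[2] for record in store)


def sum_amount_columns(store):
    """Aggregate in the column layout: only the 'amount' column is read."""
    return sum(store["amount"])
```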

---

## Why storage layout matters

The storage layout affects:

- I/O volume
- cache locality
- compression opportunities
- pushdown behavior
- operator implementation strategy

So storage is a first-order architecture decision, not just a persistence detail.
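
As one illustration of the compression point, the sketch below run-length encodes a column whose values are stored contiguously, then evaluates an equality predicate directly on the encoded form. Run-length encoding is only one of several column encodings, and the `country_column` data and function names are assumptions made for this example. With sorted, low-cardinality values, twelve entries shrink to three pairs, and the predicate never needs the decompressed column.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def count_equal(encoded_column, target):
    """Evaluate `column == target` on the encoded form without decompressing."""
    return sum(length for value, length in encoded_column if value == target)


# Hypothetical 'country' column, stored contiguously and sorted.
country_column = ["DE"] * 4 + ["FR"] * 3 + ["US"] * 5
encoded = run_length_encode(country_column)
```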

---

## Common index types

### B-tree

A classic ordered index, good for:

- point lookups
- range queries
- ordered scans
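
What makes all three access patterns cheap is that the keys are kept in order. The sketch below is not a real B-tree: it uses a sorted list with binary search as a stand-in for the ordered leaf level, and the keys and row ids are made up for illustration.

```python
import bisect

# Sorted keys with parallel row ids stand in for the leaf level of an ordered index.
keys = [3, 8, 15, 21, 34, 55]
row_ids = ["r3", "r8", "r15", "r21", "r34", "r55"]


def point_lookup(key):
    """Binary search for an exact key; returns its row id or None."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return row_ids[i]
    return None


def range_query(low, high):
    """Row ids with low <= key <= high, returned in key order."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return row_ids[start:end]
```

Here `range_query(8, 30)` finds the start of the range with one search and then reads neighbors in order, which is the same property a B-tree provides through its linked, sorted leaves.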

### Hash index

Optimized for exact-match lookups.

Good for:

- equality predicates

Weak for:

- range queries
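
A Python dictionary is itself a hash table, so it can illustrate both sides of this tradeoff. The table contents and the `user_id` column below are hypothetical. The equality lookup is a single probe, while the range predicate has to examine every key because hashing discards ordering.

```python
from collections import defaultdict

# Hypothetical (row_id, user_id) pairs.
table = [("r1", 42), ("r2", 7), ("r3", 42), ("r4", 19)]

# user_id -> list of row ids
hash_index = defaultdict(list)
for row_id, user_id in table:
    hash_index[user_id].append(row_id)


def equals(user_id):
    """Equality predicate: a single hash probe."""
    return hash_index.get(user_id, [])


def between(low, high):
    """Range predicate: hashing discards order, so every key must be checked."""
    return sorted(
        row_id
        for user_id, ids in hash_index.items()
        if low <= user_id <= high
        for row_id in ids
    )
```

The `between` helper still returns the right answer, but its cost grows with the number of distinct keys rather than with the size of the result.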
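
### LSM-based indexing

Common in modern write-heavy systems. Writes are buffered in memory and flushed as sorted, immutable runs that are merged later by compaction.

Good for:

- high write throughput
- append-heavy workloads

Tradeoff:

- reads often need compaction-aware logic, because a key may live in the in-memory buffer or in any of several runs until compaction merges them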
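
The read-side cost is easier to see in a toy model. The class below is a deliberately simplified sketch, not the design of any real engine: the `TinyLSM` name, the `memtable_limit` threshold, and the in-memory "runs" are all assumptions for illustration, and durability, deletes, and bloom filters are left out.

```python
class TinyLSM:
    """Toy LSM store: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.runs = []  # oldest first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch memory until the memtable fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for run_key, value in run:
                if run_key == key:
                    return value
        return None

    def compact(self):
        # Merge all runs into one; newer runs overwrite older values.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Before `compact()` runs, a lookup may have to visit every run; afterwards there is a single run to search. That is the tradeoff the bullet above describes: cheap writes now, extra read and merge work later.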

### Inverted index

Maps terms to documents or postings.

Good for:

- text search
- filtering over tokenized fields
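
The term-to-postings mapping can be sketched as a dictionary of sets. The documents and the whitespace tokenizer below are assumptions chosen to keep the example short; a real text index would also handle stemming, stop words, and ranking.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
documents = {
    1: "row stores suit point lookups",
    2: "column stores suit analytical scans",
    3: "indexes speed up point lookups",
}


def tokenize(text):
    """Deliberately naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


# term -> set of document ids (the postings for that term)
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        inverted_index[term].add(doc_id)


def search_all(*terms):
    """Ids of documents containing every term (intersection of postings)."""
    postings = [inverted_index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

A query such as `search_all("point", "lookups")` never scans document text; it only intersects two small posting sets.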

### Vector index

Supports approximate nearest-neighbor search over embeddings.

Good for:

- semantic search
- similarity retrieval

Tradeoff:

- results are often approximate rather than exact, trading some recall for speed
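
To make the approximation concrete, the sketch below compares an exact brute-force search with a simple partition-based search that only looks inside the cluster closest to the query. Partitioning by centroid is just one family of approximate indexes, and the 2-D vectors, hand-picked centroids, and function names are assumptions for illustration. For the query `(0.55, 0.6)` the exact search returns `"e"`, while the approximate search returns `"d"` because `"e"` sits in the other cluster; for `(0.1, 0.25)` both return `"a"`.

```python
import math

# Hypothetical 2-D embeddings; real embeddings have far more dimensions.
vectors = {
    "a": (0.1, 0.2),
    "b": (0.2, 0.1),
    "c": (0.9, 0.8),
    "d": (0.8, 0.9),
    "e": (0.4, 0.5),
}
# Hand-picked centroids; a real index would learn them from the data.
centroids = [(0.0, 0.0), (1.0, 1.0)]


def nearest_centroid(v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


# centroid index -> names of the vectors assigned to it
buckets = {}
for name, v in vectors.items():
    buckets.setdefault(nearest_centroid(v), []).append(name)


def exact_search(query, k=1):
    """Compare the query against every vector."""
    return sorted(vectors, key=lambda n: math.dist(query, vectors[n]))[:k]


def approximate_search(query, k=1):
    """Compare the query only against vectors in its nearest cluster."""
    candidates = buckets.get(nearest_centroid(query), [])
    return sorted(candidates, key=lambda n: math.dist(query, vectors[n]))[:k]
```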

---

## What indexes buy

Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.

They are most valuable when:

- the predicate is selective
- the access pattern repeats often
- the engine can exploit the index directly

They are less valuable when:

- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload
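
The role of selectivity can be sketched with a toy cost model. The cost constants below (`sequential_cost`, `random_cost`, `probe_cost`) are invented for illustration and do not come from any real optimizer; the only assumption that matters is that fetching a row through an index costs more per row than reading it in a sequential scan. Under these numbers the index wins for a predicate matching 0.1% of rows and loses for one matching 50%.

```python
def full_scan_cost(total_rows, sequential_cost=1.0):
    """Read every row once, sequentially."""
    return total_rows * sequential_cost


def index_scan_cost(total_rows, selectivity, random_cost=4.0, probe_cost=10.0):
    """One index probe, then one random row fetch per matching row."""
    matching_rows = total_rows * selectivity
    return probe_cost + matching_rows * random_cost


def choose_access_path(total_rows, selectivity):
    """Pick the cheaper path under the hypothetical costs above."""
    index_cost = index_scan_cost(total_rows, selectivity)
    scan_cost = full_scan_cost(total_rows)
    return "index" if index_cost < scan_cost else "full scan"
```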

---

## Practical mental model

Tables define what data exists.

Storage layout defines how that data is physically organized.

Indexes define shortcuts through that organization.

That is the simplest useful framing.

---

## Changelog

* **April 1, 2026** -- First version created.