# Storage and Indexes

A reference for how storage layout and indexing shape query execution.

---

## Short answer

Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.

Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.

---

## Row store vs column store

### Row store

Stores all fields of one row together.

Good for:

- point lookups
- updates of whole records
- transactional workloads

Weak for:

- scanning a few columns across many rows

### Column store

Stores values of the same column together.

Good for:

- analytical scans
- compression
- vectorized execution
- reading only selected columns

Weak for:

- reconstructing many full records repeatedly

---

## Why storage layout matters

The storage layout affects:

- I/O volume
- cache locality
- compression opportunities
- pushdown behavior
- operator implementation strategy

So storage is a first-order architecture decision, not just a persistence detail.

---

## Common index types

### B-tree

A classic ordered index, good for:

- point lookups
- range queries
- ordered scans

### Hash index

Optimized for exact-match lookups.

Good for:

- equality predicates

Weak for:

- range queries

### LSM-based indexing

Common in modern write-heavy systems.

Good for:

- high write throughput
- append-heavy workloads

Tradeoff:

- reads may have to consult several sorted runs until compaction merges them, so read cost depends on compaction state

### Inverted index

Maps terms to documents or postings.

Good for:

- text search
- filtering over tokenized fields

### Vector index

Supports approximate nearest-neighbor search over embeddings.

Good for:

- semantic search
- similarity retrieval

Tradeoff:

- often approximate rather than exact

---

## What indexes buy

Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.

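
To make the difference between a full scan and an index lookup concrete, here is a minimal sketch over a toy in-memory table. Everything in it is an assumption for illustration: the `rows` table, the function names, and the choice of a sorted list with `bisect` as a stand-in for a B-tree and a plain `dict` as a stand-in for a hash index. Real engines use paged, disk-aware structures.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: (id, age) rows stored in arrival order.
rows = [(i, (i * 37) % 100) for i in range(1000)]


def full_scan(lo, hi):
    """Baseline: test the predicate against every row."""
    return [rid for rid, age in rows if lo <= age <= hi]


# Ordered index on age: sorted (age, row position) pairs.
# A sorted list plus binary search stands in for a B-tree here.
ordered_index = sorted((age, pos) for pos, (_, age) in enumerate(rows))
ordered_keys = [age for age, _ in ordered_index]


def range_lookup(lo, hi):
    """Binary-search both ends of the range, then touch only matching rows."""
    start = bisect_left(ordered_keys, lo)
    end = bisect_right(ordered_keys, hi)
    return [rows[pos][0] for _, pos in ordered_index[start:end]]


# Hash index on age: key -> list of row positions.
hash_index = {}
for pos, (_, age) in enumerate(rows):
    hash_index.setdefault(age, []).append(pos)


def equality_lookup(age):
    """One hash probe, then touch only matching rows."""
    return [rows[pos][0] for pos in hash_index.get(age, [])]
```

Both lookups return the same row ids as `full_scan` while touching only the matching entries. The hash index has no key order to search, so it cannot answer the range query without probing each candidate key. Both structures also have to be updated on every insert, which is the write and storage cost that indexes trade for faster reads.
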
Indexes are most valuable when:

- the predicate is selective
- the access pattern repeats often
- the engine can exploit the index directly

They are less valuable when:

- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload

---

## Practical mental model

Tables define what data exists. Storage layout defines how that data is physically organized. Indexes define shortcuts through that organization.

That is the simplest useful framing.

---

## Changelog

* **April 1, 2026** -- First version created.