Storage and Indexes

A reference for how storage layout and indexing shape query execution.


Short answer

Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.

Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.


Row store vs column store

Row store

Stores all fields of one row together.

Good for:

  • point lookups
  • updates of whole records
  • transactional workloads

Weak for:

  • scanning a few columns across many rows

Column store

Stores values of the same column together.

Good for:

  • analytical scans
  • compression
  • vectorized execution
  • reading only selected columns

Weak for:

  • reconstructing many full records repeatedly
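As a rough illustration of the two layouts, the sketch below holds the same small table both ways. The table, its column names, and the helper functions are hypothetical and exist only for this example; real engines work with pages and files rather than Python lists.

```python
# Hypothetical three-column table; names and values are illustrative only.
rows = [
    {"id": 1, "name": "ada", "amount": 10},
    {"id": 2, "name": "bob", "amount": 25},
    {"id": 3, "name": "cy", "amount": 7},
]

# Row store: all fields of one record sit together.
row_store = [(r["id"], r["name"], r["amount"]) for r in rows]

# Column store: all values of one column sit together.
column_store = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row_store(store):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in store)


def sum_amount_column_store(store):
    # Reads only the "amount" column.
    return sum(store["amount"])


def fetch_record_column_store(store, position):
    # Rebuilding one full record needs one read per column.
    return tuple(store[col][position] for col in ("id", "name", "amount"))
```

Both sum functions return the same total, but the column version never touches `id` or `name`. Fetching a whole record is the reverse case: the row store has it in one place, while the column store stitches it together from every column.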

Why storage layout matters

The storage layout affects:

  • I/O volume
  • cache locality
  • compression opportunities
  • pushdown behavior
  • operator implementation strategy

So storage is a first-order architecture decision, not just a persistence detail.
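To make the compression point concrete, here is a minimal sketch of run-length encoding applied to a single column. It assumes a columnar layout where repeated values sit next to each other; the `status_column` data and function names are made up for the example.

```python
from itertools import groupby


def rle_encode(values):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    # Expand (value, run_length) pairs back into the original sequence.
    return [value for value, count in runs for _ in range(count)]


# Hypothetical low-cardinality column, as it might sit in a column store.
status_column = ["open", "open", "open", "closed", "closed", "open"]
encoded = rle_encode(status_column)
```

Six stored values shrink to three runs here. In a row layout the same values would be interleaved with other fields, so runs like these would not be adjacent on disk.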


Common index types

B-tree

A classic ordered index, good for:

  • point lookups
  • range queries
  • ordered scans
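The sketch below shows why an ordered index serves all three patterns. It is not a B-tree: it uses a sorted list with binary search as a simplified stand-in that shares the key property (entries kept in key order). The class name and sample data are hypothetical.

```python
import bisect


class OrderedIndex:
    """Sorted (key, row_id) pairs; a simplified stand-in for a B-tree."""

    def __init__(self, pairs):
        self._entries = sorted(pairs)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, key):
        # Point lookup: binary search for the run of equal keys.
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return [row_id for _, row_id in self._entries[lo:hi]]

    def range(self, low, high):
        # Range query: row ids with low <= key <= high, in key order.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [row_id for _, row_id in self._entries[lo:hi]]

    def scan(self):
        # Ordered scan: entries are already sorted by key.
        return [row_id for _, row_id in self._entries]


index = OrderedIndex([(30, "r3"), (10, "r1"), (20, "r2"), (40, "r4")])
```

A range query finds its start position with one search and then reads neighbors sequentially, which is the behavior a B-tree provides through its linked, sorted leaf pages.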

Hash index

Optimized for exact-match lookups.

Good for:

  • equality predicates

Weak for:

  • range queries and ordered scans, because hashing does not preserve key order
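A minimal sketch of the idea, using a Python dict as the hash table. The rows, the `city` column, and the helper names are assumptions made for illustration.

```python
from collections import defaultdict


def build_hash_index(rows, column):
    # Map each distinct value to the positions of the rows that hold it.
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def equals(index, value):
    # Equality predicate: one hash probe. .get avoids creating empty buckets.
    return index.get(value, [])


def between(index, low, high):
    # Range predicate: keys are unordered, so every key must be examined.
    return sorted(p for key, ps in index.items() if low <= key <= high for p in ps)


rows = [
    {"id": 1, "city": "oslo"},
    {"id": 2, "city": "lima"},
    {"id": 3, "city": "oslo"},
]
city_index = build_hash_index(rows, "city")
```

`equals` costs one probe regardless of how many distinct keys exist, while `between` degrades to a pass over all keys, which is why a hash index offers no advantage for range predicates.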

LSM-based indexing

Buffers writes in memory and flushes them as sorted, immutable runs that are merged later. Common in modern write-heavy systems.

Good for:

  • high write throughput
  • append-heavy workloads

Tradeoff:

  • reads may have to check the in-memory table and several on-disk runs, and background compaction is needed to keep that read cost bounded
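The toy structure below sketches the write and read paths of an LSM-style store. It is a deliberately simplified model under stated assumptions: runs live in memory rather than on disk, there are no deletes or tombstones, and the class name `TinyLSM` and its parameters are hypothetical.

```python
class TinyLSM:
    """Toy LSM-style store: a memtable plus sorted, immutable runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory table, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted run and start a fresh one.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:  # linear for brevity; a real run would be searched
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs into one, letting newer values overwrite older ones.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
```

After the four writes there are two runs, and a read for `"b"` has to look past the newer run to find it. Calling `db.compact()` merges them into a single run, so later reads check fewer places.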

Inverted index

Maps terms to documents or postings.

Good for:

  • text search
  • filtering over tokenized fields
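A minimal sketch of an inverted index, assuming whitespace tokenization and lowercase normalization. The documents and function names are invented for the example; production systems add stemming, positions, and scoring.

```python
from collections import defaultdict


def tokenize(text):
    # Simplistic tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it (its postings).
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    # Documents containing every query term: intersect the postings.
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Row stores favor point lookups",
    2: "Column stores favor analytical scans",
    3: "Indexes avoid full scans",
}
index = build_inverted_index(docs)
```

A query touches only the postings of its own terms, so its cost depends on how common those terms are rather than on the total number of documents.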

Vector index

Supports approximate nearest-neighbor search over embeddings.

Good for:

  • semantic search
  • similarity retrieval

Tradeoff:

  • results are often approximate rather than exact, trading some recall for speed
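The sketch below illustrates where the approximation comes from. It uses a coarse partitioning scheme (vectors assigned to the nearest of a few fixed centroids, with search limited to the query's own cell), which is one simple way such indexes can work and not a description of any particular library. The centroids, points, and class name are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def exact_nearest(points, query):
    # Brute force: compare the query against every stored vector.
    return min(points, key=lambda name: math.dist(points[name], query))


class CoarseCellIndex:
    """Assigns each vector to its nearest centroid; searches one cell only."""

    def __init__(self, centroids, points):
        self.centroids = centroids
        self.points = points
        self.cells = {i: [] for i in range(len(centroids))}
        for name, vec in points.items():
            self.cells[self._cell(vec)].append(name)

    def _cell(self, vec):
        return min(
            range(len(self.centroids)),
            key=lambda i: math.dist(self.centroids[i], vec),
        )

    def search(self, query):
        # Only candidates in the query's cell are compared.
        candidates = self.cells[self._cell(query)]
        if not candidates:
            return None
        return min(candidates, key=lambda name: math.dist(self.points[name], query))


points = {"a": (1.0, 0.0), "c": (5.1, 0.0), "d": (9.0, 0.0)}
ann = CoarseCellIndex(centroids=[(0.0, 0.0), (10.0, 0.0)], points=points)
```

For the query `(8.0, 0.0)` both methods agree on `"d"`. For `(4.9, 0.0)` the true nearest neighbor is `"c"`, but it sits just across the cell boundary, so the index returns `"a"`. Probing more than one cell would narrow that gap at the cost of more comparisons.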

What indexes buy

Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.

They are most valuable when:

  • the predicate is selective
  • the access pattern repeats often
  • the engine can exploit the index directly

They are less valuable when:

  • most rows are needed anyway
  • the predicate is too broad to narrow the candidate set
  • maintaining the index is too expensive for the workload
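A small sketch of the selectivity point, counting rows examined as a stand-in for cost. The table, the `status` column, and the 1-in-100 distribution are assumptions chosen for the example; real optimizers use statistics and richer cost models.

```python
def full_scan(rows, column, value):
    # Examines every row, whatever the predicate matches.
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)


def index_lookup(rows, index, value):
    # Examines only the rows the index points at.
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


# Hypothetical table: 1 row in 100 is "rare", the rest are "common".
rows = [{"id": i, "status": "rare" if i % 100 == 0 else "common"} for i in range(1000)]

status_index = {}
for position, row in enumerate(rows):
    status_index.setdefault(row["status"], []).append(position)

rare_scan, rare_scan_cost = full_scan(rows, "status", "rare")
rare_idx, rare_idx_cost = index_lookup(rows, status_index, "rare")
common_idx, common_idx_cost = index_lookup(rows, status_index, "common")
```

For the selective predicate the index examines 10 rows instead of 1000. For the broad one it examines 990, which is barely better than scanning, while the index still has to be maintained on every write.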

Practical mental model

Tables define what data exists.

Storage layout defines how that data is physically organized.

Indexes define shortcuts through that organization.

That is the simplest useful framing.


Changelog

  • April 1, 2026 -- First version created.