Add note files about query execution models and indexes

This commit is contained in:
Hassan Abedi 2026-04-01 09:09:58 +02:00
parent 2a33f8b483
commit 8ed8347380
2 changed files with 320 additions and 0 deletions

# Query Execution Models
A reference for the main ways query operators run at runtime.
---
## Short answer
An execution model defines how operators consume input, produce output, and pass data through a plan.
The most important questions are:
- one row at a time or many values at once?
- pull-based or push-based?
- pipelined or materialized?
Those choices strongly affect latency, CPU efficiency, and implementation complexity.
---
## Row-at-a-time execution
In a row-oriented model, operators process one tuple at a time.
This is often implemented with an iterator interface where a parent asks a child for the next row.
Strengths:
- simple
- modular
- easy to debug
Weaknesses:
- high per-row overhead
- worse cache behavior for analytics
This model is historically important and still useful in many systems.
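The iterator (often called Volcano-style) interface can be sketched in a few lines of Python; the operator names and the in-memory table are illustrative, not taken from any particular engine:

```python
# Minimal sketch of a Volcano-style iterator model.
# Each operator exposes next(); a parent pulls one row at a time from its child.

class Scan:
    """Leaf operator: emits rows from an in-memory table, one per call."""
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end-of-stream

class Filter:
    """Applies a predicate row by row, pulling from its child as needed."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

# The root drives execution by repeatedly calling next().
plan = Filter(Scan([{"x": 1}, {"x": 5}, {"x": 3}]), lambda r: r["x"] > 2)
out = []
while (row := plan.next()) is not None:
    out.append(row)
# out == [{"x": 5}, {"x": 3}]
```

The per-row `next()` calls are exactly where the dispatch overhead mentioned above comes from.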
---
## Batch-oriented execution
In a batch model, operators process chunks of rows together.
The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.
Strengths:
- better CPU efficiency
- lower dispatch overhead
- easier parallelism inside an operator
Weaknesses:
- more bookkeeping
- more complex control flow
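A minimal batch-oriented sketch, with an artificially small batch size so the chunking is visible; the function names are illustrative:

```python
# Sketch of batch-oriented execution: operators exchange lists of rows
# ("batches") instead of single rows, amortizing dispatch overhead.

BATCH_SIZE = 2  # unrealistically small, just for illustration

def batched_scan(rows, batch_size=BATCH_SIZE):
    """Yield the input in fixed-size batches."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def batched_filter(batches, predicate):
    """One call handles a whole batch; per-row work is a tight inner loop."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept

result = []
for batch in batched_filter(batched_scan([1, 4, 2, 8, 3]), lambda v: v >= 3):
    result.extend(batch)
# result == [4, 8, 3]
```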
---
## Vectorized execution
Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.
This fits well with columnar memory layouts and analytical workloads.
Strengths:
- excellent cache locality
- better SIMD opportunities
- good fit for scans, filters, joins, and aggregates
Weaknesses:
- some control-flow-heavy logic is less natural
- more careful null and type handling is needed
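A rough sketch of the columnar style, using plain Python lists to stand in for contiguous column arrays and a selection vector in place of row objects:

```python
# Sketch of vectorized execution over column vectors (plain lists here,
# standing in for contiguous arrays). A filter produces a selection vector
# of qualifying positions instead of materializing row objects.

prices = [10.0, 25.0, 7.5, 40.0]   # one column
qtys   = [3,    1,    10,  2]      # another column

# Vectorized filter: one pass over a single column.
selection = [i for i, p in enumerate(prices) if p > 9.0]

# A downstream operator applies the selection to whichever columns it needs.
revenue = [prices[i] * qtys[i] for i in selection]
# selection == [0, 1, 3]; revenue == [30.0, 25.0, 80.0]
```

With real arrays, the tight per-column loops are what give the cache locality and SIMD opportunities listed above.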
---
## Pull vs push
### Pull-based execution
Parent operators ask children for data.
Strengths:
- natural operator trees
- straightforward control flow
Weaknesses:
- repeated per-row or per-batch dispatch at every operator boundary adds overhead
### Push-based execution
Child operators push data to parents or downstream consumers.
Strengths:
- natural for streaming or event-driven systems
- can work well with pipeline fusion
Weaknesses:
- control flow can be harder to reason about
Many systems combine these ideas rather than choosing only one.
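Push-based flow can be sketched with callbacks; the helper names here are illustrative:

```python
# Sketch of push-based execution: each operator is a consumer callback that
# a child invokes for every value it produces; the sink accumulates results.

def make_filter(predicate, downstream):
    """Returns a consumer that forwards qualifying values downstream."""
    def consume(value):
        if predicate(value):
            downstream(value)
    return consume

sink = []
pipeline = make_filter(lambda v: v % 2 == 0, sink.append)

# The source drives execution by pushing values into the pipeline.
for v in [1, 2, 3, 4, 5, 6]:
    pipeline(v)
# sink == [2, 4, 6]
```

Note the inversion relative to the pull sketch: here the source owns the loop, not the root of the plan.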
---
## Pipelining vs materialization
### Pipelined execution
Operators pass intermediate results incrementally.
Strengths:
- low latency
- less temporary storage in favorable cases
Weaknesses:
- some operators still create barriers
### Materializing execution
An operator stores its entire output before the next operator consumes it.
Strengths:
- simpler boundaries
- easier reuse of intermediates
Weaknesses:
- more memory and I/O cost
- higher latency
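The two styles can be contrasted in a small sketch, where a lazy generator stands in for a pipelined operator and a list for a materialized intermediate:

```python
# Sketch contrasting the two styles: the pipelined version streams values
# through a lazy generator, the materializing version builds the full
# intermediate list before the next stage reads it. The results are
# identical; only memory behavior and latency differ.

def pipelined(rows):
    doubled = (r * 2 for r in rows)        # lazy: no intermediate stored
    return [d for d in doubled if d > 4]

def materialized(rows):
    doubled = [r * 2 for r in rows]        # full intermediate in memory
    return [d for d in doubled if d > 4]

# pipelined([1, 2, 3, 4]) == materialized([1, 2, 3, 4]) == [6, 8]
```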
---
## Blocking operators
Some operators are naturally blocking.
Examples:
- sort
- some aggregates
- some join strategies
These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
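A sketch of the barrier a blocking sort creates, compared with a streaming filter upstream of it:

```python
# Sketch of a blocking operator: sort must consume its entire input
# before it can emit its first output row, unlike a streaming filter.

def streaming_filter(rows, predicate):
    for row in rows:          # emits as soon as a row qualifies
        if predicate(row):
            yield row

def blocking_sort(rows):
    buffered = list(rows)     # barrier: the full input is buffered here
    buffered.sort()
    yield from buffered       # output only begins after the barrier

out = list(blocking_sort(streaming_filter([5, 1, 4, 2], lambda v: v > 1)))
# out == [2, 4, 5]
```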
---
## Practical mental model
Execution models are about runtime granularity and data flow.
If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"
---
## Changelog
* **April 1, 2026** -- First version created.

# Storage and Indexes
A reference for how storage layout and indexing shape query execution.
---
## Short answer
Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.
Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.
---
## Row store vs column store
### Row store
Stores all fields of one row together.
Good for:
- point lookups
- updates of whole records
- transactional workloads
Weak for:
- scanning a few columns across many rows
### Column store
Stores values of the same column together.
Good for:
- analytical scans
- compression
- vectorized execution
- reading only selected columns
Weak for:
- reconstructing many full records repeatedly
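The two layouts can be sketched side by side; the tiny table and its column names are made up for illustration:

```python
# Sketch of the same table in row and column layouts, and why scanning
# one column touches less data in the columnar form.

rows = [  # row store: all fields of a record kept together
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 25.0},
    {"id": 3, "name": "c", "price": 7.5},
]

columns = {  # column store: values of the same column kept together
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "price": [10.0, 25.0, 7.5],
}

# Row layout: a SUM over price still walks every full record.
row_total = sum(r["price"] for r in rows)

# Column layout: the scan reads only the one column it needs.
col_total = sum(columns["price"])
# row_total == col_total == 42.5
```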
---
## Why storage layout matters
The storage layout affects:
- I/O volume
- cache locality
- compression opportunities
- pushdown behavior
- operator implementation strategy
So storage is a first-order architecture decision, not just a persistence detail.
---
## Common index types
### B-tree
A classic ordered index, good for:
- point lookups
- range queries
- ordered scans
### Hash index
Optimized for exact-match lookups.
Good for:
- equality predicates
Weak for:
- range queries
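A hash index can be sketched as a dict from key to row positions; the table contents are illustrative:

```python
# Sketch of a hash index: a dict from key to row positions. Equality
# lookups become O(1) on average, but range predicates get no help
# because hash buckets carry no ordering.

table = [("alice", 30), ("bob", 25), ("carol", 30)]

# Build: index the second field (age).
age_index = {}
for pos, (_, age) in enumerate(table):
    age_index.setdefault(age, []).append(pos)

# Probe: answer the equality predicate age == 30 without a full scan.
matches = [table[pos] for pos in age_index.get(30, [])]
# matches == [("alice", 30), ("carol", 30)]
```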
### LSM-based indexing
Common in modern write-heavy systems.
Good for:
- high write throughput
- append-heavy workloads
Tradeoff:
- a read may have to consult the in-memory buffer plus several on-disk runs, so read cost depends on compaction keeping the number of runs small
### Inverted index
Maps terms to documents or postings.
Good for:
- text search
- filtering over tokenized fields
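A toy inverted index; the documents are made up for illustration:

```python
# Sketch of an inverted index: map each term to the ids of documents
# containing it, so a term query avoids scanning every document.

docs = {
    1: "query engines run plans",
    2: "plans contain operators",
    3: "operators run in pipelines",
}

inverted = {}
for doc_id, text in docs.items():
    for term in set(text.split()):          # set(): index each term once
        inverted.setdefault(term, set()).add(doc_id)

# AND query: intersect the postings of each term.
hits = inverted.get("run", set()) & inverted.get("operators", set())
# hits == {3}
```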
### Vector index
Supports approximate nearest-neighbor search over embeddings.
Good for:
- semantic search
- similarity retrieval
Tradeoff:
- often approximate rather than exact
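For illustration, here is an exact brute-force similarity scan, which is the operation a vector index approximates; real indexes use ANN structures (graphs, clustering) precisely to avoid comparing against every stored vector:

```python
# Sketch of vector similarity search as a brute-force scan: rank every
# stored embedding by cosine similarity to the query. A vector index
# approximates this result while skipping most of the comparisons.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

embeddings = {  # toy 2-d embeddings, purely illustrative
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}

query = [0.9, 0.1]
best = max(embeddings, key=lambda k: cosine(query, embeddings[k]))
# best == "doc_a"
```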
---
## What indexes buy
Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.
They are most valuable when:
- the predicate is selective
- the access pattern repeats often
- the engine can exploit the index directly
They are less valuable when:
- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload
---
## Practical mental model
Tables define what data exists.
Storage layout defines how that data is physically organized.
Indexes define shortcuts through that organization.
That is the simplest useful framing.
---
## Changelog
* **April 1, 2026** -- First version created.