Add note files about query execution models and indexes

2026-04-01 09:09:58 +02:00 · 2026-04-01 09:09:58 +02:00 · 8ed8347380
commit 8ed8347380
parent 2a33f8b483
2 changed files with 320 additions and 0 deletions
--- a/hqew/006-query-execution-models.md
+++ b/hqew/006-query-execution-models.md
@@ -0,0 +1,167 @@
 # Query Execution Models
 A reference for the main ways query operators run at runtime.
 ---
 ## Short answer
 An execution model defines how operators consume input, produce output, and pass data through a plan.
 The most important questions are:
 - one row at a time or many values at once?
 - pull-based or push-based?
 - pipelined or materialized?
 Those choices strongly affect latency, CPU efficiency, and implementation complexity.
 ---
 ## Row-at-a-time execution
 In a row-at-a-time model, operators process one tuple at a time.
 This is often implemented with an iterator interface where a parent asks a child for the next row.
 Strengths:
 - simple
 - modular
 - easy to debug
 Weaknesses:
 - high per-row overhead
 - worse cache behavior for analytics
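 This is the classic iterator model (often called the Volcano model). It remains a reasonable choice when queries touch few rows, as in many transactional workloads, because per-row overhead then matters little.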
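 The iterator interface described above can be sketched in a few lines. This is a minimal illustration, not any particular engine's API: the `Scan` and `Filter` classes, the `open`/`next` method names, and the sample table are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Optional

Row = dict


class Scan:
    """Leaf operator: returns rows from an in-memory table one at a time."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = rows
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self._rows)

    def next(self) -> Optional[Row]:
        # None signals end of input.
        return next(self._it, None)


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Row]:
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None


def run(plan) -> list:
    """Drive the plan from the root: one next() call per output row."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    return out


table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 500}]
result = run(Filter(Scan(table), lambda r: r["qty"] > 10))
```

 Every output row costs at least one `next()` call per operator in the plan, which is the per-row overhead listed under weaknesses.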
 ---
 ## Batch-oriented execution
 In a batch model, operators process chunks of rows together.
 The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.
 Strengths:
 - better CPU efficiency
 - lower dispatch overhead
 - easier parallelism inside an operator
 Weaknesses:
 - more bookkeeping
 - more complex control flow
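 As a rough sketch of the same idea with batches, the operators below exchange lists of rows instead of single rows. The function names and the `batch_size` parameter are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict


def scan_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield the table in fixed-size chunks instead of single rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_batches(
    batches: Iterable[List[Row]], predicate: Callable[[Row], bool]
) -> Iterator[List[Row]]:
    """One operator call handles a whole batch; empty results are skipped."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


rows = [{"id": i, "qty": i * 10} for i in range(10)]
batches = list(
    filter_batches(scan_batches(rows, batch_size=4), lambda r: r["qty"] >= 50)
)
```

 The call between operators now happens once per batch rather than once per row. The extra bookkeeping shows up immediately: output batches can be smaller than input batches, and fully filtered batches have to be dropped.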
 ---
 ## Vectorized execution
 Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.
 This fits well with columnar memory layouts and analytical workloads.
 Strengths:
 - excellent cache locality
 - better SIMD opportunities
 - good fit for scans, filters, joins, and aggregates
 Weaknesses:
 - some control-flow-heavy logic is less natural
 - more careful null and type handling is needed
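 A small sketch of the columnar style, using plain Python lists as stand-ins for column vectors. The selection-vector approach and the use of `None` for nulls are assumptions of this example; real engines typically use typed arrays and separate validity bitmaps.

```python
from typing import List, Optional


def select_greater_than(values: List[Optional[int]], threshold: int) -> List[int]:
    """Return a selection vector: positions whose value is non-null and > threshold."""
    return [i for i, v in enumerate(values) if v is not None and v > threshold]


def sum_selected(values: List[Optional[int]], selection: List[int]) -> int:
    """Aggregate one column, visiting only selected positions and skipping nulls."""
    return sum(values[i] for i in selection if values[i] is not None)


# One batch stored column by column; None marks a null.
qty = [5, 50, None, 500]
price = [2, 3, 4, None]

selection = select_greater_than(qty, 10)      # filter runs over one column
total_price = sum_selected(price, selection)  # aggregate runs over another
```

 Each operator is a tight loop over a single column, which is what makes cache locality and SIMD plausible. The null checks in both functions are an instance of the extra null handling noted above.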
 ---
 ## Pull vs push
 ### Pull-based execution
 Parent operators ask children for data.
 Strengths:
 - natural operator trees
 - straightforward control flow
 Weaknesses:
 - every request crosses an operator boundary, so dispatch overhead repeats per row or per batch
 ### Push-based execution
 Child operators push data to parents or downstream consumers.
 Strengths:
 - natural for streaming or event-driven systems
 - can work well with pipeline fusion
 Weaknesses:
 - control flow can be harder to reason about
 Many systems combine these ideas rather than choosing only one.
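 For contrast with the pull style, here is a minimal push-based sketch in which each stage is a callback that receives rows from upstream. The helper names (`make_filter`, `make_project`, `push_scan`) are hypothetical.

```python
from typing import Callable, List

Row = dict
Consumer = Callable[[Row], None]


def make_filter(predicate: Callable[[Row], bool], downstream: Consumer) -> Consumer:
    """Return a consumer that forwards matching rows to the next stage."""
    def consume(row: Row) -> None:
        if predicate(row):
            downstream(row)
    return consume


def make_project(columns: List[str], downstream: Consumer) -> Consumer:
    """Return a consumer that keeps only the named columns."""
    def consume(row: Row) -> None:
        downstream({c: row[c] for c in columns})
    return consume


def push_scan(rows: List[Row], downstream: Consumer) -> None:
    """The source drives execution by pushing each row into the pipeline."""
    for row in rows:
        downstream(row)


collected: List[Row] = []
pipeline = make_filter(lambda r: r["qty"] > 10, make_project(["id"], collected.append))
push_scan([{"id": 1, "qty": 5}, {"id": 2, "qty": 50}], pipeline)
```

 Here the loop lives in the source rather than in the root of the plan. The pipeline is assembled from the sink backwards, which hints at why control flow can be harder to follow than in the pull style.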
 ---
 ## Pipelining vs materialization
 ### Pipelined execution
 Operators pass intermediate results incrementally.
 Strengths:
 - low latency
 - less temporary storage in favorable cases
 Weaknesses:
 - blocking operators (sort, for example) still interrupt the pipeline
 ### Materializing execution
 An operator stores its entire output before the next operator consumes it.
 Strengths:
 - simpler boundaries
 - easier reuse of intermediates
 Weaknesses:
 - more memory and I/O cost
 - higher latency
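 The latency difference can be made concrete by counting how many input rows are read before the first output row is available. This is only a sketch: the `produced` list and the doubling operators are illustrative assumptions.

```python
from typing import Iterator, List

produced: List[int] = []  # records every source row that has been generated


def source(n: int) -> Iterator[int]:
    for i in range(n):
        produced.append(i)
        yield i


def double_pipelined(rows: Iterator[int]) -> Iterator[int]:
    """Emit each result as soon as its input row arrives."""
    for row in rows:
        yield row * 2


def double_materialized(rows: Iterator[int]) -> Iterator[int]:
    """Compute and store the whole output before handing anything on."""
    stored = [row * 2 for row in rows]
    return iter(stored)


first_pipelined = next(double_pipelined(source(1000)))
rows_read_pipelined = len(produced)

produced.clear()
first_materialized = next(double_materialized(source(1000)))
rows_read_materialized = len(produced)
```

 Both variants return the same first value, but the pipelined one reads a single input row to produce it while the materialized one reads all 1000 and holds the full intermediate result in memory.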
 ---
 ## Blocking operators
 Some operators are naturally blocking.
 Examples:
 - sort
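 - aggregates that must see all input before emitting a result (for example, hash aggregation)
 - join strategies with a build or sort phase (for example, the build side of a hash join)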
 These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
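 A short sketch of the difference, assuming a consumer that only wants the first output row. The `counting_source` helper is hypothetical and exists only to record how much input each plan actually reads.

```python
from itertools import islice
from typing import Iterator, List


def counting_source(values: List[int], reads: List[int]) -> Iterator[int]:
    """Yield values and record each one that is actually read."""
    for v in values:
        reads.append(v)
        yield v


def filter_op(rows: Iterator[int]) -> Iterator[int]:
    """Non-blocking: each row is decided on its own."""
    return (r for r in rows if r % 2 == 0)


def sort_op(rows: Iterator[int]) -> Iterator[int]:
    """Blocking: the smallest value could arrive last, so all input is buffered."""
    yield from sorted(rows)


data = [7, 4, 9, 2, 8, 1]

reads_streaming: List[int] = []
first_even = list(islice(filter_op(counting_source(data, reads_streaming)), 1))

reads_blocking: List[int] = []
smallest = list(islice(sort_op(counting_source(data, reads_blocking)), 1))
```

 The filter stops after reading two rows, while the sort has to read all six before it can emit even one. In a larger plan, that buffering point is where the pipeline effectively breaks.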
 ---
 ## Practical mental model
 Execution models are about runtime granularity and data flow.
 If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"
 ---
 ## Changelog
 * **April 1, 2026** -- First version created.
--- a/hqew/007-storage-and-indexes.md
+++ b/hqew/007-storage-and-indexes.md
@@ -0,0 +1,153 @@
 # Storage and Indexes
 A reference for how storage layout and indexing shape query execution.
 ---
 ## Short answer
 Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.
 Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.
 ---
 ## Row store vs column store
 ### Row store
 Stores all fields of one row together.
 Good for:
 - point lookups
 - updates of whole records
 - transactional workloads
 Weak for:
 - scanning a few columns across many rows
 ### Column store
 Stores values of the same column together.
 Good for:
 - analytical scans
 - compression
 - vectorized execution
 - reading only selected columns
 Weak for:
 - reconstructing many full records repeatedly
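 The two layouts can be mimicked with ordinary Python containers. This is an in-memory analogy only, with made-up sample data; it says nothing about on-disk page formats.

```python
# Row layout: each record keeps all of its fields together.
row_store = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 2, "region": "us", "amount": 20},
    {"id": 3, "region": "eu", "amount": 30},
]

# Column layout: each column is one contiguous sequence.
column_store = {
    "id": [1, 2, 3],
    "region": ["eu", "us", "eu"],
    "amount": [10, 20, 30],
}


def point_lookup_row(rows, position):
    """Row store: one access returns the complete record."""
    return rows[position]


def point_lookup_column(columns, position):
    """Column store: the record is reassembled from every column."""
    return {name: values[position] for name, values in columns.items()}


def total_amount_row(rows):
    """Row store: every full record is visited to read one field."""
    return sum(row["amount"] for row in rows)


def total_amount_column(columns):
    """Column store: only the 'amount' column is touched."""
    return sum(columns["amount"])
```

 Both layouts answer both questions. The difference is how much unrelated data each access has to walk past.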
 ---
 ## Why storage layout matters
 The storage layout affects:
 - I/O volume
 - cache locality
 - compression opportunities
 - pushdown behavior
 - operator implementation strategy
 So storage is a first-order architecture decision, not just a persistence detail.
 ---
 ## Common index types
 ### B-tree
 A classic ordered index, good for:
 - point lookups
 - range queries
 - ordered scans
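 The three access patterns above all follow from keeping keys in sorted order. The sketch below uses a sorted list with the standard `bisect` module as a stand-in for the leaf level of a B-tree; it is not a B-tree implementation (no nodes, no paging), and the `OrderedIndex` class is hypothetical. Row ids are assumed to be non-negative integers.

```python
from bisect import bisect_left, bisect_right, insort
from typing import List, Tuple


class OrderedIndex:
    """Sorted (key, row_id) pairs; a stand-in for the leaf level of a B-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, int]] = []

    def insert(self, key: int, row_id: int) -> None:
        # Keeps the list sorted on every insert.
        insort(self._entries, (key, row_id))

    def range(self, low: int, high: int) -> List[int]:
        """Row ids with low <= key <= high, returned in key order."""
        start = bisect_left(self._entries, (low, -1))
        end = bisect_right(self._entries, (high, float("inf")))
        return [row_id for _, row_id in self._entries[start:end]]

    def lookup(self, key: int) -> List[int]:
        """A point lookup is a range query with equal bounds."""
        return self.range(key, key)


index = OrderedIndex()
for row_id, age in enumerate([31, 25, 40, 25, 36]):
    index.insert(age, row_id)

same_age = index.lookup(25)
thirties = index.range(30, 39)
ordered_scan = index.range(0, 100)
```

 Point lookups, range queries, and ordered scans all reduce to finding a start position by binary search and reading forward.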
 ### Hash index
 Optimized for exact-match lookups.
 Good for:
 - equality predicates
 Weak for:
 - range queries
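 A hash index can be sketched with a Python dictionary that maps a key to row positions. The helper names and sample rows are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List

rows = [
    {"id": 0, "city": "Oslo"},
    {"id": 1, "city": "Bergen"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Tromso"},
]


def build_hash_index(rows: List[dict], column: str) -> Dict[str, List[int]]:
    """Map each distinct value of a column to the positions of matching rows."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def equality_lookup(index: Dict[str, List[int]], key: str) -> List[int]:
    """Exact match: a single hash probe."""
    return index.get(key, [])


def range_lookup(index: Dict[str, List[int]], low: str, high: str) -> List[int]:
    """Range match: hashing destroys order, so every key must be inspected."""
    return sorted(p for key, ps in index.items() if low <= key <= high for p in ps)


city_index = build_hash_index(rows, "city")
oslo_rows = equality_lookup(city_index, "Oslo")
b_to_p = range_lookup(city_index, "B", "P")
```

 The equality lookup touches one bucket. The range lookup still returns a correct answer, but only by checking every key, so the index provides no shortcut for it.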
 ### LSM-based indexing
 Common in modern write-heavy systems.
 Good for:
 - high write throughput
 - append-heavy workloads
 Tradeoff:
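 - a read may have to check the in-memory table and several on-disk sorted runs, and background compaction is needed to keep that number small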
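 A toy model of the write and read paths, assuming an in-memory table that is frozen into sorted runs once it reaches a size limit. The `TinyLSM` class and its `memtable_limit` parameter are hypothetical. Deletes (tombstones), on-disk files, and Bloom filters are deliberately left out.

```python
from typing import Dict, List, Optional


class TinyLSM:
    """Writes go to an in-memory table; full tables are frozen into sorted runs."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.runs: List[Dict[str, str]] = []  # newest run first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # A write never modifies an existing run.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Freeze the memtable as an immutable run, sorted by key."""
        if self.memtable:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        """Check the memtable, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            if key in run:
                return run[key]
        return None

    def compact(self) -> None:
        """Merge all runs into one, keeping the newest value for each key."""
        merged: Dict[str, str] = {}
        for run in reversed(self.runs):  # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


store = TinyLSM(memtable_limit=2)
for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
    store.put(key, value)
```

 After these writes the store holds two runs plus a partly filled memtable, so a read for `"b"` checks three places before finding the value. Calling `store.compact()` merges the runs into one and discards the stale value of `"a"`.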
 ### Inverted index
 Maps terms to documents or postings.
 Good for:
 - text search
 - filtering over tokenized fields
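 A minimal sketch of an inverted index, assuming whitespace tokenization and posting lists stored as sets of document ids. Real search engines add stemming, positions, and scoring, none of which is shown here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Documents containing every query term (intersection of postings)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "push based execution",
    2: "pull based execution",
    3: "columnar storage",
}
index = build_inverted_index(docs)
both = search_all(index, "based execution")
pull_only = search_all(index, "Pull execution")
```

 A multi-term query becomes a set intersection over postings, so documents that lack any of the terms are never examined.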
 ### Vector index
 Supports approximate nearest-neighbor search over embeddings.
 Good for:
 - semantic search
 - similarity retrieval
 Tradeoff:
 - results are usually approximate: some true nearest neighbors can be missed in exchange for speed
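 The tradeoff can be shown with a coarse partitioning scheme, loosely in the spirit of inverted-file (IVF) vector indexes. Everything here is an illustrative assumption: the centroids are fixed by hand rather than learned, only one partition is probed, and the vectors are tiny.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def exact_nearest(vectors: Dict[str, Vector], query: Vector) -> str:
    """Brute force: compare the query with every stored vector."""
    return min(vectors, key=lambda name: math.dist(vectors[name], query))


def closest_centroid(centroids: List[Vector], point: Vector) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def build_partitions(
    vectors: Dict[str, Vector], centroids: List[Vector]
) -> Dict[int, List[str]]:
    """Assign each vector to the partition of its closest centroid."""
    partitions: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for name, vec in vectors.items():
        partitions[closest_centroid(centroids, vec)].append(name)
    return partitions


def approx_nearest(vectors, centroids, partitions, query: Vector) -> str:
    """Search only the partition whose centroid is closest to the query."""
    probe = closest_centroid(centroids, query)
    candidates = partitions[probe] or list(vectors)  # fall back if empty
    return min(candidates, key=lambda name: math.dist(vectors[name], query))


vectors = {"a": (1.0, 0.0), "b": (4.9, 0.0), "c": (5.2, 0.0), "d": (9.0, 0.0)}
centroids = [(0.0, 0.0), (10.0, 0.0)]
partitions = build_partitions(vectors, centroids)

query = (5.02, 0.0)
exact = exact_nearest(vectors, query)
approx = approx_nearest(vectors, centroids, partitions, query)
```

 For this query the exact answer is `"b"`, but `"b"` sits just across the partition boundary, so the approximate search returns `"c"` after comparing only two of the four vectors. Probing more partitions would recover the exact answer at higher cost.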
 ---
 ## What indexes buy
 Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.
 They are most valuable when:
 - the predicate is selective
 - the access pattern repeats often
 - the engine can exploit the index directly
 They are less valuable when:
 - most rows are needed anyway
 - the predicate is too broad
 - maintaining the index is too expensive for the workload
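 The effect of selectivity can be seen by counting rows examined. This is a deliberately crude sketch with made-up data: it ignores the cost of random access and of maintaining the index, both of which matter in practice.

```python
from collections import defaultdict
from typing import Dict, List

# 1000 rows, of which 10 are "open" and 990 are "closed".
rows = [{"id": i, "status": "open" if i % 100 == 0 else "closed"} for i in range(1000)]

status_index: Dict[str, List[int]] = defaultdict(list)
for position, row in enumerate(rows):
    status_index[row["status"]].append(position)


def rows_examined_by_scan(rows: List[dict]) -> int:
    """A full scan inspects every row regardless of how many match."""
    return len(rows)


def rows_examined_by_index(index: Dict[str, List[int]], status: str) -> int:
    """An index lookup fetches only the rows listed for the key."""
    return len(index.get(status, []))


scan_cost = rows_examined_by_scan(rows)
selective_cost = rows_examined_by_index(status_index, "open")
broad_cost = rows_examined_by_index(status_index, "closed")
```

 For the selective predicate the index cuts the work from 1000 rows to 10. For the broad predicate it still touches 990 rows, so the index saves almost nothing and its maintenance cost is pure overhead.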
 ---
 ## Practical mental model
 Tables define what data exists.
 Storage layout defines how that data is physically organized.
 Indexes define shortcuts through that organization.
 That is the simplest useful framing.
 ---
 ## Changelog
 * **April 1, 2026** -- First version created.