Add note files about query execution models and indexes
parent 2a33f8b483
commit 8ed8347380
167
hqew/006-query-execution-models.md
Normal file
@@ -0,0 +1,167 @@

# Query Execution Models

A reference for the main ways query operators run at runtime.

---

## Short answer

An execution model defines how operators consume input, produce output, and pass data through a plan.

The most important questions are:

- one row at a time or many values at once?
- pull-based or push-based?
- pipelined or materialized?

Those choices strongly affect latency, CPU efficiency, and implementation complexity.

---

## Row-at-a-time execution

In a row-oriented model, operators process one tuple at a time.

This is often implemented with an iterator interface where a parent asks a child for the next row.

Strengths:

- simple
- modular
- easy to debug

Weaknesses:

- high per-row overhead
- worse cache behavior for analytics

This model, often called the Volcano or iterator model, is historically important and still useful in many systems.

---

## Batch-oriented execution

In a batch model, operators process chunks of rows together.

The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.

Strengths:

- better CPU efficiency
- lower dispatch overhead
- easier parallelism inside an operator

Weaknesses:

- more bookkeeping
- more complex control flow
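
The amortization idea in this section can be sketched in a few lines. This is a minimal illustration rather than any engine's real interface: `scan_batches`, `filter_batches`, the `amount` field, and the batch size of 4 are hypothetical names and values picked for the example. Each operator call handles a whole chunk, so the per-call overhead is paid once per batch instead of once per row.

```python
from typing import Iterator, List

Row = dict  # hypothetical row representation for this sketch


def scan_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield the input in fixed-size chunks (the last chunk may be shorter)."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_batches(batches: Iterator[List[Row]], min_amount: int) -> Iterator[List[Row]]:
    """Apply the predicate to a whole batch per call instead of one row per call."""
    for batch in batches:
        kept = [row for row in batch if row["amount"] >= min_amount]
        if kept:  # drop empty batches so downstream operators do no empty work
            yield kept


rows = [{"id": i, "amount": i * 10} for i in range(10)]
batches = list(filter_batches(scan_batches(rows, batch_size=4), min_amount=50))
batch_sizes = [len(batch) for batch in batches]
kept_ids = [row["id"] for batch in batches for row in batch]
```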

---

## Vectorized execution

Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.

This fits well with columnar memory layouts and analytical workloads.

Strengths:

- excellent cache locality
- better SIMD opportunities
- good fit for scans, filters, joins, and aggregates

Weaknesses:

- some control-flow-heavy logic is less natural
- more careful null and type handling is needed
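
A rough sketch of the column-at-a-time style, including the null handling mentioned above. Everything here is an assumption made for illustration: plain Python lists stand in for typed column vectors, `None` stands in for a null, and `filter_greater_than` and `sum_selected` are hypothetical helper names. A real engine would run tight typed loops over contiguous memory; the sketch only shows the shape of the data flow, where a filter emits a selection vector of positions and later operators visit only those positions.

```python
from typing import List, Optional


def filter_greater_than(values: List[Optional[int]], threshold: int) -> List[int]:
    """Return a selection vector: positions whose value is non-null and above threshold."""
    return [i for i, v in enumerate(values) if v is not None and v > threshold]


def sum_selected(values: List[Optional[int]], selection: List[int]) -> int:
    """Aggregate one column, visiting only selected positions and skipping nulls."""
    return sum(values[i] for i in selection if values[i] is not None)


# A tiny columnar batch: one list per column, None marks a null.
price = [5, 12, None, 30, 8]
quantity = [1, 2, 3, None, 4]

# Filter on one column, then aggregate another column through the selection vector.
selection = filter_greater_than(price, 7)
total_quantity = sum_selected(quantity, selection)
```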

---

## Pull vs push

### Pull-based execution

Parent operators ask children for data.

Strengths:

- natural operator trees
- straightforward control flow

Weaknesses:

- can introduce repeated dispatch overhead
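
A minimal sketch of pull-based execution, written as a row-at-a-time iterator interface. The `Scan` and `Filter` classes and the `next()` method returning `None` at end of input are assumptions made for this example, not a specific engine's API.

```python
from typing import Callable, List, Optional


class Scan:
    """Leaf operator: hands out one row per next() call."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows
        self._pos = 0

    def next(self) -> Optional[dict]:
        if self._pos >= len(self._rows):
            return None  # end of input
        row = self._rows[self._pos]
        self._pos += 1
        return row


class Filter:
    """Pulls from its child until a row passes the predicate or input ends."""

    def __init__(self, child: Scan, predicate: Callable[[dict], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def next(self) -> Optional[dict]:
        while True:
            row = self._child.next()
            if row is None or self._predicate(row):
                return row


plan = Filter(Scan([{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]), lambda r: r["id"] % 2 == 0)

# The consumer at the top of the tree drives execution by pulling.
pulled = []
while (row := plan.next()) is not None:
    pulled.append(row["id"])
```

Every row costs one `next()` call per operator it passes through, which is the repeated dispatch overhead listed above.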

### Push-based execution

Child operators push data to parents or downstream consumers.

Strengths:

- natural for streaming or event-driven systems
- can work well with pipeline fusion

Weaknesses:

- control flow can be harder to reason about

Many systems combine these ideas rather than choosing only one.

---

## Pipelining vs materialization

### Pipelined execution

Operators pass intermediate results incrementally.

Strengths:

- low latency
- less temporary storage in favorable cases

Weaknesses:

- some operators still create barriers
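
Python generators give a compact way to see pipelining in action. This is only a sketch under assumed names (`scan`, `keep_even`, `times_ten`, and the `scanned` list used for bookkeeping are all hypothetical): the first result is available after the scan has produced a single row, even though the input has a million rows.

```python
from typing import Iterable, Iterator, List

scanned: List[int] = []  # records which rows the scan has produced so far


def scan(n: int) -> Iterator[int]:
    for i in range(n):
        scanned.append(i)
        yield i


def keep_even(rows: Iterable[int]) -> Iterator[int]:
    return (r for r in rows if r % 2 == 0)


def times_ten(rows: Iterable[int]) -> Iterator[int]:
    return (r * 10 for r in rows)


pipeline = times_ten(keep_even(scan(1_000_000)))

first = next(pipeline)              # the scan has produced only row 0 so far
scanned_after_first = len(scanned)

second = next(pipeline)             # the scan advances past row 1 to reach row 2
scanned_after_second = len(scanned)
```

No intermediate result is ever stored in full, which is where the low latency and small temporary footprint come from.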

### Materializing execution

An operator stores its entire output before the next operator consumes it.

Strengths:

- simpler boundaries
- easier reuse of intermediates

Weaknesses:

- more memory and I/O cost
- higher latency
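
For contrast, a small sketch of materialization, again with hypothetical names (`materialize`, `keep_even`). The filter runs to completion and its whole output is stored before any consumer starts, so the stored result can be reused, at the price of holding every intermediate row.

```python
from typing import Iterable, List


def materialize(rows: Iterable[int]) -> List[int]:
    """Run the upstream operator to completion and store its whole output."""
    return list(rows)


def keep_even(rows: Iterable[int]) -> Iterable[int]:
    return (r for r in rows if r % 2 == 0)


# The filter's output is fully stored before anything downstream runs.
intermediate = materialize(keep_even(range(10)))

# Two consumers reuse the stored result without re-running the filter.
total = sum(intermediate)
largest = max(intermediate)

# The memory cost grows with the size of the intermediate result.
stored_rows = len(intermediate)
```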

---

## Blocking operators

Some operators are naturally blocking.

Examples:

- sort
- some aggregates
- some join strategies

These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.

---

## Practical mental model

Execution models are about runtime granularity and data flow.

If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"

---

## Changelog

* **April 1, 2026** -- First version created.

153
hqew/007-storage-and-indexes.md
Normal file
@@ -0,0 +1,153 @@

# Storage and Indexes

A reference for how storage layout and indexing shape query execution.

---

## Short answer

Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.

Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.

---

## Row store vs column store

### Row store

Stores all fields of one row together.

Good for:

- point lookups
- updates of whole records
- transactional workloads

Weak for:

- scanning a few columns across many rows

### Column store

Stores values of the same column together.

Good for:

- analytical scans
- compression
- vectorized execution
- reading only selected columns

Weak for:

- reconstructing many full records repeatedly
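
The two layouts can be compared side by side with in-memory structures. This is a sketch with made-up data: the `row_store` and `column_store` names and the three sample records are assumptions for illustration, and Python containers only approximate how bytes would be laid out on disk or in memory.

```python
from typing import Dict, List, Tuple

# Row layout: all fields of one record sit together.
row_store: List[Tuple[int, str, float]] = [
    (1, "alice", 10.0),
    (2, "bob", 25.5),
    (3, "carol", 7.25),
]

# Column layout: all values of one column sit together.
column_store: Dict[str, list] = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "amount": [10.0, 25.5, 7.25],
}

# Point lookup: the row store returns the whole record in one access.
record_from_rows = row_store[1]

# The column store has to visit every column to rebuild the same record.
record_from_columns = tuple(column_store[c][1] for c in ("id", "name", "amount"))

# Analytical scan: the column store touches only the one column it needs.
total_from_columns = sum(column_store["amount"])

# The row store walks every record and picks one field out of each.
total_from_rows = sum(row[2] for row in row_store)
```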

---

## Why storage layout matters

The storage layout affects:

- I/O volume
- cache locality
- compression opportunities
- pushdown behavior
- operator implementation strategy

So storage is a first-order architecture decision, not just a persistence detail.

---

## Common index types

### B-tree

A classic ordered index, good for:

- point lookups
- range queries
- ordered scans
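
The three access patterns above all follow from keeping keys in sorted order. The sketch below is not a B-tree: it assumes a sorted in-memory list searched with Python's `bisect` module as a stand-in, and the `entries` data and function names are hypothetical. A real B-tree keeps the same ordering across fixed-size pages so that lookups touch only a few pages.

```python
import bisect
from typing import List, Tuple

# (key, row_id) pairs kept in key order; a B-tree maintains this order across pages.
entries: List[Tuple[int, int]] = sorted([(42, 0), (7, 1), (19, 2), (88, 3), (23, 4)])
keys = [key for key, _ in entries]


def point_lookup(key: int) -> List[int]:
    """Row ids whose key equals `key`, found by binary search."""
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [row_id for _, row_id in entries[lo:hi]]


def range_query(low: int, high: int) -> List[int]:
    """Row ids with low <= key <= high, returned in key order."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [row_id for _, row_id in entries[lo:hi]]


def ordered_scan() -> List[int]:
    """All keys in sorted order, with no separate sort step."""
    return list(keys)
```

For example, `range_query(10, 50)` finds both ends of the range by binary search and then reads the entries between them, rather than checking every key.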

### Hash index

Optimized for exact-match lookups.

Good for:

- equality predicates

Weak for:

- range queries
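
A Python `dict` is itself a hash table, so it can serve as a small stand-in for a hash index. The `rows` data and the `city` column are made up for this sketch. Equality lookups are a single probe, while a range predicate gets no help from the structure because hashing discards key order.

```python
from collections import defaultdict
from typing import Dict, List

rows = [
    {"id": 0, "city": "Oslo"},
    {"id": 1, "city": "Lima"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Rome"},
]

# Build: map each key to the list of row ids that carry it.
hash_index: Dict[str, List[int]] = defaultdict(list)
for row in rows:
    hash_index[row["city"]].append(row["id"])

# Equality predicate: one hash probe.
oslo_ids = hash_index.get("Oslo", [])

# Range predicate ("L" <= city < "P"): every key has to be checked.
range_ids = sorted(
    row_id
    for city, ids in hash_index.items()
    if "L" <= city < "P"
    for row_id in ids
)
```

The range query still returns the right rows, but only by visiting every distinct key, which is no better than a scan.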

### LSM-based indexing

Common in modern write-heavy systems.

Good for:

- high write throughput
- append-heavy workloads

Tradeoff:

- reads often need compaction-aware logic
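
A toy sketch of why LSM reads depend on compaction. The `TinyLSM` class, its `memtable_limit` parameter, and the in-memory dictionaries standing in for on-disk sorted runs are all assumptions for illustration. Deletes, tombstones, write-ahead logging, and leveled layouts are left out.

```python
from typing import Dict, List, Optional


class TinyLSM:
    """Toy LSM store: a mutable memtable plus immutable runs, newest first."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.runs: List[Dict[str, str]] = []  # runs[0] is the most recent flush
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable as a new sorted run.
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Reads check the memtable, then each run from newest to oldest.
        for level in [self.memtable, *self.runs]:
            if key in level:
                return level[key]
        return None

    def compact(self) -> None:
        # Merge oldest to newest so newer values overwrite older ones.
        merged: Dict[str, str] = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


store = TinyLSM()
for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
    store.put(key, value)

runs_before = len(store.runs)
value_before = store.get("a")
store.compact()
runs_after = len(store.runs)
value_after = store.get("a")
```

Writes never rewrite old data in place, so a key can exist in several runs at once. A read may have to look in each of them until compaction merges the runs and drops the stale versions.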

### Inverted index

Maps terms to documents or postings.

Good for:

- text search
- filtering over tokenized fields
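
A minimal sketch of the term-to-postings mapping. The sample `documents`, the whitespace tokenizer, and the `search_all` helper are assumptions for this example. Production systems add steps such as stemming, stop-word handling, and compressed posting lists.

```python
from collections import defaultdict
from typing import Dict, List, Set

documents = {
    1: "vectorized execution uses column batches",
    2: "row execution uses an iterator",
    3: "column stores compress well",
}

# Build: map each lowercased token to the ids of documents containing it.
postings: Dict[str, Set[int]] = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():
        postings[token].add(doc_id)


def search_all(terms: List[str]) -> List[int]:
    """Ids of documents that contain every term (an AND query)."""
    sets = [postings.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

A multi-term query intersects the posting sets, so its cost depends on the posting list sizes rather than on the total number of documents.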

### Vector index

Supports approximate nearest-neighbor search over embeddings.

Good for:

- semantic search
- similarity retrieval

Tradeoff:

- often approximate rather than exact
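
The exact-versus-approximate tradeoff can be shown with a small bucket-based sketch. The technique here is a simplified partitioning scheme in which vectors are grouped by their nearest centroid and a query probes only one group. It is not a description of any particular library. The two-dimensional `vectors`, the hand-picked `centroids`, and the function names are all hypothetical. With these values, the query `(5.5, 5.5)` has true nearest neighbor `f`, but the probed bucket holds only `d` and `e`, so the approximate search returns `d`.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

vectors: Dict[str, Vector] = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0),
    "d": (10.0, 10.0), "e": (11.0, 10.0), "f": (4.0, 4.0),
}
centroids: List[Vector] = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical, fixed by hand


def nearest_centroid(v: Vector) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


# Index build: assign every vector to the bucket of its closest centroid.
buckets: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
for name, vec in vectors.items():
    buckets[nearest_centroid(vec)].append(name)


def exact_search(query: Vector) -> str:
    """Compare the query against every stored vector."""
    return min(vectors, key=lambda n: math.dist(query, vectors[n]))


def approximate_search(query: Vector) -> str:
    """Probe only one bucket: cheaper, but the true neighbor may live elsewhere."""
    candidates = buckets[nearest_centroid(query)]
    return min(candidates, key=lambda n: math.dist(query, vectors[n]))


query = (5.5, 5.5)
exact = exact_search(query)
approximate = approximate_search(query)
```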

---

## What indexes buy

Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.

They are most valuable when:

- the predicate is selective
- the access pattern repeats often
- the engine can exploit the index directly

They are less valuable when:

- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload
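
The selectivity argument can be made concrete with a deliberately crude cost model. Every number here is an assumption chosen for illustration, not a measurement. A sequential row read costs one unit, each row fetched through the index costs `random_access_penalty` units (set to 4.0), and index maintenance cost on writes is ignored. Under those assumptions the index wins only while the predicate keeps less than a quarter of the rows.

```python
def full_scan_cost(total_rows: int) -> float:
    """Assumed cost model: one unit per row read sequentially."""
    return float(total_rows)


def index_cost(total_rows: int, selectivity: float, random_access_penalty: float = 4.0) -> float:
    """Assumed cost model: each matching row costs one random access."""
    return total_rows * selectivity * random_access_penalty


def index_pays_off(total_rows: int, selectivity: float) -> bool:
    """True when the index path is cheaper than scanning everything."""
    return index_cost(total_rows, selectivity) < full_scan_cost(total_rows)


selective = index_pays_off(1_000_000, 0.01)  # predicate keeps about 1% of rows
broad = index_pays_off(1_000_000, 0.50)      # predicate keeps about half the rows
```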

---

## Practical mental model

Tables define what data exists.

Storage layout defines how that data is physically organized.

Indexes define shortcuts through that organization.

That is the simplest useful framing.

---

## Changelog

* **April 1, 2026** -- First version created.