# Query Execution Models
A reference for the main ways query operators run at runtime.
---
## Short answer
An execution model defines how operators consume input, produce output, and pass data through a plan.
The most important questions are:
- one row at a time or many values at once?
- pull-based or push-based?
- pipelined or materialized?
Those choices strongly affect latency, CPU efficiency, and implementation complexity.
---
## Row-at-a-time execution
In a row-at-a-time model, each operator processes one tuple per call.
This is often implemented with an iterator interface where a parent asks a child for the next row.
Strengths:
- simple
- modular
- easy to debug
Weaknesses:
- high per-row overhead
- worse cache behavior for analytics
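This is the classic iterator model, often called the Volcano model. It is historically important and still useful in many systems, especially where queries touch few rows.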
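The iterator interface described above can be sketched roughly as follows. This is a minimal illustration, not any particular engine's API: the class names (`Scan`, `Filter`, `Project`), the `next()` method, and the use of `None` as an end-of-stream marker are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Optional

Row = dict  # hypothetical row representation: column name -> value


class Operator:
    """Hypothetical iterator interface: next() returns one row, or None at end."""

    def next(self) -> Optional[Row]:
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows: Iterable[Row]):
        self._it = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)


class Filter(Operator):
    def __init__(self, child: Operator, predicate: Callable[[Row], bool]):
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Row]:
        # Keep asking the child until a row passes or the input ends.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project(Operator):
    def __init__(self, child: Operator, columns: List[str]):
        self.child = child
        self.columns = columns

    def next(self) -> Optional[Row]:
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(root: Operator) -> List[Row]:
    """Drive the plan from the root, one row per call."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


rows = [{"id": 1, "price": 5}, {"id": 2, "price": 50}, {"id": 3, "price": 70}]
plan = Project(Filter(Scan(rows), lambda r: r["price"] > 10), ["id"])
result = run(plan)
print(result)
```

Every output row costs one `next()` call per operator in the chain, which is where the per-row overhead listed above comes from.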
---
## Batch-oriented execution
In a batch model, operators process chunks of rows together.
The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.
Strengths:
- better CPU efficiency
- lower dispatch overhead
- easier parallelism inside an operator
Weaknesses:
- more bookkeeping
- more complex control flow
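To make the amortization concrete, here is a small sketch of a batch interface. The names `BatchScan`, `BatchFilter`, and `next_batch()` are hypothetical, and the batch here is simply a Python list of values; a real engine would pick its own batch layout and size.

```python
from itertools import islice


class BatchScan:
    """Hypothetical batch source: hands out up to batch_size values per call."""

    def __init__(self, rows, batch_size):
        self._it = iter(rows)
        self.batch_size = batch_size
        self.calls = 0  # counts how often the parent asked for data

    def next_batch(self):
        self.calls += 1
        batch = list(islice(self._it, self.batch_size))
        return batch or None  # None marks end of input


class BatchFilter:
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_batch(self):
        # Skip batches that filter down to nothing, so an empty batch
        # is never confused with end of input.
        while (batch := self.child.next_batch()) is not None:
            kept = [v for v in batch if self.predicate(v)]
            if kept:
                return kept
        return None


scan = BatchScan(range(10), batch_size=4)
filt = BatchFilter(scan, lambda v: v % 2 == 0)

batches = []
while (b := filt.next_batch()) is not None:
    batches.append(b)

print(batches, scan.calls)
```

Ten input values are served with four calls into the scan instead of one call per value. The empty-batch handling inside `BatchFilter` is a small example of the extra bookkeeping this model brings.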
---
## Vectorized execution
Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.
This fits well with columnar memory layouts and analytical workloads.
Strengths:
- excellent cache locality
- better SIMD opportunities
- good fit for scans, filters, joins, and aggregates
Weaknesses:
- some control-flow-heavy logic is less natural
- more careful null and type handling is needed
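A rough sketch of the vectorized style, under simplifying assumptions: each column is a plain Python list with a parallel validity list for nulls, and a filter produces a selection vector of surviving positions. The function names are hypothetical. Plain lists will not give real SIMD speedups; the point is only the shape of the data flow.

```python
def select_gt(values, valid, threshold):
    """Return a selection vector: positions where the value is non-null
    and greater than threshold."""
    return [
        i for i, (v, ok) in enumerate(zip(values, valid))
        if ok and v > threshold
    ]


def sum_selected(values, valid, selection):
    """Sum the non-null values at the selected positions."""
    return sum(values[i] for i in selection if valid[i])


# One batch with two columns. price[2] is null; its stored value is a placeholder.
price = [5, 50, 0, 70]
price_valid = [True, True, False, True]
qty = [1, 2, 3, 4]
qty_valid = [True, True, True, True]

# Roughly: SELECT SUM(qty) WHERE price > 10
selection = select_gt(price, price_valid, 10)
total_qty = sum_selected(qty, qty_valid, selection)

print(selection, total_qty)
```

Each function is a tight loop over one column, and the selection vector lets later operators skip filtered-out positions without copying the other columns. The validity lists show why null handling needs explicit care in this style.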
---
## Pull vs push
### Pull-based execution
Parent operators ask children for data.
Strengths:
- natural operator trees
- straightforward control flow
Weaknesses:
- every request for data is a function call (often a virtual one), so dispatch overhead repeats per row or per batch
### Push-based execution
Child operators push data to parents or downstream consumers.
Strengths:
- natural for streaming or event-driven systems
- can work well with pipeline fusion
Weaknesses:
- control flow can be harder to reason about
Many systems combine these ideas rather than choosing only one.
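For contrast with the pull style, a push-based pipeline might look like the sketch below. The `push()` / `finish()` interface and the class names are assumptions for illustration only.

```python
class Collect:
    """Hypothetical sink: stores whatever is pushed into it."""

    def __init__(self):
        self.rows = []

    def push(self, row):
        self.rows.append(row)

    def finish(self):
        pass


class PushFilter:
    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, row):
        # Forward only rows that pass; the producer decides when this runs.
        if self.predicate(row):
            self.downstream.push(row)

    def finish(self):
        self.downstream.finish()


class PushMap:
    def __init__(self, fn, downstream):
        self.fn = fn
        self.downstream = downstream

    def push(self, row):
        self.downstream.push(self.fn(row))

    def finish(self):
        self.downstream.finish()


def scan_into(rows, consumer):
    """The source drives execution: it pushes each row downstream."""
    for row in rows:
        consumer.push(row)
    consumer.finish()


sink = Collect()
pipeline = PushFilter(lambda v: v > 10, PushMap(lambda v: v * 2, sink))
scan_into([5, 50, 70], pipeline)

print(sink.rows)
```

Here the loop lives in the source rather than at the root of the plan. Because the whole pipeline runs inside that one loop, it is a natural candidate for fusing the operators into a single function.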
---
## Pipelining vs materialization
### Pipelined execution
Operators pass intermediate results incrementally.
Strengths:
- low latency
- less temporary storage in favorable cases
Weaknesses:
- blocking operators such as sort still force buffering and break the pipeline
### Materializing execution
An operator stores its entire output before the next operator consumes it.
Strengths:
- simpler boundaries
- easier reuse of intermediates
Weaknesses:
- more memory and I/O cost
- higher latency
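The difference shows up in the order of work. The sketch below uses Python generators for the pipelined case and an explicit list for the materialized case; the function names and the event log are hypothetical devices for the illustration.

```python
def scan(rows, log):
    for r in rows:
        log.append(f"scan {r}")
        yield r


def double_pipelined(source, log):
    # Emits each result as soon as its input row arrives.
    for r in source:
        log.append(f"double {r}")
        yield r * 2


def double_materialized(source, log):
    # Stores the entire input before doing any work on it.
    buffered = list(source)
    out = []
    for r in buffered:
        log.append(f"double {r}")
        out.append(r * 2)
    return out


pipelined_log = []
pipelined_out = list(double_pipelined(scan([1, 2], pipelined_log), pipelined_log))

materialized_log = []
materialized_out = double_materialized(scan([1, 2], materialized_log), materialized_log)

print(pipelined_log)
print(materialized_log)
```

Both variants produce the same output. In the pipelined log the scan and the downstream operator interleave row by row, while in the materialized log all scan events come first, which is where the added memory use and latency come from.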
---
## Blocking operators
Some operators are naturally blocking.
Examples:
- sort
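- aggregates that need all of their input before emitting a result (for example, a global `SUM` or a hash-based `GROUP BY`)
- join strategies with a build phase (for example, the build side of a hash join, or a sort-merge join over unsorted input)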
These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
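One way to see the effect is to count how much input an operator reads before its first output row appears. This is a minimal sketch; `CountingSource`, `streaming_filter`, and `blocking_sort` are hypothetical names.

```python
class CountingSource:
    """Hypothetical source that records how many rows have been read."""

    def __init__(self, rows):
        self.rows = rows
        self.consumed = 0

    def __iter__(self):
        for r in self.rows:
            self.consumed += 1
            yield r


def streaming_filter(source, predicate):
    # Non-blocking: can emit as soon as one input row qualifies.
    for r in source:
        if predicate(r):
            yield r


def blocking_sort(source):
    # Blocking: must read all input before the smallest row is known.
    buffered = sorted(source)
    yield from buffered


src_a = CountingSource([3, 1, 2, 5, 4])
first_filtered = next(streaming_filter(src_a, lambda v: v > 0))
consumed_by_filter = src_a.consumed

src_b = CountingSource([3, 1, 2, 5, 4])
first_sorted = next(blocking_sort(src_b))
consumed_by_sort = src_b.consumed

print(consumed_by_filter, consumed_by_sort)
```

The filter returns its first row after reading one input row, while the sort has to read all five before it can return anything.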
---
## Practical mental model
Execution models are about runtime granularity and data flow.
If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"
---
## Changelog
* **April 1, 2026** -- First version created.