# Query Execution Models

A reference for the main ways query operators run at runtime.

---

## Short answer

An execution model defines how operators consume input, produce output, and pass data through a plan.

The most important questions are:

- one row at a time or many values at once?
- pull-based or push-based?
- pipelined or materialized?

Those choices strongly affect latency, CPU efficiency, and implementation complexity.

---

## Row-at-a-time execution

In a row-at-a-time model, each operator call processes a single tuple.

This is often implemented with an iterator interface where a parent asks a child for the next row. This design is commonly called the iterator model or Volcano model.

Strengths:

- simple
- modular
- easy to debug

Weaknesses:

- high per-row overhead, typically at least one function call per operator per row
- worse instruction and data cache behavior on large analytical scans

This model is historically important and still useful in many systems, particularly where a query touches relatively few rows and per-row overhead is not the bottleneck.
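
The iterator interface described above can be sketched as follows. This is a minimal illustration, not any particular engine's API: the `Scan` and `Filter` classes, the `open()`/`next()` method names, and the dict-based `Row` are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Optional

Row = dict  # hypothetical row representation: column name -> value


class Scan:
    """Leaf operator: hands out rows from an in-memory table one at a time."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self._rows)

    def next(self) -> Optional[Row]:
        # Return the next row, or None once the input is exhausted.
        return next(self._it, None)


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Row]:
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None


def run(root) -> list:
    """Drive the plan from the root: one next() call per output row."""
    root.open()
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 70}]
plan = Filter(Scan(table), lambda r: r["qty"] > 10)
result = run(plan)
```

Every output row costs at least one `next()` call per operator in the tree, which is where the per-row overhead listed above comes from.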

---

## Batch-oriented execution

In a batch model, operators process chunks of rows together.

The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.

Strengths:

- better CPU efficiency
- lower dispatch overhead
- easier parallelism inside an operator

Weaknesses:

- more bookkeeping, such as choosing batch sizes and handling partially filled or empty batches
- more complex control flow
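
A small sketch of the same filter plan in batch form. The `scan_batches` and `filter_batches` names and the batch size of 4 are assumptions chosen for illustration; real engines typically use much larger batches.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

Row = dict
Batch = List[Row]  # a row-based batch; a columnar batch would also fit the model


def scan_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Leaf operator: group the input into batches of at most batch_size rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def filter_batches(batches: Iterable[Batch],
                   predicate: Callable[[Row], bool]) -> Iterator[Batch]:
    """Handle a whole batch per call; drop batches that end up empty."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


rows = [{"qty": q} for q in range(10)]
out_batches = list(filter_batches(scan_batches(rows, batch_size=4),
                                  lambda r: r["qty"] >= 3))
batch_sizes = [len(b) for b in out_batches]
```

Ten input rows cross the operator boundary in three calls instead of ten. The uneven output batch sizes are an example of the extra bookkeeping a batch model has to tolerate.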

---

## Vectorized execution

Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.

This fits well with columnar memory layouts and analytical workloads.

Strengths:

- excellent cache locality
- better SIMD opportunities
- good fit for scans, filters, joins, and aggregates

Weaknesses:

- some control-flow-heavy logic is less natural
- more careful null and type handling is needed
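
As a rough sketch of the column-at-a-time style, the example below evaluates a filter and an expression over separate column vectors, using a selection vector to record which positions survived the filter. The function names and the list-based columns are assumptions. Plain Python lists do not provide SIMD or tight memory layout, so this shows only the shape of the computation; real engines run such loops over typed arrays in compiled code.

```python
from typing import Dict, List

ColumnBatch = Dict[str, List[int]]  # hypothetical layout: column name -> value vector


def select_greater_than(column: List[int], threshold: int) -> List[int]:
    """Filter primitive: return the positions of qualifying values."""
    return [i for i, v in enumerate(column) if v > threshold]


def multiply_selected(left: List[int], right: List[int],
                      selection: List[int]) -> List[int]:
    """Expression primitive: multiply two columns at the selected positions only."""
    return [left[i] * right[i] for i in selection]


batch: ColumnBatch = {"qty": [5, 50, 70, 8], "price": [2, 3, 1, 10]}

# Each primitive is one simple loop over one or two columns.
selection = select_greater_than(batch["qty"], 10)
revenue = multiply_selected(batch["qty"], batch["price"], selection)
total = sum(revenue)
```

The filter never copies rows. It only produces positions, and later primitives read the original vectors through that selection.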

---

## Pull vs push

### Pull-based execution

Parent operators ask children for data.

Strengths:

- natural operator trees
- straightforward control flow

Weaknesses:

- can introduce repeated dispatch overhead, because every request for data is a call down the operator tree

### Push-based execution

Child operators push data to parents or downstream consumers.

Strengths:

- natural for streaming or event-driven systems
- can work well with pipeline fusion

Weaknesses:

- control flow can be harder to reason about, for example when a downstream operator needs to stop or slow down its producer

Many systems combine these ideas rather than choosing only one.
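
For contrast with the pull-based direction, here is a minimal push-based sketch in which each operator is a callback that receives a row and forwards it downstream. The `make_filter`, `make_project`, and `scan` helpers are hypothetical names invented for this example.

```python
from typing import Callable, Iterable, List

Row = dict
Consumer = Callable[[Row], None]


def make_filter(predicate: Callable[[Row], bool], downstream: Consumer) -> Consumer:
    """Build a consumer that forwards only the rows that satisfy the predicate."""
    def consume(row: Row) -> None:
        if predicate(row):
            downstream(row)
    return consume


def make_project(columns: List[str], downstream: Consumer) -> Consumer:
    """Build a consumer that forwards only the requested columns."""
    def consume(row: Row) -> None:
        downstream({c: row[c] for c in columns})
    return consume


def scan(rows: Iterable[Row], downstream: Consumer) -> None:
    """The source drives execution by pushing each row into the pipeline."""
    for row in rows:
        downstream(row)


sink: List[Row] = []
# The pipeline is wired from the sink backwards: scan -> filter -> project -> sink.
pipeline = make_filter(lambda r: r["qty"] > 10, make_project(["id"], sink.append))
scan([{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 70}], pipeline)
```

The loop lives in the source rather than at the root, and the operators between source and sink behave like one chained function, which is why this style pairs naturally with pipeline fusion. The sketch has no way for the sink to tell `scan` to stop early, which hints at the control-flow difficulty noted above.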

---

## Pipelining vs materialization

### Pipelined execution

Operators pass intermediate results incrementally.

Strengths:

- low latency
- less temporary storage in favorable cases

Weaknesses:

- some operators, such as sorts, still create barriers that interrupt the pipeline

### Materializing execution

An operator stores its entire output before the next operator consumes it.

Strengths:

- simpler boundaries between operators
- easier reuse of intermediates

Weaknesses:

- more memory and I/O cost
- higher latency
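
The difference is easiest to see by logging the order in which work happens. In this sketch, which uses Python generators as a stand-in for pipelined operators, the `events` log and the `double_pipelined` and `double_materialized` functions are illustrative assumptions.

```python
from typing import Iterable, Iterator, List

events: List[str] = []  # records the order in which operators do their work


def scan(values: Iterable[int]) -> Iterator[int]:
    for v in values:
        events.append(f"scan {v}")
        yield v


def double_pipelined(child: Iterable[int]) -> Iterator[int]:
    """Emit each result as soon as its input arrives."""
    for v in child:
        events.append(f"double {v}")
        yield v * 2


def double_materialized(child: Iterable[int]) -> List[int]:
    """Buffer the entire input first, then compute the entire output."""
    buffered = list(child)
    out = []
    for v in buffered:
        events.append(f"double {v}")
        out.append(v * 2)
    return out


# Pipelined: the first result is available after touching one input value.
first_pipelined = next(double_pipelined(scan([1, 2, 3])))
pipelined_events = list(events)

# Materialized: the full input is scanned before any result exists.
events.clear()
first_materialized = double_materialized(scan([1, 2, 3]))[0]
materialized_events = list(events)
```

Both variants return the same first value, but the pipelined log holds two entries at that point while the materialized log holds six.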

---

## Blocking operators

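Some operators are naturally blocking: they cannot emit their first output row until they have consumed all of their input.

Examples:

- sort
- some aggregates, such as a hash aggregation that emits groups only after the input ends
- some join strategies, such as a hash join, which must finish building its hash table before probing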

These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
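
To make the effect concrete, the sketch below counts how many input values each operator consumes before it can produce its first output. `CountingSource`, `streaming_filter`, and `blocking_sort` are hypothetical helpers written for this example.

```python
from typing import Iterable, Iterator, List


class CountingSource:
    """Input that records how many values have been consumed so far."""

    def __init__(self, values: Iterable[int]) -> None:
        self._values: List[int] = list(values)
        self.consumed = 0

    def __iter__(self) -> Iterator[int]:
        for v in self._values:
            self.consumed += 1
            yield v


def streaming_filter(child: Iterable[int], threshold: int) -> Iterator[int]:
    """Non-blocking: can emit as soon as one input value qualifies."""
    for v in child:
        if v > threshold:
            yield v


def blocking_sort(child: Iterable[int]) -> Iterator[int]:
    """Blocking: the smallest value is unknown until all input has been seen."""
    buffered = sorted(child)
    yield from buffered


filter_input = CountingSource([7, 3, 9, 1])
first_filtered = next(streaming_filter(filter_input, 5))
consumed_by_filter = filter_input.consumed

sort_input = CountingSource([7, 3, 9, 1])
first_sorted = next(blocking_sort(sort_input))
consumed_by_sort = sort_input.consumed
```

The filter yields its first value after reading one input, while the sort has to read all four, so any operator above the sort waits for the full input no matter how the rest of the plan is pipelined.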

---

## Practical mental model

Execution models are about runtime granularity and data flow.

If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"

---

## Changelog

* **April 1, 2026** -- First version created.