Query Execution Models

A reference for the main ways query operators run at runtime.


Short answer

An execution model defines how operators consume input, produce output, and pass data through a plan.

The most important questions are:

  • one row at a time or many values at once?
  • pull-based or push-based?
  • pipelined or materialized?

Those choices strongly affect latency, CPU efficiency, and implementation complexity.


Row-at-a-time execution

In a row-at-a-time model, operators process one tuple at a time.

This is often implemented with an iterator interface (the Volcano-style model) in which a parent repeatedly asks its child for the next row until the child reports that it has no more rows.

Strengths:

  • simple
  • modular
  • easy to debug

Weaknesses:

  • high per-row overhead, since every row costs at least one function call per operator
  • worse cache behavior for analytics

This model is historically important and still useful in many systems, particularly where queries touch few rows and per-row overhead matters little.
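
The iterator interface described above can be sketched as follows. This is a minimal illustration, not any particular engine's API: the class names (`Scan`, `Filter`, `Project`), the `next()` method returning `None` at end of input, and the `run_rows` helper are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]  # hypothetical row representation for this sketch


class Scan:
    """Leaf operator: returns rows from an in-memory list, one per call."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self._pos = 0

    def next(self) -> Optional[Row]:
        if self._pos >= len(self._rows):
            return None  # end of input
        row = self._rows[self._pos]
        self._pos += 1
        return row


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child: Any, predicate: Callable[[Row], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def next(self) -> Optional[Row]:
        while True:
            row = self._child.next()
            if row is None:
                return None
            if self._predicate(row):
                return row


class Project:
    """Keeps only the named columns of each row."""

    def __init__(self, child: Any, columns: List[str]) -> None:
        self._child = child
        self._columns = columns

    def next(self) -> Optional[Row]:
        row = self._child.next()
        if row is None:
            return None
        return {name: row[name] for name in self._columns}


def run_rows(root: Any) -> List[Row]:
    """Drive the plan by calling next() on the root until it is exhausted."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}, {"id": 3, "qty": 20}]
plan = Project(Filter(Scan(rows), lambda r: r["qty"] > 10), ["id"])
result = run_rows(plan)
```

Every output row here costs one `next()` call per operator, which is the per-row overhead listed among the weaknesses.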


Batch-oriented execution

In a batch model, operators process chunks of rows together.

The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.

Strengths:

  • better CPU efficiency
  • lower dispatch overhead
  • easier parallelism inside an operator

Weaknesses:

  • more bookkeeping
  • more complex control flow
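
A rough sketch of the batch idea, assuming a hypothetical `next_batch()` interface that returns a list of rows or `None` at end of input. The names `BatchScan`, `BatchFilter`, and `run_batches`, and the default batch size, are illustrative choices rather than anything prescribed by these notes.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Batch = List[Row]


class BatchScan:
    """Leaf operator: hands out slices of an in-memory list."""

    def __init__(self, rows: List[Row], batch_size: int = 1024) -> None:
        self._rows = list(rows)
        self._pos = 0
        self._batch_size = batch_size

    def next_batch(self) -> Optional[Batch]:
        if self._pos >= len(self._rows):
            return None  # end of input
        batch = self._rows[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch


class BatchFilter:
    """Applies the predicate to a whole batch per call to the child."""

    def __init__(self, child: Any, predicate: Callable[[Row], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def next_batch(self) -> Optional[Batch]:
        while True:
            batch = self._child.next_batch()
            if batch is None:
                return None
            kept = [row for row in batch if self._predicate(row)]
            if kept:  # bookkeeping: never hand an empty batch upstream
                return kept


def run_batches(root: Any) -> List[Batch]:
    """Drain the plan and return the batches it produced."""
    out = []
    while (batch := root.next_batch()) is not None:
        out.append(batch)
    return out


rows = [{"v": i} for i in range(10)]
plan = BatchFilter(BatchScan(rows, batch_size=4), lambda r: r["v"] % 2 == 0)
batches = run_batches(plan)
```

With ten rows and a batch size of four, the filter calls its child a handful of times instead of once per row. Skipping empty batches is a small instance of the extra bookkeeping listed among the weaknesses.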

Vectorized execution

Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.

This fits well with columnar memory layouts and analytical workloads.

Strengths:

  • excellent cache locality
  • better SIMD opportunities
  • good fit for scans, filters, joins, and aggregates

Weaknesses:

  • some control-flow-heavy logic is less natural
  • more careful null and type handling is needed
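
To make the column-vector idea concrete, here is a small sketch that assumes a hypothetical `ColumnBatch` type holding one Python list per column, with `None` standing in for NULL. Plain Python lists will not give SIMD speedups; the example only shows the shape of the computation: a filter that produces a selection vector of qualifying positions, and an aggregate that reads another column through that selection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnBatch:
    """Hypothetical columnar batch: column name -> values, None means NULL."""
    columns: Dict[str, List[Optional[int]]]
    length: int


def select_greater_than(batch: ColumnBatch, column: str, threshold: int) -> List[int]:
    """Return a selection vector: positions where column > threshold.

    NULL values never qualify, mirroring SQL comparison semantics.
    """
    values = batch.columns[column]
    return [i for i, v in enumerate(values) if v is not None and v > threshold]


def sum_selected(batch: ColumnBatch, column: str, selection: List[int]) -> int:
    """Sum a column over the selected positions, skipping NULLs."""
    values = batch.columns[column]
    total = 0
    for i in selection:
        value = values[i]
        if value is not None:
            total += value
    return total


batch = ColumnBatch(
    columns={"qty": [5, None, 12, 20], "price": [10, 20, None, 3]},
    length=4,
)
selection = select_greater_than(batch, "qty", 10)
total_price = sum_selected(batch, "price", selection)
```

Each function is a tight loop over one column, which is what makes this style cache-friendly. The explicit `None` checks in both functions show why null handling needs more care than in a row-at-a-time design.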


Pull vs push

Pull-based execution

Parent operators ask children for data.

Strengths:

  • natural operator trees
  • straightforward control flow

Weaknesses:

  • can introduce repeated dispatch overhead, because every request for data is a call down the operator tree

Push-based execution

Child operators push data to parents or downstream consumers.

Strengths:

  • natural for streaming or event-driven systems
  • can work well with pipeline fusion

Weaknesses:

  • control flow can be harder to reason about

Many systems combine these ideas rather than choosing only one.
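
As a contrast with a parent calling into its child, a push-based pipeline can be sketched like this. The `consume()` and `finish()` method names, the operator classes, and the `push_scan` driver are assumptions made for illustration. The source owns the loop and each operator forwards rows to its downstream consumer.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class CollectSink:
    """Terminal consumer: stores whatever is pushed into it."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.finished = False

    def consume(self, row: Row) -> None:
        self.rows.append(row)

    def finish(self) -> None:
        self.finished = True


class PushFilter:
    """Forwards a row downstream only if it satisfies the predicate."""

    def __init__(self, predicate: Callable[[Row], bool], downstream: Any) -> None:
        self._predicate = predicate
        self._downstream = downstream

    def consume(self, row: Row) -> None:
        if self._predicate(row):
            self._downstream.consume(row)

    def finish(self) -> None:
        self._downstream.finish()


class PushProject:
    """Forwards only the named columns of each row."""

    def __init__(self, columns: List[str], downstream: Any) -> None:
        self._columns = columns
        self._downstream = downstream

    def consume(self, row: Row) -> None:
        self._downstream.consume({name: row[name] for name in self._columns})

    def finish(self) -> None:
        self._downstream.finish()


def push_scan(rows: Iterable[Row], downstream: Any) -> None:
    """The source drives execution: it pushes every row, then signals the end."""
    for row in rows:
        downstream.consume(row)
    downstream.finish()


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}, {"id": 3, "qty": 20}]
sink = CollectSink()
push_scan(rows, PushFilter(lambda r: r["qty"] > 10, PushProject(["id"], sink)))
```

Note that the plan is wired from the sink backwards, and that end-of-input needs its own signal (`finish()`), which is part of why control flow can be harder to follow in this style.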


Pipelining vs materialization

Pipelined execution

Operators pass intermediate results incrementally.

Strengths:

  • low latency
  • less temporary storage in favorable cases

Weaknesses:

  • blocking operators (such as sort) still create barriers that interrupt the pipeline

Materializing execution

An operator stores its entire output before the next operator consumes it.

Strengths:

  • simpler boundaries
  • easier reuse of intermediates

Weaknesses:

  • more memory and I/O cost
  • higher latency
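
The difference in data flow can be observed with a small experiment. This sketch uses Python generators as stand-ins for operators and records an event log; the function names and the logging scheme are illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, int]


def scan(rows: Iterable[int], log: List[Event]) -> Iterator[int]:
    """Produce rows one at a time, recording each in the log."""
    for row in rows:
        log.append(("scan", row))
        yield row


def double(rows: Iterable[int], log: List[Event]) -> Iterator[int]:
    """Consume rows one at a time and emit twice their value."""
    for row in rows:
        log.append(("double", row))
        yield row * 2


def run_pipelined(rows: List[int]) -> Tuple[List[int], List[Event]]:
    """Each row flows through both operators before the next row is read."""
    log: List[Event] = []
    out = list(double(scan(rows, log), log))
    return out, log


def run_materialized(rows: List[int]) -> Tuple[List[int], List[Event]]:
    """The scan's entire output is stored before the next operator starts."""
    log: List[Event] = []
    intermediate = list(scan(rows, log))  # full intermediate result in memory
    out = list(double(intermediate, log))
    return out, log


pipelined_out, pipelined_log = run_pipelined([1, 2])
materialized_out, materialized_log = run_materialized([1, 2])
```

Both runs produce the same result. In the pipelined log the two operators interleave row by row, while in the materialized log every `scan` event precedes every `double` event, at the cost of holding `intermediate` in memory.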

Blocking operators

Some operators are naturally blocking.

Examples:

  • sort
  • some aggregates
  • some join strategies

These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
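
One way to see the effect is to count how many input rows an operator reads before it emits its first output row. The sketch below again uses Python generators as stand-ins for operators; the helper names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def counting_source(rows: List[int], counter: Dict[str, int]) -> Iterator[int]:
    """Yield rows while counting how many have been read so far."""
    for row in rows:
        counter["read"] += 1
        yield row


def streaming_filter(rows: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Non-blocking: can emit a row as soon as one qualifies."""
    for row in rows:
        if predicate(row):
            yield row


def blocking_sort(rows: Iterable[int]) -> Iterator[int]:
    """Blocking: must buffer the entire input before emitting anything."""
    buffered = sorted(rows)
    yield from buffered


def rows_read_before_first_output(
    make_operator: Callable[[Iterator[int]], Iterator[int]],
    rows: List[int],
) -> int:
    """Fetch one output row and report how much input that required."""
    counter = {"read": 0}
    operator = make_operator(counting_source(rows, counter))
    next(operator)
    return counter["read"]


rows = [3, 1, 2]
filter_reads = rows_read_before_first_output(
    lambda source: streaming_filter(source, lambda r: r > 0), rows
)
sort_reads = rows_read_before_first_output(blocking_sort, rows)
```

The filter needs a single input row to produce its first output, while the sort has to read all three. Any operator downstream of the sort waits for that buffering to complete.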


Practical mental model

Execution models are about runtime granularity and data flow.

If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"


Changelog

  • April 1, 2026 -- First version created.