Query Engine Architectures
A reference for the main ways query engines are structured internally.
Short answer
There is no single query engine architecture.
Most engines are built by making choices along a few major axes:
- row-oriented vs columnar
- interpreted vs compiled
- pipelined vs materializing
- centralized vs distributed
- tightly coupled to storage vs separated through a source interface
Different systems pick different combinations depending on whether they optimize for transactional access, analytics, low latency, concurrency, distribution, or implementation simplicity.
A useful mental model
When people say "query engine architecture," they usually mean the answer to questions like:
- what is the unit of execution: row, batch, or whole relation?
- how do operators communicate: pull, push, or staged pipelines?
- when are intermediates materialized?
- where does optimization happen?
- how tightly is execution tied to the underlying storage?
- does one machine run the whole plan, or is the plan split across many workers?
So architecture is less about one named pattern and more about a set of design choices.
The classic row-at-a-time iterator model
One of the most common traditional designs is the Volcano-style iterator model.
In that architecture:
- each operator exposes a method like next()
- parent operators pull one tuple at a time from child operators
- data flows upward through a tree of operators
Typical shape:
- Projection, which pulls from
- Filter, which pulls from
- Scan
Strengths
- simple and modular
- easy to understand and implement
- operators compose cleanly
- works well for many relational operators
Weaknesses
- pays per-row virtual-call or dispatch overhead
- poor cache locality for analytical scans
- not ideal for SIMD or vectorized processing
This architecture was historically very influential, but many modern analytical engines move away from pure row-at-a-time execution.
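The pull-based flow above can be sketched in a few lines. This is an illustrative toy, not any real engine's API: each operator exposes next(), and None signals end-of-stream.

```python
# Minimal sketch of Volcano-style pull execution (illustrative only).
class Scan:
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)

class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        # Pull rows from the child until one passes the predicate.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

class Projection:
    def __init__(self, child, columns):
        self.child, self.columns = child, columns
    def next(self):
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.columns}

rows = [{"id": 1, "price": 5}, {"id": 2, "price": 15}, {"id": 3, "price": 25}]
plan = Projection(Filter(Scan(rows), lambda r: r["price"] > 10), ["id"])
out = []
while (row := plan.next()) is not None:
    out.append(row)
# out == [{"id": 2}, {"id": 3}]
```

Note how every output row costs several method calls, which is exactly the per-row dispatch overhead listed above.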
Vectorized and batch-oriented engines
Modern analytical engines often process batches of rows, frequently in columnar form.
Instead of asking for one row at a time, an operator processes a chunk such as a RecordBatch containing many values per column.
Strengths
- lower per-row overhead
- better cache behavior
- fits columnar storage and in-memory formats well
- makes vectorized computation practical
Weaknesses
- more complex operator implementations
- more bookkeeping around batch boundaries
- some control-flow-heavy operations fit less naturally
This style is common in engines optimized for scans, filters, aggregations, and joins over large datasets.
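A batch-oriented operator can be sketched as follows, modeling a columnar batch as a plain dict of column name to value list (a stand-in for a real columnar structure such as a RecordBatch). The point is that one call processes a whole chunk, amortizing dispatch over many rows.

```python
# Hypothetical columnar batch: {column name: list of values}.
def filter_batch(batch, column, threshold):
    # Compute a selection mask once, then apply it to every column.
    mask = [v > threshold for v in batch[column]]
    return {name: [v for v, keep in zip(col, mask) if keep]
            for name, col in batch.items()}

batch = {"id": [1, 2, 3, 4], "price": [5, 15, 25, 8]}
filtered = filter_batch(batch, "price", 10)
# filtered == {"id": [2, 3], "price": [15, 25]}
```

A real vectorized engine would run the mask computation over contiguous arrays, which is what enables SIMD and the cache behavior described above.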
Interpreted vs compiled execution
Another major architecture decision is whether plans are interpreted directly or turned into specialized executable code.
Interpreted execution
The engine walks the plan at runtime and runs generic operator implementations.
Strengths:
- easier to build
- easier to debug
- more flexible for dynamic plans
Weaknesses:
- more dispatch overhead
- fewer opportunities for low-level specialization
Compiled or code-generated execution
The engine turns expressions or even whole pipelines into specialized machine code, bytecode, or generated source.
Strengths:
- less runtime interpretation overhead
- more aggressive specialization
- can fuse multiple operations together
Weaknesses:
- much more implementation complexity
- longer compile/startup path
- harder debugging and observability
Many real systems are hybrid: interpreted at some layers, compiled at others.
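The difference can be illustrated with a toy expression evaluator, assuming a made-up tree encoding of price * (1 - discount). The interpreter walks the tree for every row; the "compiler" walks it once and builds a specialized closure, standing in for real code generation.

```python
# Interpreted: re-walk the expression tree for each row.
def interpret(node, row):
    op = node[0]
    if op == "col":
        return row[node[1]]
    if op == "lit":
        return node[1]
    left, right = interpret(node[1], row), interpret(node[2], row)
    return left * right if op == "mul" else left - right

# "Compiled": walk the tree once up front, producing one specialized callable.
def compile_expr(node):
    op = node[0]
    if op == "col":
        name = node[1]
        return lambda row: row[name]
    if op == "lit":
        value = node[1]
        return lambda row: value
    left, right = compile_expr(node[1]), compile_expr(node[2])
    if op == "mul":
        return lambda row: left(row) * right(row)
    return lambda row: left(row) - right(row)

expr = ("mul", ("col", "price"), ("sub", ("lit", 1.0), ("col", "discount")))
row = {"price": 100.0, "discount": 0.5}
fn = compile_expr(expr)
assert interpret(expr, row) == fn(row) == 50.0
```

Per-row work in the compiled path is just closure calls with no tag dispatch, which is the specialization benefit listed above in miniature.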
Pipelined vs materializing architectures
Query engines also differ in when they store intermediate results.
Pipelined execution
Operators stream data incrementally from one stage to the next.
Strengths:
- lower latency
- less memory for some workloads
- natural for streaming or interactive execution
Weaknesses:
- harder to reuse intermediates
- blocking operators still force barriers
Materializing execution
The engine fully computes an intermediate result before the next stage uses it.
Strengths:
- simpler control flow
- easier fault recovery in some systems
- easier sharing of intermediates
Weaknesses:
- more memory and I/O overhead
- higher latency
Real engines mix both. For example, filter and projection may pipeline, while sort and some aggregates necessarily materialize or partially buffer.
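The contrast is easy to see with generators: a pipelined stage yields each row as soon as it is produced, while a materializing stage builds the full intermediate before anything downstream runs. A minimal sketch:

```python
# Pipelined: generators stream rows stage to stage, one at a time.
def scan(rows):
    yield from rows

def filter_stage(source, predicate):
    for row in source:
        if predicate(row):
            yield row  # emitted immediately; nothing buffered

# Materializing: build the full intermediate result up front.
def materialized_filter(rows, predicate):
    return [row for row in rows if predicate(row)]

rows = list(range(10))
pipelined = filter_stage(scan(rows), lambda r: r % 2 == 0)
first = next(pipelined)  # available before the scan is exhausted
assert first == 0
assert materialized_filter(rows, lambda r: r % 2 == 0) == [0, 2, 4, 6, 8]
```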
Row stores, column stores, and storage coupling
Some engines are tightly integrated with one storage layout. Others sit above many storage systems.
Storage-coupled engines
In these systems, the executor is deeply aware of the storage engine's pages, indexes, and on-disk layout.
Strengths:
- can exploit detailed storage-level knowledge
- often strong performance for the target workload
Weaknesses:
- harder to adapt to new sources
- architecture is less modular
Storage-decoupled engines
In these systems, execution interacts with storage through a cleaner source boundary such as scan interfaces and pushdowns.
Strengths:
- easier support for many formats and backends
- cleaner separation of concerns
Weaknesses:
- some storage-specific optimizations become harder
- abstraction can hide useful low-level details
This is a major difference between database kernels and more modular data-processing engines.
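A decoupled source boundary can be sketched as a scan interface that accepts a projection and a pushed-down predicate; the names here are hypothetical, but the shape mirrors what connector interfaces generally look like. The engine never sees pages or indexes, only what the source chooses to return.

```python
# Hypothetical source boundary: the executor only calls scan(columns, predicate)
# and lets the backend decide how to honor the pushdowns.
class ListSource:
    """Stand-in backend behind the source interface."""
    def __init__(self, rows):
        self.rows = rows

    def scan(self, columns, predicate=None):
        for row in self.rows:
            if predicate is None or predicate(row):
                # Projection pushdown: only requested columns leave the source.
                yield {c: row[c] for c in columns}

source = ListSource([{"id": 1, "price": 5}, {"id": 2, "price": 15}])
result = list(source.scan(["id"], predicate=lambda r: r["price"] > 10))
# result == [{"id": 2}]
```

A smarter backend could use an index or skip file ranges to satisfy the same call, which is exactly the storage-specific knowledge the abstraction hides.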
Centralized vs distributed architectures
Some engines run entirely inside one process or one machine. Others split work across nodes.
Centralized engines
All planning and execution happens locally.
Strengths:
- much simpler design
- easier debugging
- lower coordination overhead
Weaknesses:
- limited by one machine's CPU, memory, and I/O
Distributed engines
The plan is divided into stages or fragments that run on multiple workers.
Common added components include:
- exchange or shuffle operators
- task schedulers
- remote data transport
- fault handling and retries
Strengths:
- can scale to much larger datasets
- can exploit cluster parallelism
Weaknesses:
- much more complexity
- network and serialization overhead
- harder optimization problem
Distributed query engines are not just "single-node engines on more machines." They need an additional architecture for task placement and data movement.
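The data movement an exchange operator performs can be sketched as hash partitioning: rows are routed by a key so that equal keys land on the same (here hypothetical) worker, after which per-partition work needs no further shuffling.

```python
# Sketch of the routing step inside an exchange/shuffle operator.
def hash_partition(rows, key, num_workers):
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        # Equal key values always map to the same partition.
        partitions[hash(row[key]) % num_workers].append(row)
    return partitions

rows = [{"user": u, "amount": a} for u, a in
        [("a", 1), ("b", 2), ("a", 3), ("c", 4)]]
parts = hash_partition(rows, "user", 3)

# All rows for user "a" land in exactly one partition, so a downstream
# per-partition aggregate over "user" is correct without more movement.
a_parts = [i for i, p in enumerate(parts) if any(r["user"] == "a" for r in p)]
assert len(a_parts) == 1
assert sum(len(p) for p in parts) == len(rows)
```

In a real engine this step also involves serialization, network transport, and retry logic, which is where much of the added complexity lives.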
Blocking vs non-blocking operators
Operator behavior also shapes architecture.
Non-blocking operators
These can start producing output before consuming all input.
Examples:
- scan
- filter
- projection
Blocking operators
These require all or much of the input before producing useful output.
Examples:
- sort
- some aggregates
- some join strategies
Blocking operators create execution barriers and often determine where memory pressure and latency appear in the engine.
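The barrier is observable if the scan is instrumented to count how many input rows each operator consumed before producing its first output row. A small sketch:

```python
# Instrumented scan: count input rows consumed.
def scan(rows, counter):
    for row in rows:
        counter[0] += 1
        yield row

rows = list(range(100))

# Non-blocking: a filter emits its first row after reading only 7 inputs.
consumed = [0]
filtered = (r for r in scan(rows, consumed) if r > 5)
next(filtered)
assert consumed[0] == 7

# Blocking: a sort must consume all 100 inputs before its first output.
consumed = [0]
ordered = iter(sorted(scan(rows, consumed), reverse=True))
next(ordered)
assert consumed[0] == 100
```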
Common architecture patterns
1. Classic relational database kernel
Typical traits:
- row-oriented execution
- iterator model
- tight coupling to storage and indexes
- strong support for transactional workloads
2. Modern analytical columnar engine
Typical traits:
- columnar in-memory representation
- batch or vectorized execution
- pushdown into file formats or source connectors
- optimized for scans, joins, and aggregates on large datasets
3. Distributed SQL or lakehouse engine
Typical traits:
- logical and physical planning on a coordinator
- distributed execution stages on workers
- shuffle or exchange boundaries
- object storage or remote tables as sources
4. Streaming query engine
Typical traits:
- push-style or event-driven data flow
- unbounded inputs
- windows, watermarks, and incremental state
- low-latency continuous execution
5. Logic or rule engine
Typical traits:
- facts and rules instead of only relational algebra
- recursion and fixpoint evaluation
- unification, derivation, or chase-like execution
- architecture organized around inference as much as scanning
This last category matters when the "query engine" is not mainly about SQL tables at all.
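Fixpoint evaluation can be illustrated with the classic transitive-closure example: derive reach(x, z) from reach(x, y) and edge(y, z), and repeat until a pass adds no new facts. This is the naive (non-semi-naive, unoptimized) form, shown only to make the fixpoint loop concrete.

```python
# Naive fixpoint evaluation of a recursive rule:
#   reach(x, y) :- edge(x, y).
#   reach(x, z) :- reach(x, y), edge(y, z).
def transitive_closure(edges):
    reach = set(edges)
    while True:
        derived = {(x, z) for (x, y) in reach
                          for (y2, z) in edges if y == y2}
        if derived <= reach:   # no new facts: fixpoint reached
            return reach
        reach |= derived

edges = {(1, 2), (2, 3), (3, 4)}
assert transitive_closure(edges) == {(1, 2), (2, 3), (3, 4),
                                     (1, 3), (2, 4), (1, 4)}
```

Production systems use semi-naive evaluation, which joins only the facts derived in the previous pass, but the architecture is still organized around this derive-until-fixpoint loop rather than a one-shot scan.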
Comparison table
| Architecture style | Typical unit of work | Best for | Main weakness |
|---|---|---|---|
| Row iterator | one row | simplicity, OLTP-style access | high per-row overhead |
| Vectorized columnar | batch / column chunk | analytics | more implementation complexity |
| Materializing | full intermediate | simpler barriers and reuse | memory and latency cost |
| Pipelined | incremental stream | low latency | harder coordination around blocking operators |
| Distributed | stage / partition | scale-out analytics | network and scheduling complexity |
| Rule/fixpoint | fact/rule step | recursion and inference | different optimization model from relational SQL |
How to think about architectural fit
A good architecture depends on what the engine is for.
If the workload is mostly:
- point lookups and updates, row-oriented designs can be reasonable
- large scans and aggregates, vectorized columnar designs usually win
- recursive inference or logical closure, rule/fixpoint architectures become central
- cluster-scale datasets, distributed planning and exchange operators become unavoidable
So the architecture should follow the workload, not the other way around.
A practical summary
If you need the shortest useful summary, this is a good one:
Query engine architecture is mostly about execution granularity, operator communication, intermediate storage, and the boundary with storage and distribution.
Or more concretely:
- row vs batch
- interpreted vs compiled
- pipelined vs materialized
- local vs distributed
- relational vs rule-oriented execution
Those choices explain most of the important differences between engines.
Questions to keep in mind
- What is the unit of execution?
- Which operators are blocking?
- How much is pushed down to the data source?
- Is storage tightly coupled or abstracted behind connectors?
- Does the workload look more transactional, analytical, streaming, or logical?
- Is distribution a first-order requirement or premature complexity?
Changelog
- Mar 31, 2026 -- First version created.