393 lines
10 KiB
Markdown
393 lines
10 KiB
Markdown
|
|
# Query Engine Architectures
|
||
|
|
|
||
|
|
A reference for the main ways query engines are structured internally.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Short answer
|
||
|
|
|
||
|
|
There is no single query engine architecture.
|
||
|
|
|
||
|
|
Most engines are built by making choices along a few major axes:
|
||
|
|
|
||
|
|
- row-oriented vs columnar
|
||
|
|
- interpreted vs compiled
|
||
|
|
- pipelined vs materializing
|
||
|
|
- centralized vs distributed
|
||
|
|
- tightly coupled to storage vs separated through a source interface
|
||
|
|
|
||
|
|
Different systems pick different combinations depending on whether they optimize for transactional access, analytics, low latency, concurrency,
|
||
|
|
distribution, or implementation simplicity.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## A useful mental model
|
||
|
|
|
||
|
|
When people say "query engine architecture," they usually mean the answer to questions like:
|
||
|
|
|
||
|
|
- what is the unit of execution: row, batch, or whole relation?
|
||
|
|
- how do operators communicate: pull, push, or staged pipelines?
|
||
|
|
- when are intermediates materialized?
|
||
|
|
- where does optimization happen?
|
||
|
|
- how tightly is execution tied to the underlying storage?
|
||
|
|
- does one machine run the whole plan, or is the plan split across many workers?
|
||
|
|
|
||
|
|
So architecture is less about one named pattern and more about a set of design choices.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## The classic row-at-a-time iterator model
|
||
|
|
|
||
|
|
One of the most common traditional designs is the Volcano-style iterator model.
|
||
|
|
|
||
|
|
In that architecture:
|
||
|
|
|
||
|
|
- each operator exposes a method like `next()`
|
||
|
|
- parent operators pull one tuple at a time from child operators
|
||
|
|
- data flows upward through a tree of operators
|
||
|
|
|
||
|
|
Typical shape:
|
||
|
|
|
||
|
|
1. `Projection`
|
||
|
|
2. pulls from `Filter`
|
||
|
|
3. which pulls from `Scan`
|
||
|
|
|
||
|
|
### Strengths
|
||
|
|
|
||
|
|
- simple and modular
|
||
|
|
- easy to understand and implement
|
||
|
|
- operators compose cleanly
|
||
|
|
- works well for many relational operators
|
||
|
|
|
||
|
|
### Weaknesses
|
||
|
|
|
||
|
|
- pays per-row virtual-call or dispatch overhead
|
||
|
|
- poor cache locality for analytical scans
|
||
|
|
- not ideal for SIMD or vectorized processing
|
||
|
|
|
||
|
|
This architecture was historically very influential, but many modern analytical engines move away from pure row-at-a-time execution.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Vectorized and batch-oriented engines
|
||
|
|
|
||
|
|
Modern analytical engines often process batches of rows, frequently in columnar form.
|
||
|
|
|
||
|
|
Instead of asking for one row at a time, an operator processes a chunk such as a `RecordBatch` containing many values per column.
|
||
|
|
|
||
|
|
### Strengths
|
||
|
|
|
||
|
|
- lower per-row overhead
|
||
|
|
- better cache behavior
|
||
|
|
- fits columnar storage and in-memory formats well
|
||
|
|
- makes vectorized computation practical
|
||
|
|
|
||
|
|
### Weaknesses
|
||
|
|
|
||
|
|
- more complex operator implementations
|
||
|
|
- more bookkeeping around batch boundaries
|
||
|
|
- some control-flow-heavy operations fit less naturally
|
||
|
|
|
||
|
|
This style is common in engines optimized for scans, filters, aggregations, and joins over large datasets.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Interpreted vs compiled execution
|
||
|
|
|
||
|
|
Another major architecture decision is whether plans are interpreted directly or turned into specialized executable code.
|
||
|
|
|
||
|
|
### Interpreted execution
|
||
|
|
|
||
|
|
The engine walks the plan at runtime and runs generic operator implementations.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- easier to build
|
||
|
|
- easier to debug
|
||
|
|
- more flexible for dynamic plans
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- more dispatch overhead
|
||
|
|
- fewer opportunities for low-level specialization
|
||
|
|
|
||
|
|
### Compiled or code-generated execution
|
||
|
|
|
||
|
|
The engine turns expressions or even whole pipelines into specialized machine code, bytecode, or generated source.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- less runtime interpretation overhead
|
||
|
|
- more aggressive specialization
|
||
|
|
- can fuse multiple operations together
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- much more implementation complexity
|
||
|
|
- longer compile/startup path
|
||
|
|
- harder debugging and observability
|
||
|
|
|
||
|
|
Many real systems are hybrid: interpreted at some layers, compiled at others.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Pipelined vs materializing architectures
|
||
|
|
|
||
|
|
Query engines also differ in when they store intermediate results.
|
||
|
|
|
||
|
|
### Pipelined execution
|
||
|
|
|
||
|
|
Operators stream data incrementally from one stage to the next.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- lower latency
|
||
|
|
- less memory for some workloads
|
||
|
|
- natural for streaming or interactive execution
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- harder to reuse intermediates
|
||
|
|
- blocking operators still force barriers
|
||
|
|
|
||
|
|
### Materializing execution
|
||
|
|
|
||
|
|
The engine fully computes an intermediate result before the next stage uses it.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- simpler control flow
|
||
|
|
- easier fault recovery in some systems
|
||
|
|
- easier sharing of intermediates
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- more memory and I/O overhead
|
||
|
|
- higher latency
|
||
|
|
|
||
|
|
Real engines mix both. For example, filter and projection may pipeline, while sort and some aggregates necessarily materialize or partially buffer.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Row stores, column stores, and storage coupling
|
||
|
|
|
||
|
|
Some engines are tightly integrated with one storage layout. Others sit above many storage systems.
|
||
|
|
|
||
|
|
### Storage-coupled engines
|
||
|
|
|
||
|
|
In these systems, the executor is deeply aware of the storage engine's pages, indexes, and on-disk layout.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- can exploit detailed storage-level knowledge
|
||
|
|
- often strong performance for the target workload
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- harder to adapt to new sources
|
||
|
|
- architecture is less modular
|
||
|
|
|
||
|
|
### Storage-decoupled engines
|
||
|
|
|
||
|
|
In these systems, execution interacts with storage through a cleaner source boundary such as scan interfaces and pushdowns.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- easier support for many formats and backends
|
||
|
|
- cleaner separation of concerns
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- some storage-specific optimizations become harder
|
||
|
|
- abstraction can hide useful low-level details
|
||
|
|
|
||
|
|
This is a major difference between database kernels and more modular data-processing engines.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Centralized vs distributed architectures
|
||
|
|
|
||
|
|
Some engines run entirely inside one process or one machine. Others split work across nodes.
|
||
|
|
|
||
|
|
### Centralized engines
|
||
|
|
|
||
|
|
All planning and execution happens locally.
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- much simpler design
|
||
|
|
- easier debugging
|
||
|
|
- lower coordination overhead
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- limited by one machine's CPU, memory, and I/O
|
||
|
|
|
||
|
|
### Distributed engines
|
||
|
|
|
||
|
|
The plan is divided into stages or fragments that run on multiple workers.
|
||
|
|
|
||
|
|
Common added components include:
|
||
|
|
|
||
|
|
- exchange or shuffle operators
|
||
|
|
- task schedulers
|
||
|
|
- remote data transport
|
||
|
|
- fault handling and retries
|
||
|
|
|
||
|
|
Strengths:
|
||
|
|
|
||
|
|
- can scale to much larger datasets
|
||
|
|
- can exploit cluster parallelism
|
||
|
|
|
||
|
|
Weaknesses:
|
||
|
|
|
||
|
|
- much more complexity
|
||
|
|
- network and serialization overhead
|
||
|
|
- harder optimization problem
|
||
|
|
|
||
|
|
Distributed query engines are not just "single-node engines on more machines." They need an additional architecture for task placement and data
|
||
|
|
movement.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Blocking vs non-blocking operators
|
||
|
|
|
||
|
|
Operator behavior also shapes architecture.
|
||
|
|
|
||
|
|
### Non-blocking operators
|
||
|
|
|
||
|
|
These can start producing output before consuming all input.
|
||
|
|
|
||
|
|
Examples:
|
||
|
|
|
||
|
|
- scan
|
||
|
|
- filter
|
||
|
|
- projection
|
||
|
|
|
||
|
|
### Blocking operators
|
||
|
|
|
||
|
|
These require all or much of the input before producing useful output.
|
||
|
|
|
||
|
|
Examples:
|
||
|
|
|
||
|
|
- sort
|
||
|
|
- some aggregates
|
||
|
|
- some join strategies
|
||
|
|
|
||
|
|
Blocking operators create execution barriers and often determine where memory pressure and latency appear in the engine.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Common architecture patterns
|
||
|
|
|
||
|
|
### 1. Classic relational database kernel
|
||
|
|
|
||
|
|
Typical traits:
|
||
|
|
|
||
|
|
- row-oriented execution
|
||
|
|
- iterator model
|
||
|
|
- tight coupling to storage and indexes
|
||
|
|
- strong support for transactional workloads
|
||
|
|
|
||
|
|
### 2. Modern analytical columnar engine
|
||
|
|
|
||
|
|
Typical traits:
|
||
|
|
|
||
|
|
- columnar in-memory representation
|
||
|
|
- batch or vectorized execution
|
||
|
|
- pushdown into file formats or source connectors
|
||
|
|
- optimized for scans, joins, and aggregates on large datasets
|
||
|
|
|
||
|
|
### 3. Distributed SQL or lakehouse engine
|
||
|
|
|
||
|
|
Typical traits:
|
||
|
|
|
||
|
|
- logical and physical planning on a coordinator
|
||
|
|
- distributed execution stages on workers
|
||
|
|
- shuffle or exchange boundaries
|
||
|
|
- object storage or remote tables as sources
|
||
|
|
|
||
|
|
### 4. Streaming query engine
|
||
|
|
|
||
|
|
Typical traits:
|
||
|
|
|
||
|
|
- push-style or event-driven data flow
|
||
|
|
- unbounded inputs
|
||
|
|
- windows, watermarks, and incremental state
|
||
|
|
- low-latency continuous execution
|
||
|
|
|
||
|
|
### 5. Logic or rule engine
|
||
|
|
|
||
|
|
Typical traits:
|
||
|
|
|
||
|
|
- facts and rules instead of only relational algebra
|
||
|
|
- recursion and fixpoint evaluation
|
||
|
|
- unification, derivation, or chase-like execution
|
||
|
|
- architecture organized around inference as much as scanning
|
||
|
|
|
||
|
|
This last category matters when the "query engine" is not mainly about SQL tables at all.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Comparison table
|
||
|
|
|
||
|
|
| Architecture style | Typical unit of work | Best for | Main weakness |
|
||
|
|
|:-------------------|:---------------------|:---------|:--------------|
|
||
|
|
| Row iterator | one row | simplicity, OLTP-style access | high per-row overhead |
|
||
|
|
| Vectorized columnar| batch / column chunk | analytics | more implementation complexity |
|
||
|
|
| Materializing | full intermediate | simpler barriers and reuse | memory and latency cost |
|
||
|
|
| Pipelined | incremental stream | low latency | harder coordination around blocking operators |
|
||
|
|
| Distributed | stage / partition | scale-out analytics | network and scheduling complexity |
|
||
|
|
| Rule/fixpoint | fact/rule step | recursion and inference | different optimization model from relational SQL |
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## How to think about architectural fit
|
||
|
|
|
||
|
|
A good architecture depends on what the engine is for.
|
||
|
|
|
||
|
|
If the workload is mostly:
|
||
|
|
|
||
|
|
- point lookups and updates, row-oriented designs can be reasonable
|
||
|
|
- large scans and aggregates, vectorized columnar designs usually win
|
||
|
|
- recursive inference or logical closure, rule/fixpoint architectures become central
|
||
|
|
- cluster-scale datasets, distributed planning and exchange operators become unavoidable
|
||
|
|
|
||
|
|
So the architecture should follow the workload, not the other way around.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## A practical summary
|
||
|
|
|
||
|
|
If you need the shortest useful summary, this is a good one:
|
||
|
|
|
||
|
|
> Query engine architecture is mostly about execution granularity, operator communication, intermediate storage, and the boundary with storage and
|
||
|
|
> distribution.
|
||
|
|
|
||
|
|
Or more concretely:
|
||
|
|
|
||
|
|
- row vs batch
|
||
|
|
- interpreted vs compiled
|
||
|
|
- pipelined vs materialized
|
||
|
|
- local vs distributed
|
||
|
|
- relational vs rule-oriented execution
|
||
|
|
|
||
|
|
Those choices explain most of the important differences between engines.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Questions to keep in mind
|
||
|
|
|
||
|
|
- What is the unit of execution?
|
||
|
|
- Which operators are blocking?
|
||
|
|
- How much is pushed down to the data source?
|
||
|
|
- Is storage tightly coupled or abstracted behind connectors?
|
||
|
|
- Does the workload look more transactional, analytical, streaming, or logical?
|
||
|
|
- Is distribution a first-order requirement or premature complexity?
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Changelog
|
||
|
|
|
||
|
|
* **Mar 31, 2026** -- First version created.
|