Add notes files on physical query planning and evaluation

This commit is contained in:
Hassan Abedi 2026-04-07 14:40:37 +02:00
parent 13d4f9e638
commit b2aec58a85
2 changed files with 762 additions and 0 deletions

View File

@ -0,0 +1,378 @@
# Physical Plans and Operators
A reference for how a logical query plan becomes executable runtime work.
---
## Short answer
The physical plan is the runtime form of the query.
Logical plans say:
- what operations are required
Physical plans say:
- which executable operators will run
- what inputs they consume
- what batches they produce
- how logical expressions are turned into evaluable runtime expressions
In the companion engine, the physical layer is intentionally direct:
- each logical node maps to a concrete executable operator
- execution produces `Sequence<RecordBatch>`
- operators are mostly batch-oriented and columnar
That makes the logical-to-physical boundary easy to study.
---
## The physical planning pipeline
Relevant modules:
- `tmp/how-query-engines-work/query-planner/src/main/kotlin/QueryPlanner.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/PhysicalPlan.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ScanExec.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/SelectionExec.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ProjectionExec.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashAggregateExec.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashJoinExec.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/LimitExec.kt`
Conceptually:
```text
logical plan
-> choose concrete executable operators
-> translate logical expressions into physical expressions
-> execute operators over record batches
```
---
## What a physical plan is
The `PhysicalPlan` interface defines three core things:
- `schema()`: the shape of produced output
- `execute()`: run the operator and emit `RecordBatch` values
- `children()`: the operator inputs
This is a clean and important boundary.
The physical layer is no longer about parsing or language semantics. It is about executable dataflow over batches.
---
## The planner's job
`QueryPlanner.createPhysicalPlan(plan)` recursively maps each logical operator to a runtime operator.
The mappings are intentionally readable:
- `Scan` -> `ScanExec`
- `Selection` -> `SelectionExec`
- `Projection` -> `ProjectionExec`
- `Aggregate` -> `HashAggregateExec`
- `Join` -> `HashJoinExec`
- `Limit` -> `LimitExec`
At the same time, it translates logical expressions into runtime expressions.
This means physical planning has two distinct jobs:
1. choose operator implementations
2. lower logical expressions into executable expression evaluators
That split is worth keeping explicit in any engine design.
---
## Scan execution
`ScanExec` is the leaf physical operator.
Its job is simple:
- expose the projected schema
- delegate scanning to the underlying `DataSource`
- rely on the source to produce batches
This is also where projection pushdown becomes concrete. The operator receives a list of projected columns and passes that projection down to the source.
So even though `ScanExec` is small, it carries important architectural meaning:
- it is the storage boundary
- it is where source capabilities first affect runtime behavior
- it is how the physical plan begins consuming external data
---
## Projection execution
`ProjectionExec` runs a list of physical expressions over each input batch.
For each incoming `RecordBatch`, it:
1. evaluates each projection expression
2. collects the resulting output columns
3. builds a new `RecordBatch` with the projection schema
This captures the batch-oriented style clearly:
- expressions do not produce one value
- they produce a whole output column for the batch
That is a core mental shift in vectorized engines.
---
## Selection execution
`SelectionExec` evaluates a predicate expression against a batch and expects the result to be a boolean Arrow `BitVector`.
It then:
1. counts selected rows
2. allocates filtered output vectors
3. copies only rows whose predicate bit is true
4. returns a filtered batch with the same schema
This is a useful example of how vectorized filtering really works:
- the predicate is first materialized as a boolean selection vector
- then each input column is filtered using that vector
So selection is not "loop through rows and emit rows" at the conceptual boundary. It is "evaluate a batch predicate, then compact batch columns."
---
## Hash aggregation
`HashAggregateExec` is the aggregate runtime operator.
It keeps a map from:
- group key
to:
- a list of aggregate accumulators
Its runtime shape is:
1. evaluate grouping expressions on the batch
2. evaluate aggregate input expressions on the batch
3. iterate through rows
4. look up or create accumulator state for the row's grouping key
5. update accumulators
6. after all input is consumed, build one output batch from the accumulated state
This is a blocking operator:
- it must inspect all input before final results are ready
That matters because the physical plan may be largely pipelined, but some operators introduce a barrier.
---
## Partial and final aggregation
The aggregate operator also supports different modes:
- `COMPLETE`
- `PARTIAL`
- `FINAL`
This is important because it shows how distributed execution reuses the same conceptual operator in multiple stages.
The idea is:
- `PARTIAL`: produce intermediate aggregate state locally
- `FINAL`: merge intermediate states into final answers
Average is the clearest example.
You cannot merge two averages by averaging the averages. You need both:
- partial sum
- partial count
The companion code models that explicitly through `AvgIntermediateState`.
That is a good example of physical planning and execution needing richer runtime state than the SQL surface suggests.
---
## Hash join
`HashJoinExec` implements joins with a classic build/probe strategy.
It:
1. fully reads the right input
2. builds a hash table keyed by the right join columns
3. streams through the left input
4. probes the hash table for matches
5. constructs joined rows
For left joins, it emits unmatched left rows with null-filled right columns.
For right joins, it performs extra work after probing to emit unmatched right rows.
Two practical details are worth noting.
First, the planner resolves join column names to physical column indices ahead of execution. Runtime code wants positions, not symbolic names.
Second, duplicate key columns from the right side may be excluded from the output schema when the join keys have the same name. That is a schema-shaping concern the planner and operator must agree on.
---
## Limit execution
`LimitExec` is simple but instructive.
It streams input batches until the requested number of rows has been produced. If the limit falls in the middle of a batch, it truncates that batch and stops.
This shows that even "tiny" operators still need batch-aware logic:
- whole-batch pass-through when possible
- partial-batch copying when necessary
So batch-oriented execution changes the shape of even very simple operators.
---
## Expression lowering at the physical boundary
The query planner also lowers logical expressions into physical expressions.
Examples:
- `Column(name)` -> `ColumnExpression(index)`
- `LiteralLong` -> `LiteralLongExpression`
- `Eq` -> `EqExpression`
- `Add` -> `AddExpression`
- `CastExpr` -> `CastExpression`
One important detail is that aliases disappear at this stage.
That is correct because aliases affect:
- names
- schema
- planning
They do not affect the actual runtime computation.
This is a good example of semantics getting split cleanly across layers.
---
## Operator tree and dataflow
The physical plan is still a tree of operators, but now it is executable.
A simple query such as:
```sql
SELECT name
FROM employees
WHERE age > 18
LIMIT 10
```
typically becomes something like:
1. `ScanExec`
2. `SelectionExec`
3. `ProjectionExec`
4. `LimitExec`
Each operator consumes batches from its child and emits new batches upward.
That is the operational heart of the engine.
---
## Blocking vs streaming behavior
The physical layer makes it obvious which operators can stream and which cannot.
Mostly streaming:
- scan
- projection
- selection
- limit
Blocking or partially blocking:
- hash aggregate
- hash join build phase
- right-join unmatched-row handling
This distinction matters because runtime latency and memory behavior depend as much on operator blocking properties as on logical operator choice.
---
## Strengths of the companion design
The companion physical layer is especially useful as a teaching architecture because:
- the logical-to-physical mapping is easy to see
- operators are concrete and local
- batches are the universal runtime unit
- runtime expressions are clearly separate from logical expressions
- distributed extensions can reuse the same concepts
It is not trying to be the most optimized design possible. It is trying to make the moving parts visible.
That is why it works well as a study target.
---
## What is simplified relative to a production engine
A production engine would likely add:
- richer operator selection
- more cost-sensitive physical planning
- more join strategies
- more sophisticated memory management
- spill-to-disk behavior
- stronger predicate pushdown and source capabilities
- more operator fusion and code generation
So this note should be read as a physical-plan primer, not as the full space of execution-engine design.
---
## Main takeaways
- Physical planning chooses executable operators and lowers expressions into runtime evaluators.
- Batch-oriented execution changes the shape of every operator, even simple ones like `LIMIT` and `FILTER`.
- Aggregation and joins expose the importance of blocking behavior, runtime state, and schema shaping.
- Planner work continues beyond logical planning: it also resolves indices, output schemas, and runtime operator contracts.
- The physical layer is where abstract query meaning finally turns into actual dataflow.
---
## Related notes
- `hqew/002-query-engine-primer.md`
- `hqew/003-query-engine-architectures.md`
- `hqew/006-query-execution-models.md`
- `hqew/008-joins-and-aggregations.md`
- `hqew/009-distributed-query-engines.md`
- `hqew/015-sql-front-end-and-logical-planning.md`
- `hqew/017-expressions-types-and-nullability.md`
---
## Changelog
* **Apr 7, 2026** -- Added a dedicated note on physical planning, executable operators, and batch-oriented runtime behavior.

View File

@ -0,0 +1,384 @@
# Expressions, Types, and Nullability
A reference for how query engines represent computations over data and why types and nulls shape both planning and execution.
---
## Short answer
Expressions are the smallest units of query computation.
Operators such as projection, filter, join, and aggregate do not do useful work by themselves. They rely on expressions to answer questions like:
- which column is being referenced?
- what value does this literal represent?
- how are two values compared?
- how is a new value computed?
- what type does the result have?
- how should nulls be handled?
In the companion engine, expressions exist in two layers:
- logical expressions for planning
- physical expressions for execution
Types and nullability matter in both layers.
---
## Why expressions deserve their own note
It is easy to talk about query engines only in terms of operators:
- scan
- filter
- join
- aggregate
But operators are really containers for expression evaluation.
For example:
- a filter is only meaningful because it evaluates a boolean predicate
- a projection is only meaningful because it evaluates output expressions
- an aggregate is only meaningful because it evaluates aggregate inputs and grouping keys
So expressions are not side details. They are a large part of the engine's real semantics.
---
## The two-layer model
Relevant modules:
- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/LogicalExpr.kt`
- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/Expressions.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/Expressions.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/BinaryExpression.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/CastExpression.kt`
- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/AggregateExpression.kt`
- `tmp/how-query-engines-work/datatypes/src/main/kotlin/Schema.kt`
- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ColumnVector.kt`
- `tmp/how-query-engines-work/datatypes/src/main/kotlin/RecordBatch.kt`
- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ArrowTypes.kt`
The key distinction is:
- logical expressions answer planning questions
- physical expressions answer runtime evaluation questions
Logical expressions expose:
- names
- output types
- output fields
- planning-time structure
Physical expressions expose:
- `evaluate(input: RecordBatch): ColumnVector`
That is a clean and important design split.
---
## Logical expressions
Logical expressions represent computation before execution details are fixed.
Examples include:
- `Column`
- `ColumnIndex`
- `LiteralString`
- `LiteralLong`
- `LiteralDouble`
- `LiteralDate`
- `LiteralIntervalDays`
- `CastExpr`
- `Eq`, `Gt`, `LtEq`
- `And`, `Or`, `Not`
- `Add`, `Subtract`, `Multiply`, `Divide`
- aggregate expressions such as `Sum`, `Avg`, `Count`
- aliasing via `Alias`
The defining feature of a logical expression is `toField(input)`.
That means the expression can answer:
- what field name will this produce?
- what data type will it produce?
So logical expressions are where planning gets its schema information.
---
## Physical expressions
Physical expressions are executable.
They evaluate over an input `RecordBatch` and produce a `ColumnVector`.
That means even a scalar-looking expression such as:
```sql
age + 1
```
does not produce one scalar at runtime. It produces a full output vector for the batch.
Examples of physical expressions in the companion engine:
- `ColumnExpression`
- literal expressions
- comparison expressions
- boolean expressions
- arithmetic expressions
- `CastExpression`
- date/interval expressions
- aggregate runtime expressions via accumulator-based machinery
This is one of the clearest signs that the engine is batch-oriented rather than row-oriented.
---
## The runtime currency: columns and batches
Expression evaluation works because the runtime model is built around:
- `Schema`
- `Field`
- `ColumnVector`
- `RecordBatch`
`Schema` holds the list of fields.
Each `Field` has:
- a name
- an Arrow data type
`ColumnVector` provides:
- its Arrow type
- random access to values
- a length
`RecordBatch` groups equal-length column vectors under one schema.
So the actual runtime unit of expression evaluation is not "a row object." It is "a batch plus one or more vectors derived from it."
---
## Why types matter in planning
Types matter at planning time because the engine needs to know whether expressions make sense before it tries to run them.
Examples:
- a referenced column must exist in the input schema
- a cast target must be supported
- a comparison should have operands with compatible meaning
- an aggregate result has a particular output type
The companion engine uses Arrow types as the type vocabulary.
That is a strong design choice because it avoids inventing a parallel type universe disconnected from the runtime representation.
So when planning computes a `Field`, it is already aligning:
- expression semantics
- schema metadata
- runtime representation
---
## Why types matter in execution
At runtime, types affect:
- which physical vectors get allocated
- how values are read and written
- what coercions are legal
- what aggregate logic is valid
- how nulls are represented
For example:
- `CastExpression` allocates an output vector of the requested Arrow type
- selection expects a boolean `BitVector`
- aggregate accumulators dispatch on numeric runtime types
- numeric binary expressions may coerce mismatched numeric inputs to `Double`
So types are not paperwork. They determine concrete execution behavior.
---
## Type coercion
The companion engine has limited but explicit type coercion behavior.
One notable example is `BinaryExpression`:
- evaluate left vector
- evaluate right vector
- if their Arrow types differ
- try to coerce both to `Double` when both are numeric
This is a pragmatic runtime decision.
It shows two things clearly:
- expression evaluation often needs type-reconciliation logic
- even a small engine needs rules for mixed-type arithmetic and comparison
It also shows a current simplification: coercion is narrow and mostly numeric. A production engine would have a broader, more principled coercion matrix.
---
## Casting
`CastExpression` makes type conversion explicit.
It supports a range of targets such as:
- integer widths
- floating-point types
- string
At runtime it:
1. evaluates the source expression to a column vector
2. allocates a destination vector of the target type
3. converts each value
4. preserves nulls rather than coercing them into non-null sentinel values
That last point matters. Nulls are not just a parsing concern. They must survive evaluation correctly through type conversion.
---
## Aggregate expressions and accumulators
Aggregates are special because their runtime behavior is stateful.
The physical layer separates two concerns:
- an `AggregateExpression` knows its input expression and how to create an accumulator
- an `Accumulator` maintains per-group state across many rows
Examples:
- `CountAccumulator` counts non-null inputs
- `SumAccumulator` adds values using type-dependent numeric logic
- `AvgAccumulator` tracks both sum and count
This design is important because aggregate evaluation is not just "run an expression on a batch." It is "maintain state across many batch values."
---
## Nullability in practice
The notes in `hqew/001` and `hqew/014` already say nullability matters. The code shows why.
Null handling appears in several concrete ways:
- CSV readers call `setNull(...)` when cells are empty
- cast logic preserves nulls
- `COUNT` increments only for non-null values
- arithmetic and aggregate logic must decide what to do when values are null
- Arrow vectors use validity information to distinguish null from present values
That means nullability is part of:
- source ingestion
- expression semantics
- aggregate semantics
- output materialization
It is not just an annotation on a schema.
---
## Three-valued logic and current simplification
A production SQL engine typically has careful three-valued logic rules for:
- `TRUE`
- `FALSE`
- `NULL`
The companion engine is simpler and more pedagogical. It uses Arrow and null-aware value handling, but it does not attempt the full depth of industrial SQL null semantics.
That is worth being explicit about.
The important lesson is:
- null semantics are one of the places where "small query engine" and "full SQL engine" diverge quickly
So any future engine work should treat null behavior as a first-class semantics question, not a cleanup item.
---
## Names vs positions
Expressions also sit at an important boundary between symbolic and positional access.
In the logical layer:
- columns are usually referenced by name
In the physical layer:
- columns are typically referenced by index
This shift matters because runtime execution wants fast positional access, while planning wants stable symbolic meaning.
That is why physical planning resolves `Column(name)` into `ColumnExpression(index)`.
This is one of the simplest examples of planning removing abstraction cost before execution begins.
---
## Data types as part of architecture
The companion engine's use of Arrow types reinforces a larger design lesson:
- the type system is part of architecture
It shapes:
- schema interchange
- source integration
- vector allocation
- expression evaluation
- aggregate implementation
That is why "choosing a type system" appears so early in the book. It affects much more than parser validation.
---
## Main takeaways
- Expressions are where much of a query engine's real semantics live.
- Logical expressions are about meaning, names, and output fields.
- Physical expressions are about batch-oriented evaluation and vector production.
- Types are needed both for validation and for concrete runtime behavior.
- Nullability affects ingestion, casting, filtering, aggregation, and output materialization.
- Aggregate expressions are fundamentally stateful and need accumulator machinery rather than plain scalar evaluation.
---
## Related notes
- `hqew/001-query-engine-glossary.md`
- `hqew/004-query-planning.md`
- `hqew/006-query-execution-models.md`
- `hqew/014-how-query-engines-work-part-1.md`
- `hqew/015-sql-front-end-and-logical-planning.md`
- `hqew/016-physical-plans-and-operators.md`
---
## Changelog
* **Apr 7, 2026** -- Added a dedicated note on expressions, Arrow-based types, coercion, aggregates, and null handling.