Add notes files on physical query planning and evaluation

2026-04-07 14:40:37 +02:00 · 2026-04-07 14:40:37 +02:00 · b2aec58a85
commit b2aec58a85
parent 13d4f9e638
2 changed files with 762 additions and 0 deletions
--- a/hqew/016-physical-plans-and-operators.md
+++ b/hqew/016-physical-plans-and-operators.md
@ -0,0 +1,378 @@
+# Physical Plans and Operators
+
+A reference for how a logical query plan becomes executable runtime work.
+
+---
+
+## Short answer
+
+The physical plan is the runtime form of the query.
+
+Logical plans say:
+
+- what operations are required
+
+Physical plans say:
+
+- which executable operators will run
+- what inputs they consume
+- what batches they produce
+- how logical expressions are turned into evaluable runtime expressions
+
+In the companion engine, the physical layer is intentionally direct:
+
+- each logical node maps to a concrete executable operator
+- execution produces `Sequence<RecordBatch>`
+- operators are mostly batch-oriented and columnar
+
+That makes the logical-to-physical boundary easy to study.
+
+---
+
+## The physical planning pipeline
+
+Relevant modules:
+
+- `tmp/how-query-engines-work/query-planner/src/main/kotlin/QueryPlanner.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/PhysicalPlan.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ScanExec.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/SelectionExec.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ProjectionExec.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashAggregateExec.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashJoinExec.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/LimitExec.kt`
+
+Conceptually:
+
+```text
+logical plan
+  -> choose concrete executable operators
+  -> translate logical expressions into physical expressions
+  -> execute operators over record batches
+```
+
+---
+
+## What a physical plan is
+
+The `PhysicalPlan` interface defines three core things:
+
+- `schema()`: the shape of produced output
+- `execute()`: run the operator and emit `RecordBatch` values
+- `children()`: the operator inputs
+
+This is a clean and important boundary.
+
+The physical layer is no longer about parsing or language semantics. It is about executable dataflow over batches.
+
+---
+
+## The planner's job
+
+`QueryPlanner.createPhysicalPlan(plan)` recursively maps each logical operator to a runtime operator.
+
+The mappings are intentionally readable:
+
+- `Scan` -> `ScanExec`
+- `Selection` -> `SelectionExec`
+- `Projection` -> `ProjectionExec`
+- `Aggregate` -> `HashAggregateExec`
+- `Join` -> `HashJoinExec`
+- `Limit` -> `LimitExec`
+
+At the same time, it translates logical expressions into runtime expressions.
+
+This means physical planning has two distinct jobs:
+
+1. choose operator implementations
+2. lower logical expressions into executable expression evaluators
+
+That split is worth keeping explicit in any engine design.
+
+---
+
+## Scan execution
+
+`ScanExec` is the leaf physical operator.
+
+Its job is simple:
+
+- expose the projected schema
+- delegate scanning to the underlying `DataSource`
+- rely on the source to produce batches
+
+This is also where projection pushdown becomes concrete. The operator receives a list of projected columns and passes that projection down to the source.
+
+So even though `ScanExec` is small, it carries important architectural meaning:
+
+- it is the storage boundary
+- it is where source capabilities first affect runtime behavior
+- it is how the physical plan begins consuming external data
+
+---
+
+## Projection execution
+
+`ProjectionExec` runs a list of physical expressions over each input batch.
+
+For each incoming `RecordBatch`, it:
+
+1. evaluates each projection expression
+2. collects the resulting output columns
+3. builds a new `RecordBatch` with the projection schema
+
+This captures the batch-oriented style clearly:
+
+- expressions do not produce one value
+- they produce a whole output column for the batch
+
+That is a core mental shift in vectorized engines.
+
+---
+
+## Selection execution
+
+`SelectionExec` evaluates a predicate expression against a batch and expects the result to be a boolean Arrow `BitVector`.
+
+It then:
+
+1. counts selected rows
+2. allocates filtered output vectors
+3. copies only rows whose predicate bit is true
+4. returns a filtered batch with the same schema
+
+This is a useful example of how vectorized filtering really works:
+
+- the predicate is first materialized as a boolean selection vector
+- then each input column is filtered using that vector
+
+So selection is not "loop through rows and emit rows" at the conceptual boundary. It is "evaluate a batch predicate, then compact batch columns."
+
+---
+
+## Hash aggregation
+
+`HashAggregateExec` is the aggregate runtime operator.
+
+It keeps a map from:
+
+- group key
+
+to:
+
+- a list of aggregate accumulators
+
+Its runtime shape is:
+
+1. evaluate grouping expressions on the batch
+2. evaluate aggregate input expressions on the batch
+3. iterate through rows
+4. look up or create accumulator state for the row's grouping key
+5. update accumulators
+6. after all input is consumed, build one output batch from the accumulated state
+
+This is a blocking operator:
+
+- it must inspect all input before final results are ready
+
+That matters because the physical plan may be largely pipelined, but some operators introduce a barrier.
+
+---
+
+## Partial and final aggregation
+
+The aggregate operator also supports different modes:
+
+- `COMPLETE`
+- `PARTIAL`
+- `FINAL`
+
+This is important because it shows how distributed execution reuses the same conceptual operator in multiple stages.
+
+The idea is:
+
+- `PARTIAL`: produce intermediate aggregate state locally
+- `FINAL`: merge intermediate states into final answers
+
+Average is the clearest example.
+
+You cannot merge two averages by averaging the averages. You need both:
+
+- partial sum
+- partial count
+
+The companion code models that explicitly through `AvgIntermediateState`.
+
+That is a good example of physical planning and execution needing richer runtime state than the SQL surface suggests.
+
+---
+
+## Hash join
+
+`HashJoinExec` implements joins with a classic build/probe strategy.
+
+It:
+
+1. fully reads the right input
+2. builds a hash table keyed by the right join columns
+3. streams through the left input
+4. probes the hash table for matches
+5. constructs joined rows
+
+For left joins, it emits unmatched left rows with null-filled right columns.
+
+For right joins, it performs extra work after probing to emit unmatched right rows.
+
+Two practical details are worth noting.
+
+First, the planner resolves join column names to physical column indices ahead of execution. Runtime code wants positions, not symbolic names.
+
+Second, duplicate key columns from the right side may be excluded from the output schema when the join keys have the same name. That is a schema-shaping concern the planner and operator must agree on.
+
+---
+
+## Limit execution
+
+`LimitExec` is simple but instructive.
+
+It streams input batches until the requested number of rows has been produced. If the limit falls in the middle of a batch, it truncates that batch and stops.
+
+This shows that even "tiny" operators still need batch-aware logic:
+
+- whole-batch pass-through when possible
+- partial-batch copying when necessary
+
+So batch-oriented execution changes the shape of even very simple operators.
+
+---
+
+## Expression lowering at the physical boundary
+
+The query planner also lowers logical expressions into physical expressions.
+
+Examples:
+
+- `Column(name)` -> `ColumnExpression(index)`
+- `LiteralLong` -> `LiteralLongExpression`
+- `Eq` -> `EqExpression`
+- `Add` -> `AddExpression`
+- `CastExpr` -> `CastExpression`
+
+One important detail is that aliases disappear at this stage.
+
+That is correct because aliases affect:
+
+- names
+- schema
+- planning
+
+They do not affect the actual runtime computation.
+
+This is a good example of semantics getting split cleanly across layers.
+
+---
+
+## Operator tree and dataflow
+
+The physical plan is still a tree of operators, but now it is executable.
+
+A simple query such as:
+
+```sql
+SELECT name
+FROM employees
+WHERE age > 18
+LIMIT 10
+```
+
+typically becomes something like:
+
+1. `ScanExec`
+2. `SelectionExec`
+3. `ProjectionExec`
+4. `LimitExec`
+
+Each operator consumes batches from its child and emits new batches upward.
+
+That is the operational heart of the engine.
+
+---
+
+## Blocking vs streaming behavior
+
+The physical layer makes it obvious which operators can stream and which cannot.
+
+Mostly streaming:
+
+- scan
+- projection
+- selection
+- limit
+
+Blocking or partially blocking:
+
+- hash aggregate
+- hash join build phase
+- right-join unmatched-row handling
+
+This distinction matters because runtime latency and memory behavior depend as much on operator blocking properties as on logical operator choice.
+
+---
+
+## Strengths of the companion design
+
+The companion physical layer is especially useful as a teaching architecture because:
+
+- the logical-to-physical mapping is easy to see
+- operators are concrete and local
+- batches are the universal runtime unit
+- runtime expressions are clearly separate from logical expressions
+- distributed extensions can reuse the same concepts
+
+It is not trying to be the most optimized design possible. It is trying to make the moving parts visible.
+
+That is why it works well as a study target.
+
+---
+
+## What is simplified relative to a production engine
+
+A production engine would likely add:
+
+- richer operator selection
+- more cost-sensitive physical planning
+- more join strategies
+- more sophisticated memory management
+- spill-to-disk behavior
+- stronger predicate pushdown and source capabilities
+- more operator fusion and code generation
+
+So this note should be read as a physical-plan primer, not as the full space of execution-engine design.
+
+---
+
+## Main takeaways
+
+- Physical planning chooses executable operators and lowers expressions into runtime evaluators.
+- Batch-oriented execution changes the shape of every operator, even simple ones like `LIMIT` and `FILTER`.
+- Aggregation and joins expose the importance of blocking behavior, runtime state, and schema shaping.
+- Planner work continues beyond logical planning: it also resolves indices, output schemas, and runtime operator contracts.
+- The physical layer is where abstract query meaning finally turns into actual dataflow.
+
+---
+
+## Related notes
+
+- `hqew/002-query-engine-primer.md`
+- `hqew/003-query-engine-architectures.md`
+- `hqew/006-query-execution-models.md`
+- `hqew/008-joins-and-aggregations.md`
+- `hqew/009-distributed-query-engines.md`
+- `hqew/015-sql-front-end-and-logical-planning.md`
+- `hqew/017-expressions-types-and-nullability.md`
+
+---
+
+## Changelog
+
+* **Apr 7, 2026** -- Added a dedicated note on physical planning, executable operators, and batch-oriented runtime behavior.
--- a/hqew/017-expressions-types-and-nullability.md
+++ b/hqew/017-expressions-types-and-nullability.md
@ -0,0 +1,384 @@
+# Expressions, Types, and Nullability
+
+A reference for how query engines represent computations over data and why types and nulls shape both planning and execution.
+
+---
+
+## Short answer
+
+Expressions are the smallest units of query computation.
+
+Operators such as projection, filter, join, and aggregate do not do useful work by themselves. They rely on expressions to answer questions like:
+
+- which column is being referenced?
+- what value does this literal represent?
+- how are two values compared?
+- how is a new value computed?
+- what type does the result have?
+- how should nulls be handled?
+
+In the companion engine, expressions exist in two layers:
+
+- logical expressions for planning
+- physical expressions for execution
+
+Types and nullability matter in both layers.
+
+---
+
+## Why expressions deserve their own note
+
+It is easy to talk about query engines only in terms of operators:
+
+- scan
+- filter
+- join
+- aggregate
+
+But operators are really containers for expression evaluation.
+
+For example:
+
+- a filter is only meaningful because it evaluates a boolean predicate
+- a projection is only meaningful because it evaluates output expressions
+- an aggregate is only meaningful because it evaluates aggregate inputs and grouping keys
+
+So expressions are not side details. They are a large part of the engine's real semantics.
+
+---
+
+## The two-layer model
+
+Relevant modules:
+
+- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/LogicalExpr.kt`
+- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/Expressions.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/Expressions.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/BinaryExpression.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/CastExpression.kt`
+- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/AggregateExpression.kt`
+- `tmp/how-query-engines-work/datatypes/src/main/kotlin/Schema.kt`
+- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ColumnVector.kt`
+- `tmp/how-query-engines-work/datatypes/src/main/kotlin/RecordBatch.kt`
+- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ArrowTypes.kt`
+
+The key distinction is:
+
+- logical expressions answer planning questions
+- physical expressions answer runtime evaluation questions
+
+Logical expressions expose:
+
+- names
+- output types
+- output fields
+- planning-time structure
+
+Physical expressions expose:
+
+- `evaluate(input: RecordBatch): ColumnVector`
+
+That is a clean and important design split.
+
+---
+
+## Logical expressions
+
+Logical expressions represent computation before execution details are fixed.
+
+Examples include:
+
+- `Column`
+- `ColumnIndex`
+- `LiteralString`
+- `LiteralLong`
+- `LiteralDouble`
+- `LiteralDate`
+- `LiteralIntervalDays`
+- `CastExpr`
+- `Eq`, `Gt`, `LtEq`
+- `And`, `Or`, `Not`
+- `Add`, `Subtract`, `Multiply`, `Divide`
+- aggregate expressions such as `Sum`, `Avg`, `Count`
+- aliasing via `Alias`
+
+The defining feature of a logical expression is `toField(input)`.
+
+That means the expression can answer:
+
+- what field name will this produce?
+- what data type will it produce?
+
+So logical expressions are where planning gets its schema information.
+
+---
+
+## Physical expressions
+
+Physical expressions are executable.
+
+They evaluate over an input `RecordBatch` and produce a `ColumnVector`.
+
+That means even a scalar-looking expression such as:
+
+```sql
+age + 1
+```
+
+does not produce one scalar at runtime. It produces a full output vector for the batch.
+
+Examples of physical expressions in the companion engine:
+
+- `ColumnExpression`
+- literal expressions
+- comparison expressions
+- boolean expressions
+- arithmetic expressions
+- `CastExpression`
+- date/interval expressions
+- aggregate runtime expressions via accumulator-based machinery
+
+This is one of the clearest signs that the engine is batch-oriented rather than row-oriented.
+
+---
+
+## The runtime currency: columns and batches
+
+Expression evaluation works because the runtime model is built around:
+
+- `Schema`
+- `Field`
+- `ColumnVector`
+- `RecordBatch`
+
+`Schema` holds the list of fields.
+
+Each `Field` has:
+
+- a name
+- an Arrow data type
+
+`ColumnVector` provides:
+
+- its Arrow type
+- random access to values
+- a length
+
+`RecordBatch` groups equal-length column vectors under one schema.
+
+So the actual runtime unit of expression evaluation is not "a row object." It is "a batch plus one or more vectors derived from it."
+
+---
+
+## Why types matter in planning
+
+Types matter at planning time because the engine needs to know whether expressions make sense before it tries to run them.
+
+Examples:
+
+- a referenced column must exist in the input schema
+- a cast target must be supported
+- a comparison should have operands with compatible meaning
+- an aggregate result has a particular output type
+
+The companion engine uses Arrow types as the type vocabulary.
+
+That is a strong design choice because it avoids inventing a parallel type universe disconnected from the runtime representation.
+
+So when planning computes a `Field`, it is already aligning:
+
+- expression semantics
+- schema metadata
+- runtime representation
+
+---
+
+## Why types matter in execution
+
+At runtime, types affect:
+
+- which physical vectors get allocated
+- how values are read and written
+- what coercions are legal
+- what aggregate logic is valid
+- how nulls are represented
+
+For example:
+
+- `CastExpression` allocates an output vector of the requested Arrow type
+- selection expects a boolean `BitVector`
+- aggregate accumulators dispatch on numeric runtime types
+- numeric binary expressions may coerce mismatched numeric inputs to `Double`
+
+So types are not paperwork. They determine concrete execution behavior.
+
+---
+
+## Type coercion
+
+The companion engine has limited but explicit type coercion behavior.
+
+One notable example is `BinaryExpression`:
+
+- evaluate left vector
+- evaluate right vector
+- if their Arrow types differ
+- try to coerce both to `Double` when both are numeric
+
+This is a pragmatic runtime decision.
+
+It shows two things clearly:
+
+- expression evaluation often needs type-reconciliation logic
+- even a small engine needs rules for mixed-type arithmetic and comparison
+
+It also shows a current simplification: coercion is narrow and mostly numeric. A production engine would have a broader, more principled coercion matrix.
+
+---
+
+## Casting
+
+`CastExpression` makes type conversion explicit.
+
+It supports a range of targets such as:
+
+- integer widths
+- floating-point types
+- string
+
+At runtime it:
+
+1. evaluates the source expression to a column vector
+2. allocates a destination vector of the target type
+3. converts each value
+4. preserves nulls rather than coercing them into non-null sentinel values
+
+That last point matters. Nulls are not just a parsing concern. They must survive evaluation correctly through type conversion.
+
+---
+
+## Aggregate expressions and accumulators
+
+Aggregates are special because their runtime behavior is stateful.
+
+The physical layer separates two concerns:
+
+- an `AggregateExpression` knows its input expression and how to create an accumulator
+- an `Accumulator` maintains per-group state across many rows
+
+Examples:
+
+- `CountAccumulator` counts non-null inputs
+- `SumAccumulator` adds values using type-dependent numeric logic
+- `AvgAccumulator` tracks both sum and count
+
+This design is important because aggregate evaluation is not just "run an expression on a batch." It is "maintain state across many batch values."
+
+---
+
+## Nullability in practice
+
+The notes in `hqew/001` and `hqew/014` already say nullability matters. The code shows why.
+
+Null handling appears in several concrete ways:
+
+- CSV readers call `setNull(...)` when cells are empty
+- cast logic preserves nulls
+- `COUNT` increments only for non-null values
+- arithmetic and aggregate logic must decide what to do when values are null
+- Arrow vectors use validity information to distinguish null from present values
+
+That means nullability is part of:
+
+- source ingestion
+- expression semantics
+- aggregate semantics
+- output materialization
+
+It is not just an annotation on a schema.
+
+---
+
+## Three-valued logic and current simplification
+
+A production SQL engine typically has careful three-valued logic rules for:
+
+- `TRUE`
+- `FALSE`
+- `NULL`
+
+The companion engine is simpler and more pedagogical. It uses Arrow and null-aware value handling, but it does not attempt the full depth of industrial SQL null semantics.
+
+That is worth being explicit about.
+
+The important lesson is:
+
+- null semantics are one of the places where "small query engine" and "full SQL engine" diverge quickly
+
+So any future engine work should treat null behavior as a first-class semantics question, not a cleanup item.
+
+---
+
+## Names vs positions
+
+Expressions also sit at an important boundary between symbolic and positional access.
+
+In the logical layer:
+
+- columns are usually referenced by name
+
+In the physical layer:
+
+- columns are typically referenced by index
+
+This shift matters because runtime execution wants fast positional access, while planning wants stable symbolic meaning.
+
+That is why physical planning resolves `Column(name)` into `ColumnExpression(index)`.
+
+This is one of the simplest examples of planning removing abstraction cost before execution begins.
+
+---
+
+## Data types as part of architecture
+
+The companion engine's use of Arrow types reinforces a larger design lesson:
+
+- the type system is part of architecture
+
+It shapes:
+
+- schema interchange
+- source integration
+- vector allocation
+- expression evaluation
+- aggregate implementation
+
+That is why "choosing a type system" appears so early in the book. It affects much more than parser validation.
+
+---
+
+## Main takeaways
+
+- Expressions are where much of a query engine's real semantics live.
+- Logical expressions are about meaning, names, and output fields.
+- Physical expressions are about batch-oriented evaluation and vector production.
+- Types are needed both for validation and for concrete runtime behavior.
+- Nullability affects ingestion, casting, filtering, aggregation, and output materialization.
+- Aggregate expressions are fundamentally stateful and need accumulator machinery rather than plain scalar evaluation.
+
+---
+
+## Related notes
+
+- `hqew/001-query-engine-glossary.md`
+- `hqew/004-query-planning.md`
+- `hqew/006-query-execution-models.md`
+- `hqew/014-how-query-engines-work-part-1.md`
+- `hqew/015-sql-front-end-and-logical-planning.md`
+- `hqew/016-physical-plans-and-operators.md`
+
+---
+
+## Changelog
+
+* **Apr 7, 2026** -- Added a dedicated note on expressions, Arrow-based types, coercion, aggregates, and null handling.