diff --git a/hqew/016-physical-plans-and-operators.md b/hqew/016-physical-plans-and-operators.md new file mode 100644 index 0000000..b61c55e --- /dev/null +++ b/hqew/016-physical-plans-and-operators.md @@ -0,0 +1,378 @@ +# Physical Plans and Operators + +A reference for how a logical query plan becomes executable runtime work. + +--- + +## Short answer + +The physical plan is the runtime form of the query. + +Logical plans say: + +- what operations are required + +Physical plans say: + +- which executable operators will run +- what inputs they consume +- what batches they produce +- how logical expressions are turned into evaluable runtime expressions + +In the companion engine, the physical layer is intentionally direct: + +- each logical node maps to a concrete executable operator +- execution produces `Sequence` +- operators are mostly batch-oriented and columnar + +That makes the logical-to-physical boundary easy to study. + +--- + +## The physical planning pipeline + +Relevant modules: + +- `tmp/how-query-engines-work/query-planner/src/main/kotlin/QueryPlanner.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/PhysicalPlan.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ScanExec.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/SelectionExec.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/ProjectionExec.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashAggregateExec.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/HashJoinExec.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/LimitExec.kt` + +Conceptually: + +```text +logical plan + -> choose concrete executable operators + -> translate logical expressions into physical expressions + -> execute operators over record batches +``` + +--- + +## What a physical plan is + +The `PhysicalPlan` interface defines three core things: + +- `schema()`: the shape of produced output +- `execute()`: run the operator and emit `RecordBatch` values +- `children()`: the operator inputs + +This is a clean and important boundary. + +The physical layer is no longer about parsing or language semantics. It is about executable dataflow over batches. + +--- + +## The planner's job + +`QueryPlanner.createPhysicalPlan(plan)` recursively maps each logical operator to a runtime operator. + +The mappings are intentionally readable: + +- `Scan` -> `ScanExec` +- `Selection` -> `SelectionExec` +- `Projection` -> `ProjectionExec` +- `Aggregate` -> `HashAggregateExec` +- `Join` -> `HashJoinExec` +- `Limit` -> `LimitExec` + +At the same time, it translates logical expressions into runtime expressions. + +This means physical planning has two distinct jobs: + +1. choose operator implementations +2. lower logical expressions into executable expression evaluators + +That split is worth keeping explicit in any engine design. + +--- + +## Scan execution + +`ScanExec` is the leaf physical operator. + +Its job is simple: + +- expose the projected schema +- delegate scanning to the underlying `DataSource` +- rely on the source to produce batches + +This is also where projection pushdown becomes concrete. The operator receives a list of projected columns and passes that projection down to the source. + +So even though `ScanExec` is small, it carries important architectural meaning: + +- it is the storage boundary +- it is where source capabilities first affect runtime behavior +- it is how the physical plan begins consuming external data + +--- + +## Projection execution + +`ProjectionExec` runs a list of physical expressions over each input batch. + +For each incoming `RecordBatch`, it: + +1. evaluates each projection expression +2. collects the resulting output columns +3. builds a new `RecordBatch` with the projection schema + +This captures the batch-oriented style clearly: + +- expressions do not produce one value +- they produce a whole output column for the batch + +That is a core mental shift in vectorized engines. + +--- + +## Selection execution + +`SelectionExec` evaluates a predicate expression against a batch and expects the result to be a boolean Arrow `BitVector`. + +It then: + +1. counts selected rows +2. allocates filtered output vectors +3. copies only rows whose predicate bit is true +4. returns a filtered batch with the same schema + +This is a useful example of how vectorized filtering really works: + +- the predicate is first materialized as a boolean selection vector +- then each input column is filtered using that vector + +So selection is not "loop through rows and emit rows" at the conceptual boundary. It is "evaluate a batch predicate, then compact batch columns." + +--- + +## Hash aggregation + +`HashAggregateExec` is the aggregate runtime operator. + +It keeps a map from: + +- group key + +to: + +- a list of aggregate accumulators + +Its runtime shape is: + +1. evaluate grouping expressions on the batch +2. evaluate aggregate input expressions on the batch +3. iterate through rows +4. look up or create accumulator state for the row's grouping key +5. update accumulators +6. after all input is consumed, build one output batch from the accumulated state + +This is a blocking operator: + +- it must inspect all input before final results are ready + +That matters because the physical plan may be largely pipelined, but some operators introduce a barrier. + +--- + +## Partial and final aggregation + +The aggregate operator also supports different modes: + +- `COMPLETE` +- `PARTIAL` +- `FINAL` + +This is important because it shows how distributed execution reuses the same conceptual operator in multiple stages. + +The idea is: + +- `PARTIAL`: produce intermediate aggregate state locally +- `FINAL`: merge intermediate states into final answers + +Average is the clearest example. + +You cannot merge two averages by averaging the averages. You need both: + +- partial sum +- partial count + +The companion code models that explicitly through `AvgIntermediateState`. + +That is a good example of physical planning and execution needing richer runtime state than the SQL surface suggests. + +--- + +## Hash join + +`HashJoinExec` implements joins with a classic build/probe strategy. + +It: + +1. fully reads the right input +2. builds a hash table keyed by the right join columns +3. streams through the left input +4. probes the hash table for matches +5. constructs joined rows + +For left joins, it emits unmatched left rows with null-filled right columns. + +For right joins, it performs extra work after probing to emit unmatched right rows. + +Two practical details are worth noting. + +First, the planner resolves join column names to physical column indices ahead of execution. Runtime code wants positions, not symbolic names. + +Second, duplicate key columns from the right side may be excluded from the output schema when the join keys have the same name. That is a schema-shaping concern the planner and operator must agree on. + +--- + +## Limit execution + +`LimitExec` is simple but instructive. + +It streams input batches until the requested number of rows has been produced. If the limit falls in the middle of a batch, it truncates that batch and stops. + +This shows that even "tiny" operators still need batch-aware logic: + +- whole-batch pass-through when possible +- partial-batch copying when necessary + +So batch-oriented execution changes the shape of even very simple operators. + +--- + +## Expression lowering at the physical boundary + +The query planner also lowers logical expressions into physical expressions. + +Examples: + +- `Column(name)` -> `ColumnExpression(index)` +- `LiteralLong` -> `LiteralLongExpression` +- `Eq` -> `EqExpression` +- `Add` -> `AddExpression` +- `CastExpr` -> `CastExpression` + +One important detail is that aliases disappear at this stage. + +That is correct because aliases affect: + +- names +- schema +- planning + +They do not affect the actual runtime computation. + +This is a good example of semantics getting split cleanly across layers. + +--- + +## Operator tree and dataflow + +The physical plan is still a tree of operators, but now it is executable. + +A simple query such as: + +```sql +SELECT name +FROM employees +WHERE age > 18 +LIMIT 10 +``` + +typically becomes something like: + +1. `ScanExec` +2. `SelectionExec` +3. `ProjectionExec` +4. `LimitExec` + +Each operator consumes batches from its child and emits new batches upward. + +That is the operational heart of the engine. + +--- + +## Blocking vs streaming behavior + +The physical layer makes it obvious which operators can stream and which cannot. + +Mostly streaming: + +- scan +- projection +- selection +- limit + +Blocking or partially blocking: + +- hash aggregate +- hash join build phase +- right-join unmatched-row handling + +This distinction matters because runtime latency and memory behavior depend as much on operator blocking properties as on logical operator choice. + +--- + +## Strengths of the companion design + +The companion physical layer is especially useful as a teaching architecture because: + +- the logical-to-physical mapping is easy to see +- operators are concrete and local +- batches are the universal runtime unit +- runtime expressions are clearly separate from logical expressions +- distributed extensions can reuse the same concepts + +It is not trying to be the most optimized design possible. It is trying to make the moving parts visible. + +That is why it works well as a study target. + +--- + +## What is simplified relative to a production engine + +A production engine would likely add: + +- richer operator selection +- more cost-sensitive physical planning +- more join strategies +- more sophisticated memory management +- spill-to-disk behavior +- stronger predicate pushdown and source capabilities +- more operator fusion and code generation + +So this note should be read as a physical-plan primer, not as the full space of execution-engine design. + +--- + +## Main takeaways + +- Physical planning chooses executable operators and lowers expressions into runtime evaluators. +- Batch-oriented execution changes the shape of every operator, even simple ones like `LIMIT` and `FILTER`. +- Aggregation and joins expose the importance of blocking behavior, runtime state, and schema shaping. +- Planner work continues beyond logical planning: it also resolves indices, output schemas, and runtime operator contracts. +- The physical layer is where abstract query meaning finally turns into actual dataflow. + +--- + +## Related notes + +- `hqew/002-query-engine-primer.md` +- `hqew/003-query-engine-architectures.md` +- `hqew/006-query-execution-models.md` +- `hqew/008-joins-and-aggregations.md` +- `hqew/009-distributed-query-engines.md` +- `hqew/015-sql-front-end-and-logical-planning.md` +- `hqew/017-expressions-types-and-nullability.md` + +--- + +## Changelog + +* **Apr 7, 2026** -- Added a dedicated note on physical planning, executable operators, and batch-oriented runtime behavior. diff --git a/hqew/017-expressions-types-and-nullability.md b/hqew/017-expressions-types-and-nullability.md new file mode 100644 index 0000000..ff00ae9 --- /dev/null +++ b/hqew/017-expressions-types-and-nullability.md @@ -0,0 +1,384 @@ +# Expressions, Types, and Nullability + +A reference for how query engines represent computations over data and why types and nulls shape both planning and execution. + +--- + +## Short answer + +Expressions are the smallest units of query computation. + +Operators such as projection, filter, join, and aggregate do not do useful work by themselves. They rely on expressions to answer questions like: + +- which column is being referenced? +- what value does this literal represent? +- how are two values compared? +- how is a new value computed? +- what type does the result have? +- how should nulls be handled? + +In the companion engine, expressions exist in two layers: + +- logical expressions for planning +- physical expressions for execution + +Types and nullability matter in both layers. + +--- + +## Why expressions deserve their own note + +It is easy to talk about query engines only in terms of operators: + +- scan +- filter +- join +- aggregate + +But operators are really containers for expression evaluation. + +For example: + +- a filter is only meaningful because it evaluates a boolean predicate +- a projection is only meaningful because it evaluates output expressions +- an aggregate is only meaningful because it evaluates aggregate inputs and grouping keys + +So expressions are not side details. They are a large part of the engine's real semantics. + +--- + +## The two-layer model + +Relevant modules: + +- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/LogicalExpr.kt` +- `tmp/how-query-engines-work/logical-plan/src/main/kotlin/Expressions.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/Expressions.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/BinaryExpression.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/CastExpression.kt` +- `tmp/how-query-engines-work/physical-plan/src/main/kotlin/expressions/AggregateExpression.kt` +- `tmp/how-query-engines-work/datatypes/src/main/kotlin/Schema.kt` +- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ColumnVector.kt` +- `tmp/how-query-engines-work/datatypes/src/main/kotlin/RecordBatch.kt` +- `tmp/how-query-engines-work/datatypes/src/main/kotlin/ArrowTypes.kt` + +The key distinction is: + +- logical expressions answer planning questions +- physical expressions answer runtime evaluation questions + +Logical expressions expose: + +- names +- output types +- output fields +- planning-time structure + +Physical expressions expose: + +- `evaluate(input: RecordBatch): ColumnVector` + +That is a clean and important design split. + +--- + +## Logical expressions + +Logical expressions represent computation before execution details are fixed. + +Examples include: + +- `Column` +- `ColumnIndex` +- `LiteralString` +- `LiteralLong` +- `LiteralDouble` +- `LiteralDate` +- `LiteralIntervalDays` +- `CastExpr` +- `Eq`, `Gt`, `LtEq` +- `And`, `Or`, `Not` +- `Add`, `Subtract`, `Multiply`, `Divide` +- aggregate expressions such as `Sum`, `Avg`, `Count` +- aliasing via `Alias` + +The defining feature of a logical expression is `toField(input)`. + +That means the expression can answer: + +- what field name will this produce? +- what data type will it produce? + +So logical expressions are where planning gets its schema information. + +--- + +## Physical expressions + +Physical expressions are executable. + +They evaluate over an input `RecordBatch` and produce a `ColumnVector`. + +That means even a scalar-looking expression such as: + +```sql +age + 1 +``` + +does not produce one scalar at runtime. It produces a full output vector for the batch. + +Examples of physical expressions in the companion engine: + +- `ColumnExpression` +- literal expressions +- comparison expressions +- boolean expressions +- arithmetic expressions +- `CastExpression` +- date/interval expressions +- aggregate runtime expressions via accumulator-based machinery + +This is one of the clearest signs that the engine is batch-oriented rather than row-oriented. + +--- + +## The runtime currency: columns and batches + +Expression evaluation works because the runtime model is built around: + +- `Schema` +- `Field` +- `ColumnVector` +- `RecordBatch` + +`Schema` holds the list of fields. + +Each `Field` has: + +- a name +- an Arrow data type + +`ColumnVector` provides: + +- its Arrow type +- random access to values +- a length + +`RecordBatch` groups equal-length column vectors under one schema. + +So the actual runtime unit of expression evaluation is not "a row object." It is "a batch plus one or more vectors derived from it." + +--- + +## Why types matter in planning + +Types matter at planning time because the engine needs to know whether expressions make sense before it tries to run them. + +Examples: + +- a referenced column must exist in the input schema +- a cast target must be supported +- a comparison should have operands with compatible meaning +- an aggregate result has a particular output type + +The companion engine uses Arrow types as the type vocabulary. + +That is a strong design choice because it avoids inventing a parallel type universe disconnected from the runtime representation. + +So when planning computes a `Field`, it is already aligning: + +- expression semantics +- schema metadata +- runtime representation + +--- + +## Why types matter in execution + +At runtime, types affect: + +- which physical vectors get allocated +- how values are read and written +- what coercions are legal +- what aggregate logic is valid +- how nulls are represented + +For example: + +- `CastExpression` allocates an output vector of the requested Arrow type +- selection expects a boolean `BitVector` +- aggregate accumulators dispatch on numeric runtime types +- numeric binary expressions may coerce mismatched numeric inputs to `Double` + +So types are not paperwork. They determine concrete execution behavior. + +--- + +## Type coercion + +The companion engine has limited but explicit type coercion behavior. + +One notable example is `BinaryExpression`: + +- evaluate left vector +- evaluate right vector +- if their Arrow types differ +- try to coerce both to `Double` when both are numeric + +This is a pragmatic runtime decision. + +It shows two things clearly: + +- expression evaluation often needs type-reconciliation logic +- even a small engine needs rules for mixed-type arithmetic and comparison + +It also shows a current simplification: coercion is narrow and mostly numeric. A production engine would have a broader, more principled coercion matrix. + +--- + +## Casting + +`CastExpression` makes type conversion explicit. + +It supports a range of targets such as: + +- integer widths +- floating-point types +- string + +At runtime it: + +1. evaluates the source expression to a column vector +2. allocates a destination vector of the target type +3. converts each value +4. preserves nulls rather than coercing them into non-null sentinel values + +That last point matters. Nulls are not just a parsing concern. They must survive evaluation correctly through type conversion. + +--- + +## Aggregate expressions and accumulators + +Aggregates are special because their runtime behavior is stateful. + +The physical layer separates two concerns: + +- an `AggregateExpression` knows its input expression and how to create an accumulator +- an `Accumulator` maintains per-group state across many rows + +Examples: + +- `CountAccumulator` counts non-null inputs +- `SumAccumulator` adds values using type-dependent numeric logic +- `AvgAccumulator` tracks both sum and count + +This design is important because aggregate evaluation is not just "run an expression on a batch." It is "maintain state across many batch values." + +--- + +## Nullability in practice + +The notes in `hqew/001` and `hqew/014` already say nullability matters. The code shows why. + +Null handling appears in several concrete ways: + +- CSV readers call `setNull(...)` when cells are empty +- cast logic preserves nulls +- `COUNT` increments only for non-null values +- arithmetic and aggregate logic must decide what to do when values are null +- Arrow vectors use validity information to distinguish null from present values + +That means nullability is part of: + +- source ingestion +- expression semantics +- aggregate semantics +- output materialization + +It is not just an annotation on a schema. + +--- + +## Three-valued logic and current simplification + +A production SQL engine typically has careful three-valued logic rules for: + +- `TRUE` +- `FALSE` +- `NULL` + +The companion engine is simpler and more pedagogical. It uses Arrow and null-aware value handling, but it does not attempt the full depth of industrial SQL null semantics. + +That is worth being explicit about. + +The important lesson is: + +- null semantics are one of the places where "small query engine" and "full SQL engine" diverge quickly + +So any future engine work should treat null behavior as a first-class semantics question, not a cleanup item. + +--- + +## Names vs positions + +Expressions also sit at an important boundary between symbolic and positional access. + +In the logical layer: + +- columns are usually referenced by name + +In the physical layer: + +- columns are typically referenced by index + +This shift matters because runtime execution wants fast positional access, while planning wants stable symbolic meaning. + +That is why physical planning resolves `Column(name)` into `ColumnExpression(index)`. + +This is one of the simplest examples of planning removing abstraction cost before execution begins. + +--- + +## Data types as part of architecture + +The companion engine's use of Arrow types reinforces a larger design lesson: + +- the type system is part of architecture + +It shapes: + +- schema interchange +- source integration +- vector allocation +- expression evaluation +- aggregate implementation + +That is why "choosing a type system" appears so early in the book. It affects much more than parser validation. + +--- + +## Main takeaways + +- Expressions are where much of a query engine's real semantics live. +- Logical expressions are about meaning, names, and output fields. +- Physical expressions are about batch-oriented evaluation and vector production. +- Types are needed both for validation and for concrete runtime behavior. +- Nullability affects ingestion, casting, filtering, aggregation, and output materialization. +- Aggregate expressions are fundamentally stateful and need accumulator machinery rather than plain scalar evaluation. + +--- + +## Related notes + +- `hqew/001-query-engine-glossary.md` +- `hqew/004-query-planning.md` +- `hqew/006-query-execution-models.md` +- `hqew/014-how-query-engines-work-part-1.md` +- `hqew/015-sql-front-end-and-logical-planning.md` +- `hqew/016-physical-plans-and-operators.md` + +--- + +## Changelog + +* **Apr 7, 2026** -- Added a dedicated note on expressions, Arrow-based types, coercion, aggregates, and null handling.