Add note files about query planning and optimization

2026-03-31 16:16:53 +02:00 · 2026-03-31 16:16:53 +02:00 · 2a33f8b483
commit 2a33f8b483
parent 584f82bb82
2 changed files with 305 additions and 0 deletions
--- a/hqew/004-query-planning.md
+++ b/hqew/004-query-planning.md
@@ -0,0 +1,147 @@
+# Query Planning
+
+A reference for how a query request becomes an internal plan.
+
+---
+
+## Short answer
+
+Query planning is the stage where a query engine turns a user request into a structured representation of work to be done.
+
+The main point is to separate:
+
+- the syntax the user wrote
+- the meaning of the query
+- the later execution strategy
+
+Without that separation, optimization and backend-independent execution become much harder.
+
+---
+
+## Typical pipeline
+
+Planning usually sits between parsing and optimization:
+
+1. parse query text or API calls
+2. build an AST or similar syntax tree
+3. resolve names and types
+4. produce a logical plan
+5. hand that plan to the optimizer
+
+The exact boundaries differ across systems, but the general idea is stable.
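+
+A toy version of these stages could look like the sketch below. It assumes a hypothetical one-table catalog and accepts only queries shaped like `SELECT cols FROM table WHERE col > number`; `parse`, `resolve`, and `to_logical_plan` are illustrative names, not any engine's API.
+
+```python
+import re
+from dataclasses import dataclass
+
+# Hypothetical catalog: table name -> {column name: type name}.
+CATALOG = {"employees": {"name": "str", "age": "int"}}
+
+
+@dataclass
+class SelectAST:
+    """Syntax-level node: records what the user wrote, nothing more."""
+    columns: list
+    table: str
+    where: tuple  # (column, operator, literal)
+
+
+def parse(sql: str) -> SelectAST:
+    # Steps 1-2: recognize only `SELECT a, b FROM t WHERE c > n`.
+    m = re.fullmatch(
+        r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*>\s*(\d+)\s*",
+        sql, re.IGNORECASE)
+    if m is None:
+        raise SyntaxError("unsupported query shape")
+    columns = [c.strip() for c in m.group(1).split(",")]
+    return SelectAST(columns, m.group(2), (m.group(3), ">", int(m.group(4))))
+
+
+def resolve(stmt: SelectAST) -> SelectAST:
+    # Step 3: every name must exist, and `>` needs a numeric column.
+    if stmt.table not in CATALOG:
+        raise NameError(f"unknown table {stmt.table!r}")
+    schema = CATALOG[stmt.table]
+    for col in stmt.columns + [stmt.where[0]]:
+        if col not in schema:
+            raise NameError(f"unknown column {col!r}")
+    if schema[stmt.where[0]] != "int":
+        raise TypeError("'>' is only supported on int columns here")
+    return stmt
+
+
+def to_logical_plan(stmt: SelectAST) -> list:
+    # Step 4: syntax becomes an ordered list of data operations.
+    col, op, lit = stmt.where
+    return [("Scan", stmt.table),
+            ("Filter", f"{col} {op} {lit}"),
+            ("Projection", ", ".join(stmt.columns))]
+
+
+plan = to_logical_plan(resolve(parse("SELECT name FROM employees WHERE age > 18")))
+print(plan)  # step 5 would hand `plan` to the optimizer
+```
+
+Each stage either returns a richer representation or raises, so an error such as an unknown column surfaces before any data is read.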
+
+---
+
+## What planning does
+
+### Translate syntax into operators
+
+The planner maps clauses such as `SELECT`, `WHERE`, `GROUP BY`, and `JOIN` onto relational operators, including:
+
+- scan
+- projection
+- filter
+- join
+- aggregate
+- limit
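+
+One way to represent these operators is as small immutable tree nodes, one class per operator. This is a sketch with hypothetical class names; predicates are plain strings here, where a real planner would use expression trees.
+
+```python
+from dataclasses import dataclass
+from typing import Union
+
+
+@dataclass(frozen=True)
+class Scan:
+    table: str
+
+@dataclass(frozen=True)
+class Filter:
+    predicate: str
+    child: "Plan"
+
+@dataclass(frozen=True)
+class Projection:
+    columns: tuple
+    child: "Plan"
+
+@dataclass(frozen=True)
+class Join:
+    condition: str
+    left: "Plan"
+    right: "Plan"
+
+@dataclass(frozen=True)
+class Aggregate:
+    group_by: tuple
+    aggregates: tuple
+    child: "Plan"
+
+@dataclass(frozen=True)
+class Limit:
+    count: int
+    child: "Plan"
+
+
+Plan = Union[Scan, Filter, Projection, Join, Aggregate, Limit]
+
+
+def label(node: Plan) -> str:
+    """One-line description of a single operator."""
+    if isinstance(node, Scan):
+        return f"Scan({node.table})"
+    if isinstance(node, Filter):
+        return f"Filter({node.predicate})"
+    if isinstance(node, Projection):
+        return f"Projection({', '.join(node.columns)})"
+    if isinstance(node, Join):
+        return f"Join({node.condition})"
+    if isinstance(node, Aggregate):
+        return f"Aggregate(by {', '.join(node.group_by)}: {', '.join(node.aggregates)})"
+    return f"Limit({node.count})"
+
+
+def explain(node: Plan, depth: int = 0) -> str:
+    """Render the tree top-down, indenting each child under its parent."""
+    if isinstance(node, Join):
+        children = [node.left, node.right]
+    elif isinstance(node, Scan):
+        children = []
+    else:
+        children = [node.child]
+    lines = ["  " * depth + label(node)]
+    lines += [explain(child, depth + 1) for child in children]
+    return "\n".join(lines)
+
+
+query = Limit(10, Projection(("name",), Filter("age > 18", Scan("employees"))))
+print(explain(query))
+```
+
+Frozen dataclasses make nodes hashable and comparable by value, which is convenient when later rewrites need to check whether a plan changed.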
+
+### Resolve names
+
+The planner determines which table or source each name refers to and which columns each expression references.
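+
+A minimal sketch of column resolution, assuming a hypothetical catalog that maps each table to its column names. Bare names are qualified with their table; names that are unknown, or that match more than one table in scope, are rejected.
+
+```python
+# Hypothetical catalog: table name -> column names.
+CATALOG = {
+    "employees": ["id", "name", "dept_id"],
+    "departments": ["id", "name"],
+}
+
+
+def resolve_column(name: str, tables_in_scope: list) -> str:
+    """Return the qualified 'table.column' that `name` refers to."""
+    if "." in name:
+        table, column = name.split(".", 1)
+        if table not in tables_in_scope or column not in CATALOG[table]:
+            raise NameError(f"unknown column {name}")
+        return name
+    matches = [t for t in tables_in_scope if name in CATALOG[t]]
+    if not matches:
+        raise NameError(f"unknown column {name}")
+    if len(matches) > 1:
+        raise NameError(f"ambiguous column {name}: found in {matches}")
+    return f"{matches[0]}.{name}"
+
+
+scope = ["employees", "departments"]
+print(resolve_column("dept_id", scope))  # qualified with its only table
+```
+
+With both tables in scope, a bare `name` is ambiguous and must be written as `employees.name` or `departments.name`.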
+
+### Check types
+
+The planner verifies that expressions are well-formed, for example that comparisons involve compatible types and that aggregate functions appear only where the query structure allows them.
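+
+Both checks can be sketched over a tiny expression tree. The schema, the type names, and the rule that numeric types compare with each other are assumptions for illustration; real type systems add casts, NULL handling, and many more types.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Col:
+    name: str
+
+@dataclass(frozen=True)
+class Lit:
+    value: object
+
+@dataclass(frozen=True)
+class BinOp:
+    op: str
+    left: object
+    right: object
+
+
+# Hypothetical input schema: column name -> type name.
+SCHEMA = {"name": "str", "age": "int", "salary": "float", "dept": "str"}
+NUMERIC = {"int", "float"}
+LITERAL_TYPES = {bool: "bool", int: "int", float: "float", str: "str"}
+
+
+def type_of(expr) -> str:
+    """Infer an expression's type, raising TypeError if it is invalid."""
+    if isinstance(expr, Col):
+        return SCHEMA[expr.name]
+    if isinstance(expr, Lit):
+        return LITERAL_TYPES[type(expr.value)]
+    lt, rt = type_of(expr.left), type_of(expr.right)
+    if expr.op in {"=", "<", ">", "<=", ">="}:
+        # Comparisons need matching types, or two numeric types.
+        if lt == rt or (lt in NUMERIC and rt in NUMERIC):
+            return "bool"
+        raise TypeError(f"cannot compare {lt} with {rt}")
+    if expr.op in {"AND", "OR"}:
+        if lt == rt == "bool":
+            return "bool"
+        raise TypeError(f"{expr.op} needs bool operands, got {lt} and {rt}")
+    raise TypeError(f"unknown operator {expr.op}")
+
+
+def check_grouping(bare_columns: list, group_by: list) -> None:
+    """SELECT-list columns not wrapped in an aggregate must be grouped."""
+    stray = [c for c in bare_columns if c not in group_by]
+    if stray:
+        raise TypeError(f"{stray} must appear in GROUP BY or inside an aggregate")
+```
+
+For example, `type_of(BinOp(">", Col("name"), Lit(18)))` raises because a string is compared with an integer, and `SELECT dept, name ... GROUP BY dept` fails `check_grouping` because `name` is neither grouped nor aggregated.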
+
+### Build expressions
+
+Predicates and computed columns are turned into internal expression trees.
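+
+A sketch of what such a tree might look like for `age > 18 AND salary * 2 > 100000`, with a recursive evaluator to show that the tree fully captures the predicate. The node names `Col`, `Lit`, and `Call` are hypothetical, and this evaluator ignores SQL NULL semantics.
+
+```python
+import operator
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Col:
+    name: str
+
+@dataclass(frozen=True)
+class Lit:
+    value: object
+
+@dataclass(frozen=True)
+class Call:
+    op: str
+    args: tuple
+
+
+# Operator table; AND evaluates both operands eagerly in this sketch.
+OPS = {
+    "+": operator.add,
+    "*": operator.mul,
+    ">": operator.gt,
+    "AND": lambda a, b: a and b,
+}
+
+
+def evaluate(expr, row: dict):
+    """Recursively evaluate an expression tree against one row."""
+    if isinstance(expr, Col):
+        return row[expr.name]
+    if isinstance(expr, Lit):
+        return expr.value
+    return OPS[expr.op](*(evaluate(arg, row) for arg in expr.args))
+
+
+# WHERE age > 18 AND salary * 2 > 100000
+predicate = Call("AND", (
+    Call(">", (Col("age"), Lit(18))),
+    Call(">", (Call("*", (Col("salary"), Lit(2))), Lit(100000))),
+))
+```
+
+Because the predicate is data rather than text, later stages can inspect it, for instance to find which columns it reads or which parts are constant.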
+
+### Attach schema information
+
+The planner determines the shape of operator outputs so later stages know what columns and types flow through the plan.
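+
+Schema derivation can be sketched as a recursive function over plan nodes. Here nodes are plain tuples with the child last, and the table schema and the aggregate's output type are hypothetical inputs supplied by the caller.
+
+```python
+# Hypothetical table schemas: column name -> type name.
+TABLES = {"employees": {"id": "int", "name": "str", "age": "int", "dept_id": "int"}}
+
+
+def output_schema(node: tuple) -> dict:
+    """Compute the columns (and types) an operator produces.
+
+    Node shapes: ("Scan", table), ("Filter", predicate, child),
+    ("Projection", columns, child), ("Aggregate", group_by, aggs, child).
+    """
+    kind = node[0]
+    if kind == "Scan":
+        return dict(TABLES[node[1]])
+    child = output_schema(node[-1])
+    if kind == "Filter":
+        return child  # filters drop rows, never columns
+    if kind == "Projection":
+        return {c: child[c] for c in node[1]}
+    if kind == "Aggregate":
+        group_by, aggs = node[1], node[2]  # aggs: {output name: type}
+        return {**{c: child[c] for c in group_by}, **aggs}
+    raise ValueError(f"unknown operator {kind}")
+
+
+plan = ("Projection", ("name", "age"), ("Filter", "age > 18", ("Scan", "employees")))
+print(output_schema(plan))
+```
+
+A projection that names a missing column raises `KeyError` here, which is one way a planner can catch invalid queries before execution.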
+
+---
+
+## AST vs logical plan
+
+This distinction matters.
+
+- the AST reflects the query language syntax
+- the logical plan reflects the data operations implied by that syntax
+
+For example, SQL syntax may contain clauses and aliases that are useful to the parser but irrelevant once the engine understands that the query means "scan, filter, then project."
+
+So planning is partly a translation from language syntax into execution-oriented semantics.
+
+---
+
+## A tiny example
+
+Query:
+
+```sql
+SELECT name
+FROM employees
+WHERE age > 18
+```
+
+The parser may produce an AST containing nodes like:
+
+- `SelectStatement`
+- `FromClause`
+- `WhereClause`
+
+The planner turns that into a logical plan:
+
+1. `Scan(employees)`
+2. `Filter(age > 18)`
+3. `Projection(name)`
+
+That logical plan is what later stages optimize.
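+
+To make the three steps concrete, they can be interpreted directly with Python generators over a few invented rows. This is an unoptimized, illustrative executor, not how any particular engine runs plans.
+
+```python
+from typing import Callable, Dict, Iterable, Iterator, List
+
+Row = Dict[str, object]
+
+# Invented sample data for illustration.
+EMPLOYEES: List[Row] = [
+    {"name": "Ada", "age": 36},
+    {"name": "Ben", "age": 17},
+    {"name": "Cy", "age": 42},
+]
+
+
+def scan(rows: Iterable[Row]) -> Iterator[Row]:
+    yield from rows
+
+
+def filter_rows(pred: Callable[[Row], bool], child: Iterator[Row]) -> Iterator[Row]:
+    return (r for r in child if pred(r))
+
+
+def project(columns: List[str], child: Iterator[Row]) -> Iterator[Row]:
+    return ({c: r[c] for c in columns} for r in child)
+
+
+# Scan(employees) -> Filter(age > 18) -> Projection(name), innermost call first.
+result = list(project(["name"], filter_rows(lambda r: r["age"] > 18, scan(EMPLOYEES))))
+print(result)
+```
+
+The nesting of the calls mirrors the plan tree: `Projection` consumes the output of `Filter`, which consumes the output of `Scan`.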
+
+---
+
+## Why planning matters
+
+Planning is valuable because it creates the first stable representation of meaning inside the engine.
+
+That gives the system a place to:
+
+- validate the query
+- reason about schemas
+- rewrite plans
+- compare equivalent formulations
+- target different execution backends
+
+In practice, planning is the bridge between the front-end language and the execution engine.
+
+---
+
+## Common complications
+
+Planning gets harder when the query language includes:
+
+- nested queries
+- correlated subqueries
+- user-defined functions
+- ambiguous names
+- multiple source types
+- non-relational operators
+
+This is why planning is often a substantial subsystem, not just a parser post-processing step.
+
+---
+
+## Practical mental model
+
+If parsing answers "what syntax did the user write?", planning answers "what data operations does that syntax mean?"
+
+That is the cleanest way to think about it.
+
+---
+
+## Changelog
+
+* **Mar 31, 2026** -- First version created.
--- a/hqew/005-query-optimization.md
+++ b/hqew/005-query-optimization.md
@@ -0,0 +1,158 @@
+# Query Optimization
+
+A reference for how query engines make a plan cheaper without changing its meaning.
+
+---
+
+## Short answer
+
+Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.
+
+The key word is equivalent: the result must stay the same even though the execution strategy changes.
+
+---
+
+## Why optimization exists
+
+There are usually many ways to compute the same query.
+
+For example, an engine may be able to:
+
+- read all columns or only the needed ones
+- filter before or after another operator
+- join tables in different orders
+- pick different join algorithms
+
+Optimization tries to choose a cheaper plan in terms of CPU, memory, I/O, and network cost.
+
+---
+
+## Common optimizations
+
+### Projection pushdown
+
+Read only the columns that are actually needed.
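+
+In a columnar layout, projection pushdown means the scan never touches unneeded columns. A minimal sketch, assuming a hypothetical in-memory column store where each column is a separate list; note that the required set includes columns used only by the filter.
+
+```python
+# Hypothetical column store: each column is stored (and read) separately.
+TABLE = {
+    "id": [1, 2, 3],
+    "name": ["Ada", "Ben", "Cy"],
+    "age": [36, 17, 42],
+    "bio": ["<large text>", "<large text>", "<large text>"],
+}
+
+
+def required_columns(select_cols: list, predicate_cols: list) -> list:
+    """Columns the query needs anywhere: its output plus its filter inputs."""
+    return sorted(set(select_cols) | set(predicate_cols))
+
+
+def scan(table: dict, columns: list) -> dict:
+    """Read only the requested columns; the others are never touched."""
+    return {c: table[c] for c in columns}
+
+
+# SELECT name FROM t WHERE age > 18 needs `name` and `age`, not `id` or `bio`.
+cols = required_columns(["name"], ["age"])
+data = scan(TABLE, cols)
+names = [n for n, a in zip(data["name"], data["age"]) if a > 18]
+```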
+
+### Predicate pushdown
+
+Apply filters as early as possible, ideally inside the data source.
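+
+A sketch of the rewrite on a tiny plan tree, using hypothetical node classes and predicates as `(column, op, value)` triples. A filter moves below a projection (which only drops columns) and is finally absorbed into the scan, modeling a source that can evaluate simple predicates itself.
+
+```python
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class Scan:
+    table: str
+    pushed: tuple = ()  # predicates the data source evaluates itself
+
+@dataclass(frozen=True)
+class Filter:
+    predicate: tuple  # (column, op, value)
+    child: object
+
+@dataclass(frozen=True)
+class Projection:
+    columns: tuple
+    child: object
+
+
+def push_down(node):
+    """Move Filters below Projections and into Scans where that is safe."""
+    if isinstance(node, Projection):
+        return Projection(node.columns, push_down(node.child))
+    if isinstance(node, Filter):
+        child = push_down(node.child)
+        if isinstance(child, Scan):
+            return replace(child, pushed=child.pushed + (node.predicate,))
+        if isinstance(child, Projection) and node.predicate[0] in child.columns:
+            # Safe to swap: a plain projection keeps the column the filter reads.
+            return Projection(child.columns,
+                              push_down(Filter(node.predicate, child.child)))
+        return Filter(node.predicate, child)
+    return node
+
+
+plan = Filter(("age", ">", 18), Projection(("name", "age"), Scan("employees")))
+print(push_down(plan))
+```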
+
+### Constant folding
+
+Evaluate subexpressions whose operands are all constants, such as `2 + 3` or `TRUE AND FALSE`, once during planning instead of once per row.
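+
+A bottom-up folding pass can be sketched in a few lines. The node classes and operator table are hypothetical, and the sketch ignores NULLs and runtime errors (such as division by zero), whose behavior a real engine must preserve when folding.
+
+```python
+import operator
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Lit:
+    value: object
+
+@dataclass(frozen=True)
+class Col:
+    name: str
+
+@dataclass(frozen=True)
+class BinOp:
+    op: str
+    left: object
+    right: object
+
+
+OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, ">": operator.gt}
+
+
+def fold(expr):
+    """Bottom-up: replace any operator whose operands are all literals."""
+    if isinstance(expr, BinOp):
+        left, right = fold(expr.left), fold(expr.right)
+        if isinstance(left, Lit) and isinstance(right, Lit):
+            return Lit(OPS[expr.op](left.value, right.value))
+        return BinOp(expr.op, left, right)
+    return expr
+
+
+# age > 2 + 3  becomes  age > 5; the column reference stays symbolic.
+print(fold(BinOp(">", Col("age"), BinOp("+", Lit(2), Lit(3)))))
+```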
+
+### Expression simplification
+
+Rewrite expressions into simpler equivalent forms, for example `x AND TRUE` into `x` or `x OR TRUE` into `TRUE`.
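+
+A sketch of a few boolean identities applied bottom-up, using hypothetical node classes. The identities chosen here also hold under SQL three-valued logic: for example, `NULL AND FALSE` is `FALSE` and `NULL OR FALSE` is `NULL`.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Col:
+    name: str
+
+@dataclass(frozen=True)
+class Lit:
+    value: object
+
+@dataclass(frozen=True)
+class And:
+    left: object
+    right: object
+
+@dataclass(frozen=True)
+class Or:
+    left: object
+    right: object
+
+
+def is_const(e, value: bool) -> bool:
+    return isinstance(e, Lit) and e.value is value
+
+
+def simplify(e):
+    """Apply boolean identities, simplifying children first."""
+    if not isinstance(e, (And, Or)):
+        return e
+    l, r = simplify(e.left), simplify(e.right)
+    if isinstance(e, And):
+        if is_const(l, False) or is_const(r, False):
+            return Lit(False)      # x AND FALSE -> FALSE
+        if is_const(l, True):
+            return r               # TRUE AND x  -> x
+        if is_const(r, True):
+            return l               # x AND TRUE  -> x
+    else:
+        if is_const(l, True) or is_const(r, True):
+            return Lit(True)       # x OR TRUE   -> TRUE
+        if is_const(l, False):
+            return r               # FALSE OR x  -> x
+        if is_const(r, False):
+            return l               # x OR FALSE  -> x
+    if l == r:
+        return l                   # x AND x -> x,  x OR x -> x
+    return type(e)(l, r)
+```
+
+Simplification often feeds other rules: once `(a AND FALSE) OR b` reduces to `b`, a predicate pushdown rule only has to deal with `b`.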
+
+### Join reordering
+
+Change the order of joins to reduce intermediate result size.
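+
+A minimal sketch of cost-driven join reordering, assuming invented row counts and join selectivities, independent predicates, and a cost equal to the sum of intermediate result sizes. It enumerates every left-deep order, which is only feasible for a handful of tables.
+
+```python
+from itertools import permutations
+
+# Invented statistics for illustration.
+ROWS = {"orders": 1_000_000, "customers": 10_000, "nations": 25}
+# Assumed fraction of row pairs that satisfy each join condition.
+SELECTIVITY = {
+    frozenset({"orders", "customers"}): 1 / 10_000,
+    frozenset({"customers", "nations"}): 1 / 25,
+}
+
+
+def join_size(left_tables: list, left_rows: float, right: str) -> float:
+    """Estimated rows from joining table `right` onto an intermediate result."""
+    sel = 1.0
+    for t in left_tables:
+        # No known predicate between the tables means a cross product (1.0).
+        sel *= SELECTIVITY.get(frozenset({t, right}), 1.0)
+    return left_rows * ROWS[right] * sel
+
+
+def plan_cost(order: tuple) -> float:
+    """Sum of intermediate result sizes for a left-deep join order."""
+    tables, rows, cost = [order[0]], float(ROWS[order[0]]), 0.0
+    for right in order[1:]:
+        rows = join_size(tables, rows, right)
+        tables.append(right)
+        cost += rows
+    return cost
+
+
+best = min(permutations(ROWS), key=plan_cost)
+print(best, plan_cost(best))
+```
+
+Starting with `orders` joined to `nations` forms a 25-million-row cross product, while joining the two small tables first keeps every intermediate result at or below a million rows.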
+
+### Limit pushdown
+
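+Push `LIMIT` closer to the source when that cannot change the result: for example, below a projection, but not below a filter or an aggregation, where the first input rows do not determine the first output rows.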
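+
+A sketch of that rule with hypothetical node classes, handling only a `Limit` at the root of the tree. The limit passes through a projection, which maps rows one to one, and stops at a filter.
+
+```python
+from dataclasses import dataclass, replace
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class Scan:
+    table: str
+    limit: Optional[int] = None  # the source may stop reading after this many rows
+
+@dataclass(frozen=True)
+class Projection:
+    columns: tuple
+    child: object
+
+@dataclass(frozen=True)
+class Filter:
+    predicate: str
+    child: object
+
+@dataclass(frozen=True)
+class Limit:
+    count: int
+    child: object
+
+
+def push_limit(node):
+    """Move a root Limit downward while that cannot change which rows survive."""
+    if not isinstance(node, Limit):
+        return node
+    child = node.child
+    if isinstance(child, Projection):
+        # One output row per input row, so limiting first is equivalent.
+        return Projection(child.columns, push_limit(Limit(node.count, child.child)))
+    if isinstance(child, Scan):
+        new = node.count if child.limit is None else min(child.limit, node.count)
+        return replace(child, limit=new)
+    # Filters, aggregates, sorts, joins: stop here.
+    return node
+
+
+print(push_limit(Limit(5, Projection(("name",), Scan("t")))))
+```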
+
+### Operator fusion
+
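+Combine adjacent operators, such as a filter followed by a projection, into a single pass over the data so that no intermediate result is materialized between them.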
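+
+The difference can be sketched with Python lists standing in for operator outputs: the unfused version builds an intermediate list between filter and projection, while the fused version does both in one loop. Engines typically achieve this through pipelining or code generation; this only illustrates the idea.
+
+```python
+from typing import Callable, Dict, List
+
+Row = Dict[str, object]
+
+
+def filter_op(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
+    return [r for r in rows if pred(r)]  # materializes an intermediate list
+
+
+def project_op(rows: List[Row], cols: List[str]) -> List[Row]:
+    return [{c: r[c] for c in cols} for r in rows]
+
+
+def filter_project_fused(rows: List[Row], pred: Callable[[Row], bool],
+                         cols: List[str]) -> List[Row]:
+    """One pass: each row is tested and projected with no intermediate list."""
+    out = []
+    for r in rows:
+        if pred(r):
+            out.append({c: r[c] for c in cols})
+    return out
+
+
+rows = [{"name": "Ada", "age": 36}, {"name": "Ben", "age": 17}]
+adult = lambda r: r["age"] > 18
+fused = filter_project_fused(rows, adult, ["name"])
+unfused = project_op(filter_op(rows, adult), ["name"])
+```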
+
+---
+
+## Rule-based vs cost-based optimization
+
+### Rule-based optimization
+
+This applies fixed rewrite rules such as:
+
+- push filters below projections
+- remove unused columns
+- simplify expressions
+
+Strengths:
+
+- simple
+- predictable
+- easy to implement incrementally
+
+Weaknesses:
+
+- struggles when several valid plans exist and the best one depends on the data
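+
+A rule-based optimizer can be sketched as a loop that applies every rule to every node until the plan stops changing. Plans here are hypothetical tuples with the child last, and the two rules are illustrative; a real rule engine would also track which rules can fire where.
+
+```python
+from typing import Callable, List, Optional, Tuple
+
+Plan = Tuple  # ("Scan", table) | ("Filter", pred, child) | ("Project", cols, child)
+Rule = Callable[[Plan], Optional[Plan]]
+
+
+def push_filter_below_project(node: Plan) -> Optional[Plan]:
+    # Filter(p, Project(cols, x))  ->  Project(cols, Filter(p, x))
+    if node[0] == "Filter" and node[2][0] == "Project":
+        _, pred, (_, cols, child) = node
+        return ("Project", cols, ("Filter", pred, child))
+    return None
+
+
+def merge_projects(node: Plan) -> Optional[Plan]:
+    # Project(outer, Project(inner, x))  ->  Project(outer, x)
+    if node[0] == "Project" and node[2][0] == "Project":
+        return ("Project", node[1], node[2][2])
+    return None
+
+
+def rewrite_everywhere(plan: Plan, rule: Rule) -> Plan:
+    """Apply `rule` at every node, children first."""
+    if plan[0] != "Scan":
+        plan = plan[:-1] + (rewrite_everywhere(plan[-1], rule),)
+    result = rule(plan)
+    return plan if result is None else result
+
+
+def optimize(plan: Plan, rules: List[Rule], max_passes: int = 50) -> Plan:
+    """Run all rules until none changes the plan (a fixed point)."""
+    for _ in range(max_passes):
+        before = plan
+        for rule in rules:
+            plan = rewrite_everywhere(plan, rule)
+        if plan == before:
+            break
+    return plan
+
+
+plan = ("Filter", "age > 18",
+        ("Project", ("name", "age"),
+         ("Project", ("name", "age", "id"), ("Scan", "employees"))))
+print(optimize(plan, [push_filter_below_project, merge_projects]))
+```
+
+Both rules always apply when their pattern matches; the optimizer never has to weigh one rewrite against another, which is exactly where the rule-based approach runs out.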
+
+### Cost-based optimization
+
+This estimates the cost of alternative plans and chooses the best one according to some model.
+
+It often depends on:
+
+- table sizes
+- value distributions
+- selectivity estimates
+- available indexes
+
+Strengths:
+
+- can choose among many alternatives
+- important for complex join planning
+
+Weaknesses:
+
+- depends on statistics quality
+- more implementation complexity
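+
+A small cost-based decision can be sketched as choosing between a full scan and an index scan for `WHERE column = constant`. It assumes an index exists on both columns, a uniform value distribution, and illustrative page costs; the table statistics are invented.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class TableStats:
+    rows: int
+    rows_per_page: int
+    distinct: dict = field(default_factory=dict)  # column -> distinct-value count
+
+
+# Illustrative cost units: a random page read costs more than a sequential one.
+SEQ_PAGE_COST = 1.0
+RANDOM_PAGE_COST = 4.0
+
+
+def equality_selectivity(stats: TableStats, column: str) -> float:
+    """Uniformity assumption: every distinct value is equally common."""
+    return 1.0 / stats.distinct[column]
+
+
+def full_scan_cost(stats: TableStats) -> float:
+    return stats.rows / stats.rows_per_page * SEQ_PAGE_COST
+
+
+def index_scan_cost(stats: TableStats, selectivity: float) -> float:
+    # Pessimistic: assume each matching row needs its own random page read.
+    return stats.rows * selectivity * RANDOM_PAGE_COST
+
+
+def choose_access_path(stats: TableStats, column: str) -> str:
+    """Pick the cheaper access path for `WHERE column = <constant>`."""
+    sel = equality_selectivity(stats, column)
+    costs = {"index scan": index_scan_cost(stats, sel),
+             "full scan": full_scan_cost(stats)}
+    return min(costs, key=costs.get)
+
+
+stats = TableStats(rows=1_000_000, rows_per_page=100,
+                   distinct={"id": 1_000_000, "country": 5})
+print(choose_access_path(stats, "id"), choose_access_path(stats, "country"))
+```
+
+If `country` actually had thousands of distinct values but the stored statistics still said 5, this model would keep choosing the full scan; that is the statistics-quality weakness in concrete form.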
+
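+Many production engines combine both: rules for rewrites that almost always help, and cost estimates for decisions such as join order and join algorithm.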
+
+---
+
+## Logical vs physical optimization
+
+Optimization can happen at two levels.
+
+### Logical optimization
+
+Rewrite the plan while staying in logical-operator space.
+
+Examples:
+
+- pushdown rewrites
+- removing dead columns
+- simplifying expressions
+
+### Physical optimization
+
+Choose concrete execution strategies.
+
+Examples:
+
+- hash join vs sort-merge join
+- vectorized filter vs generic filter
+- index scan vs full scan
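+
+The first pair can be made concrete: both functions below implement the same logical equi-join and return the same rows, possibly in a different order, at different costs. This is a simplified sketch that assumes rows are dicts, the join key has the same name on both sides, and non-key column names do not collide.
+
+```python
+from collections import defaultdict
+
+
+def hash_join(left: list, right: list, key: str) -> list:
+    """Build a hash table on `right`, then probe it with each left row."""
+    table = defaultdict(list)
+    for r in right:
+        table[r[key]].append(r)
+    return [{**l, **r} for l in left for r in table.get(l[key], [])]
+
+
+def sort_merge_join(left: list, right: list, key: str) -> list:
+    """Sort both inputs on the key, then walk them in step."""
+    ls = sorted(left, key=lambda r: r[key])
+    rs = sorted(right, key=lambda r: r[key])
+    out, i, j = [], 0, 0
+    while i < len(ls) and j < len(rs):
+        lk, rk = ls[i][key], rs[j][key]
+        if lk < rk:
+            i += 1
+        elif lk > rk:
+            j += 1
+        else:
+            # Find the run of equal keys on each side; emit their cross product.
+            i_end, j_end = i, j
+            while i_end < len(ls) and ls[i_end][key] == lk:
+                i_end += 1
+            while j_end < len(rs) and rs[j_end][key] == rk:
+                j_end += 1
+            out.extend({**l, **r} for l in ls[i:i_end] for r in rs[j:j_end])
+            i, j = i_end, j_end
+    return out
+```
+
+A hash join needs memory for the build side; a sort-merge join is attractive when inputs are already sorted on the key. Choosing between them is a physical decision because the logical result is identical.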
+
+This distinction matters because some improvements are about semantics-preserving algebra, while others are about operator implementation choices.
+
+---
+
+## Why optimization is hard
+
+Optimization is difficult because:
+
+- the search space can explode
+- estimates are imperfect
+- the cheapest local rewrite is not always globally best
+- different workloads care about different costs
+
+So optimizers settle for a good plan based on heuristics and estimates rather than proving that a plan is optimal.
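+
+The search-space point can be quantified with two standard counts: left-deep join trees over `n` relations (one per permutation, `n!`) and binary join trees of any shape with left and right inputs distinguished, `(2n-2)!/(n-1)!`. These counts ignore restrictions such as avoiding cross products, which optimizers commonly apply alongside pruning techniques like dynamic programming over table subsets.
+
+```python
+from math import factorial
+
+
+def left_deep_orders(n: int) -> int:
+    """Left-deep join trees over n relations: one per permutation."""
+    return factorial(n)
+
+
+def all_join_trees(n: int) -> int:
+    """Binary join trees of any shape over n relations: (2n-2)!/(n-1)!."""
+    return factorial(2 * n - 2) // factorial(n - 1)
+
+
+for n in (2, 4, 8, 12):
+    print(f"{n:>2} tables: {left_deep_orders(n):>12,} left-deep, "
+          f"{all_join_trees(n):>18,} total")
+```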
+
+---
+
+## Practical mental model
+
+If planning answers "what operations are needed?", optimization answers "what is the cheapest equivalent way to arrange and implement them?"
+
+That is the essential idea.
+
+---
+
+## Changelog
+
+* **Mar 31, 2026** -- First version created.