diff --git a/hqew/004-query-planning.md b/hqew/004-query-planning.md
new file mode 100644
index 0000000..ff00a38
--- /dev/null
+++ b/hqew/004-query-planning.md
@@ -0,0 +1,147 @@
+# Query Planning
+
+A reference for how a query request becomes an internal plan.
+
+---
+
+## Short answer
+
+Query planning is the stage where a query engine turns a user request into a structured representation of work to be done.
+
+The main point is to separate:
+
+- the syntax the user wrote
+- the meaning of the query
+- the later execution strategy
+
+Without that separation, optimization and backend-independent execution become much harder.
+
+---
+
+## Typical pipeline
+
+Planning usually sits between parsing and optimization:
+
+1. parse query text or API calls
+2. build an AST or similar syntax tree
+3. resolve names and types
+4. produce a logical plan
+5. hand that plan to the optimizer
+
+The exact boundaries differ across systems, but the general idea is stable.
+
+---
+
+## What planning does
+
+### Parse structure into operations
+
+The planner turns syntax such as `SELECT`, `WHERE`, `GROUP BY`, and `JOIN` into relational operators such as:
+
+- scan
+- projection
+- filter
+- join
+- aggregate
+- limit
+
+### Resolve names
+
+The planner works out which table or source each name refers to and which columns each expression references.
+
+### Check types
+
+The planner verifies that expressions are valid: for example, that comparisons involve compatible types and that aggregates are used correctly.
+
+### Build expressions
+
+Predicates and computed columns are turned into internal expression trees.
+
+### Attach schema information
+
+The planner determines the shape of operator outputs so later stages know what columns and types flow through the plan.
+
+---
+
+## AST vs logical plan
+
+This distinction matters.
+
+- the AST reflects the query language syntax
+- the logical plan reflects the data operations implied by that syntax
+
+For example, SQL syntax may contain clauses and aliases that are useful to the parser but irrelevant once the engine understands that the query means
+"scan, filter, then project."
+
+So planning is partly a translation from language syntax into execution-oriented semantics.
+
+---
+
+## A tiny example
+
+Query:
+
+```sql
+SELECT name
+FROM employees
+WHERE age > 18
+```
+
+The parser may produce an AST containing nodes like:
+
+- `SelectStatement`
+- `FromClause`
+- `WhereClause`
+
+The planner turns that into a logical plan:
+
+1. `Scan(employees)`
+2. `Filter(age > 18)`
+3. `Projection(name)`
+
+That logical plan is what later stages optimize.
+
+---
+
+## Why planning matters
+
+Planning is valuable because it creates the first stable representation of meaning inside the engine.
+
+That gives the system a place to:
+
+- validate the query
+- reason about schemas
+- rewrite plans
+- compare equivalent formulations
+- target different execution backends
+
+In practice, planning is the bridge between the front-end language and the execution engine.
+
+---
+
+## Common complications
+
+Planning gets harder when the query language includes:
+
+- nested queries
+- correlated subqueries
+- user-defined functions
+- ambiguous names
+- multiple source types
+- non-relational operators
+
+This is why planning is often a substantial subsystem, not just a parser post-processing step.
+
+---
+
+## Practical mental model
+
+If parsing answers "what syntax did the user write?", planning answers "what data operations does that syntax mean?"
+
+That is the cleanest way to think about it.
+
+---
+
+## Changelog
+
+* **Mar 31, 2026** -- First version created.
diff --git a/hqew/005-query-optimization.md b/hqew/005-query-optimization.md
new file mode 100644
index 0000000..3301705
--- /dev/null
+++ b/hqew/005-query-optimization.md
@@ -0,0 +1,158 @@
+# Query Optimization
+
+A reference for how query engines make a plan cheaper without changing its meaning.
+
+---
+
+## Short answer
+
+Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.
+
+The key word is equivalent: the result must stay the same even though the execution strategy changes.
+
+---
+
+## Why optimization exists
+
+There are usually many ways to compute the same query.
+
+For example, an engine may be able to:
+
+- read all columns or only the needed ones
+- filter before or after another operator
+- join tables in different orders
+- pick different join algorithms
+
+Optimization tries to choose a cheaper plan in terms of CPU, memory, I/O, and network cost.
+
+---
+
+## Common optimizations
+
+### Projection pushdown
+
+Read only the columns that are actually needed.
+
+### Predicate pushdown
+
+Apply filters as early as possible, ideally inside the data source.
+
+### Constant folding
+
+Precompute expressions such as `2 + 3` or simplify boolean expressions before execution.
+
+### Expression simplification
+
+Rewrite expressions into simpler equivalent forms.
+
+### Join reordering
+
+Change the order of joins to reduce intermediate result size.
+
+### Limit pushdown
+
+Push `LIMIT` closer to the source or to earlier stages when semantics allow it.
+
+### Operator fusion
+
+Combine adjacent operations to reduce overhead.
+
+---
+
+## Rule-based vs cost-based optimization
+
+### Rule-based optimization
+
+This applies fixed rewrite rules such as:
+
+- push filters below projections
+- remove unused columns
+- simplify expressions
+
+Strengths:
+
+- simple
+- predictable
+- easy to implement incrementally
+
+Weaknesses:
+
+- limited when several valid alternatives exist, because fixed rules cannot compare their costs
+
+### Cost-based optimization
+
+This estimates the cost of alternative plans and chooses the cheapest one according to a cost model.
+
+It often depends on:
+
+- table sizes
+- value distributions
+- selectivity estimates
+- available indexes
+
+Strengths:
+
+- can choose among many alternatives
+- important for complex join planning
+
+Weaknesses:
+
+- depends on statistics quality
+- more implementation complexity
+
+Many production engines combine both.
+
+---
+
+## Logical vs physical optimization
+
+Optimization can happen at two levels.
+
+### Logical optimization
+
+Rewrite the plan while staying in logical-operator space.
+
+Examples:
+
+- pushdown rewrites
+- removing dead columns
+- simplifying expressions
+
+### Physical optimization
+
+Choose concrete execution strategies.
+
+Examples:
+
+- hash join vs sort-merge join
+- vectorized filter vs generic filter
+- index scan vs full scan
+
+This distinction matters because some improvements are about semantics-preserving algebra, while others are about operator implementation choices.
+
+---
+
+## Why optimization is hard
+
+Optimization is difficult because:
+
+- the search space can explode
+- estimates are imperfect
+- the cheapest local rewrite is not always globally best
+- different workloads care about different costs
+
+So optimizers rely on approximations: they aim for a good plan, not one proven to be optimal.
+
+---
+
+## Practical mental model
+
+If planning answers "what operations are needed?", optimization answers "what is the cheapest equivalent way to arrange and implement them?"
+
+That is the essential idea.
+
+---
+
+## Changelog
+
+* **Mar 31, 2026** -- First version created.