Add note files about query planning and optimization
This commit is contained in:
parent
584f82bb82
commit
2a33f8b483
147
hqew/004-query-planning.md
Normal file
@@ -0,0 +1,147 @@
# Query Planning

A reference for how a query request becomes an internal plan.

---

## Short answer

Query planning is the stage where a query engine turns a user request into a structured representation of work to be done.

The main point is to separate:

- the syntax the user wrote
- the meaning of the query
- the later execution strategy

Without that separation, optimization and backend-independent execution become much harder.

---
## Typical pipeline

Planning usually sits between parsing and optimization:

1. parse query text or API calls
2. build an AST or similar syntax tree
3. resolve names and types
4. produce a logical plan
5. hand that plan to the optimizer

The exact boundaries differ across systems, but the general idea is stable.

---
## What planning does

### Parse structure into operations

The planner turns syntax such as `SELECT`, `WHERE`, `GROUP BY`, and `JOIN` into relational operators such as:

- scan
- projection
- filter
- join
- aggregate
- limit
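
As a rough illustration, the operators above could be modeled as small immutable node types that the planner composes into a tree. This is a minimal sketch, not any particular engine's design: the class names and the `plan_select` helper are hypothetical, predicates are kept as plain strings, and joins and aggregates are left out for brevity.

```python
from dataclasses import dataclass


# Hypothetical logical operator nodes; real engines carry far more detail.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str


@dataclass(frozen=True)
class Projection:
    child: object
    columns: tuple


@dataclass(frozen=True)
class Limit:
    child: object
    count: int


def plan_select(table, columns, where=None, limit=None):
    """Translate already-parsed SELECT pieces into a logical operator tree."""
    plan = Scan(table)
    if where is not None:
        plan = Filter(plan, where)
    plan = Projection(plan, tuple(columns))
    if limit is not None:
        plan = Limit(plan, limit)
    return plan


plan = plan_select("employees", ["name"], where="age > 18")
```

The resulting tree nests from the root down: a `Projection` over a `Filter` over a `Scan`.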
### Resolve names

The planner figures out what table or source a name refers to and which columns expressions mention.
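
A sketch of what name resolution might look like against an in-memory catalog. The `CATALOG` contents and the `resolve_column` helper are assumptions made for this example; real planners also handle aliases, scopes, and nested queries.

```python
# Hypothetical catalog: table name -> list of column names.
CATALOG = {
    "employees": ["id", "name", "age"],
    "departments": ["id", "title"],
}


def resolve_column(name, tables, catalog=CATALOG):
    """Return (table, column) for a bare or table-qualified column name."""
    if "." in name:
        table, column = name.split(".", 1)
        if table not in tables or column not in catalog[table]:
            raise LookupError(f"unknown column: {name}")
        return table, column
    # Bare name: exactly one table in scope must own it.
    owners = [t for t in tables if name in catalog[t]]
    if not owners:
        raise LookupError(f"unknown column: {name}")
    if len(owners) > 1:
        raise LookupError(f"ambiguous column: {name} (found in {owners})")
    return owners[0], name
```

For instance, `resolve_column("age", ["employees", "departments"])` resolves to `employees`, while a bare `id` is rejected as ambiguous because both tables define it.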
### Check types

The planner verifies that expressions are valid, such as comparing compatible types or ensuring aggregates are used correctly.
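
One way to picture the comparison check is a small function over type names. The type names and the `comparison_type` helper are hypothetical, and the coercion rule (any two numeric types are comparable) is an assumption; engines differ on implicit casts.

```python
# Assumed set of numeric type names for this sketch.
NUMERIC = {"int", "float"}


def comparison_type(left_type, right_type):
    """Return the result type of a comparison, or raise if operands are incompatible."""
    if left_type == right_type or {left_type, right_type} <= NUMERIC:
        return "bool"
    raise TypeError(f"cannot compare {left_type} with {right_type}")
```

Under these assumptions, `age > 18` type-checks when `age` is an `int`, and `name > 18` is rejected at planning time rather than failing during execution.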
### Build expressions

Predicates and computed columns are turned into internal expression trees.
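
A minimal sketch of such an expression tree, with an evaluator that runs it against one row. The node classes and `evaluate` are illustrative names, not taken from any specific engine, and only two operators are supported.

```python
from dataclasses import dataclass


# Hypothetical expression nodes.
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: object
    right: object


def evaluate(expr, row):
    """Evaluate an expression tree against a row given as a dict."""
    if isinstance(expr, Column):
        return row[expr.name]
    if isinstance(expr, Literal):
        return expr.value
    left, right = evaluate(expr.left, row), evaluate(expr.right, row)
    if expr.op == ">":
        return left > right
    if expr.op == "+":
        return left + right
    raise ValueError(f"unsupported operator: {expr.op}")


# The predicate `age > 18` as a tree.
predicate = BinaryOp(">", Column("age"), Literal(18))
```

Keeping predicates as trees rather than strings is what lets later stages inspect and rewrite them.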
### Attach schema information

The planner determines the shape of operator outputs so later stages know what columns and types flow through the plan.
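
Schema derivation can be sketched as a recursive walk over the plan. The tuple-based plan encoding, the `CATALOG` contents, and `output_schema` are all assumptions made for this example.

```python
# Hypothetical catalog: table name -> {column name: type name}.
CATALOG = {"employees": {"id": "int", "name": "string", "age": "int"}}


def output_schema(plan, catalog=CATALOG):
    """Compute the columns and types an operator produces.

    Plans are encoded as tuples: ("scan", table),
    ("filter", predicate, child), ("project", columns, child).
    """
    kind = plan[0]
    if kind == "scan":
        return dict(catalog[plan[1]])
    if kind == "filter":
        # A filter drops rows, never columns.
        return output_schema(plan[2], catalog)
    if kind == "project":
        child = output_schema(plan[2], catalog)
        return {name: child[name] for name in plan[1]}
    raise ValueError(f"unknown operator: {kind}")


plan = ("project", ["name"], ("filter", "age > 18", ("scan", "employees")))
schema = output_schema(plan)  # {'name': 'string'}
```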

---

## AST vs logical plan

This distinction matters.

- the AST reflects the query language syntax
- the logical plan reflects the data operations implied by that syntax

For example, SQL syntax may contain clauses and aliases that are useful to the parser but irrelevant once the engine understands that the query means "scan, filter, then project."

So planning is partly a translation from language syntax into execution-oriented semantics.

---
## A tiny example

Query:

```sql
SELECT name
FROM employees
WHERE age > 18
```

The parser may produce an AST containing nodes like:

- `SelectStatement`
- `FromClause`
- `WhereClause`

The planner turns that into a logical plan:

1. `Scan(employees)`
2. `Filter(age > 18)`
3. `Projection(name)`

That logical plan is what later stages optimize.
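
To see what the three-step plan means operationally, here is a minimal sketch that runs it over in-memory rows. The sample rows and the `scan`, `filter_rows`, and `project` helpers are assumptions made for illustration; a real engine would execute an optimized physical plan instead.

```python
# Hypothetical contents of the employees table.
EMPLOYEES = [
    {"name": "Ada", "age": 36},
    {"name": "Tim", "age": 17},
    {"name": "Lin", "age": 18},
]


def scan(rows):
    """Scan: produce every row of the source."""
    yield from rows


def filter_rows(rows, predicate):
    """Filter: keep only rows for which the predicate holds."""
    return (row for row in rows if predicate(row))


def project(rows, columns):
    """Projection: keep only the requested columns."""
    return ({c: row[c] for c in columns} for row in rows)


# Scan(employees) -> Filter(age > 18) -> Projection(name)
result = list(
    project(filter_rows(scan(EMPLOYEES), lambda row: row["age"] > 18), ["name"])
)
print(result)  # [{'name': 'Ada'}]
```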

---

## Why planning matters

Planning is valuable because it creates the first stable representation of meaning inside the engine.

That gives the system a place to:

- validate the query
- reason about schemas
- rewrite plans
- compare equivalent formulations
- target different execution backends

In practice, planning is the bridge between the front-end language and the execution engine.

---
## Common complications

Planning gets harder when the query language includes:

- nested queries
- correlated subqueries
- user-defined functions
- ambiguous names
- multiple source types
- non-relational operators

This is why planning is often a substantial subsystem, not just a parser post-processing step.

---
## Practical mental model

If parsing answers "what syntax did the user write?", planning answers "what data operations does that syntax mean?"

That is the cleanest way to think about it.

---
## Changelog

* **Mar 31, 2026** -- First version created.
158
hqew/005-query-optimization.md
Normal file
@@ -0,0 +1,158 @@
# Query Optimization

A reference for how query engines make a plan cheaper without changing its meaning.

---

## Short answer

Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.

The key word is equivalent: the result must stay the same even though the execution strategy changes.

---
## Why optimization exists

There are usually many ways to compute the same query.

For example, an engine may be able to:

- read all columns or only the needed ones
- filter before or after another operator
- join tables in different orders
- pick different join algorithms

Optimization tries to choose a cheaper plan in terms of CPU, memory, I/O, and network cost.

---
## Common optimizations

### Projection pushdown

Read only the columns that are actually needed.

### Predicate pushdown

Apply filters as early as possible, ideally inside the data source.
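
A minimal sketch of predicate pushdown as a tree rewrite. The tuple-based plan encoding and the `push_down` function are hypothetical; the sketch assumes projections only select columns (no renames or computed columns) and that the scan can accept a list of pushed predicates.

```python
def push_down(plan):
    """Move filters below projections and into the scan where possible.

    Plans are tuples: ("scan", table, pushed_predicates),
    ("filter", predicate, child), ("project", columns, child).
    """
    if plan[0] == "filter":
        _, predicate, child = plan
        child = push_down(child)
        if child[0] == "project":
            # Filter(Project(x)) -> Project(Filter(x)); safe under the
            # assumption that the projection only selects columns.
            _, columns, grandchild = child
            return ("project", columns, push_down(("filter", predicate, grandchild)))
        if child[0] == "scan":
            # Hand the predicate to the data source itself.
            _, table, pushed = child
            return ("scan", table, pushed + [predicate])
        return ("filter", predicate, child)
    if plan[0] == "project":
        _, columns, child = plan
        return ("project", columns, push_down(child))
    return plan


before = ("filter", "age > 18", ("project", ["name", "age"], ("scan", "employees", [])))
after = push_down(before)
```

After the rewrite the filter no longer exists as a separate operator: the plan is a projection over a scan that carries `age > 18` as a pushed predicate.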
### Constant folding

Precompute expressions such as `2 + 3` or simplify boolean expressions before execution.
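
Constant folding can be sketched as a bottom-up walk over an expression tree. The tuple encoding (`("col", name)` for columns, `(op, left, right)` for binary operations, bare Python values for literals) and the `fold` function are assumptions for this example.

```python
import operator

# Operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def fold(expr):
    """Replace subexpressions made only of literals with their computed value."""
    if not isinstance(expr, tuple) or expr[0] == "col":
        return expr  # a literal or a column reference: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return OPS[op](left, right)  # both sides are literals
    return (op, left, right)


folded = fold((">", ("col", "age"), ("+", 2, 3)))
print(folded)  # ('>', ('col', 'age'), 5)
```

The addition is computed once at planning time instead of once per row.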
### Expression simplification

Rewrite expressions into simpler equivalent forms.

### Join reordering

Change the order of joins to reduce intermediate result size.
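
The effect of join order on intermediate size can be illustrated with a toy estimate. Everything here is assumed for the sketch: the table names, the row counts, a single uniform join selectivity, and left-deep plans only. Real optimizers use per-predicate statistics and avoid enumerating every permutation.

```python
from itertools import permutations

# Hypothetical row counts per table.
ROWS = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
# Assumed fraction of row pairs that survive each join.
SELECTIVITY = 0.001


def intermediate_rows(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    total, current = 0, ROWS[order[0]]
    for table in order[1:]:
        current = current * ROWS[table] * SELECTIVITY
        total += current
    return total


# Exhaustive search is only feasible for a handful of tables.
best = min(permutations(ROWS), key=intermediate_rows)
```

Under these assumptions the cheapest orders join the two small tables first and bring in `orders` last, which keeps the first intermediate result tiny.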
### Limit pushdown

Push `LIMIT` closer to the source or to earlier stages when semantics allow it.

### Operator fusion

Combine adjacent operations to reduce overhead.
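
As a small illustration of fusion, an adjacent filter and projection can be combined into one loop so each row passes through a single operator instead of two. The `fuse` helper and the sample rows are hypothetical.

```python
def fuse(predicate, columns):
    """Build one operator that filters and projects in a single pass."""
    def fused(rows):
        for row in rows:
            if predicate(row):
                yield {c: row[c] for c in columns}
    return fused


# Hypothetical input rows.
rows = [{"name": "Ada", "age": 36}, {"name": "Tim", "age": 17}]
adults = fuse(lambda row: row["age"] > 18, ["name"])
result = list(adults(rows))
print(result)  # [{'name': 'Ada'}]
```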

---

## Rule-based vs cost-based optimization

### Rule-based optimization

This applies fixed rewrite rules such as:

- push filters below projections
- remove unused columns
- simplify expressions

Strengths:

- simple
- predictable
- easy to implement incrementally

Weaknesses:

- limited when multiple legal alternatives exist
### Cost-based optimization

This estimates the cost of alternative plans and chooses the cheapest one according to a cost model.

It often depends on:

- table sizes
- value distributions
- selectivity estimates
- available indexes

Strengths:

- can choose among many alternatives
- important for complex join planning

Weaknesses:

- depends on statistics quality
- more implementation complexity

Most production engines combine both approaches.

---
## Logical vs physical optimization

Optimization can happen at two levels.

### Logical optimization

Rewrite the plan while staying in logical-operator space.

Examples:

- pushdown rewrites
- removing dead columns
- simplifying expressions

### Physical optimization

Choose concrete execution strategies.

Examples:

- hash join vs sort-merge join
- vectorized filter vs generic filter
- index scan vs full scan

This distinction matters because some improvements are about semantics-preserving algebra, while others are about operator implementation choices.
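
To make the physical side concrete, the sketch below implements one logical operation, an inner equi-join, in two ways. Both functions are simplified illustrations written for this note (in-memory lists of dicts, a single shared key column); they produce the same rows, possibly in a different order, which is why the choice between them is a cost decision and not a semantic one.

```python
def hash_join(left, right, key):
    """Build a hash table on the right input, then probe it with the left."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both inputs on the key, then merge matching runs."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, j = [], 0
    for l in left:
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # Emit every right row in the run that matches the left key.
        k = j
        while k < len(right) and right[k][key] == l[key]:
            out.append({**l, **right[k]})
            k += 1
    return out


# Hypothetical inputs with duplicate keys on both sides.
people = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}, {"id": 2, "name": "c"}]
depts = [{"id": 2, "dept": "x"}, {"id": 3, "dept": "y"}, {"id": 2, "dept": "z"}]
```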

---

## Why optimization is hard

Optimization is difficult because:

- the search space can explode
- estimates are imperfect
- the cheapest local rewrite is not always globally best
- different workloads care about different costs

So optimizers work with approximations; they do not prove that a plan is optimal.

---
## Practical mental model

If planning answers "what operations are needed?", optimization answers "what is the cheapest equivalent way to arrange and implement them?"

That is the essential idea.

---
## Changelog

* **Mar 31, 2026** -- First version created.