Hassan Abedi 2a33f8b483 Add note files about query planning and optimization

2026-04-01 09:12:33 +02:00

Query Optimization

A reference for how query engines make a plan cheaper without changing its meaning.

Short answer

Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.

The key word is equivalent: the result must stay the same even though the execution strategy changes.

Why optimization exists

There are usually many ways to compute the same query.

For example, an engine may be able to:

read all columns or only the needed ones
filter before or after another operator
join tables in different orders
pick different join algorithms

Optimization tries to choose a cheaper plan in terms of CPU, memory, I/O, and network cost.

Common optimizations

Projection pushdown

Read only the columns that are actually needed.
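A minimal sketch of the idea, assuming a toy in-memory source. The names `ROWS`, `scan`, and `project` are hypothetical and only serve the illustration; a real engine would push the column list into a file reader or storage layer.

```python
# Hypothetical in-memory table: a list of row dicts (names are illustrative).
ROWS = [
    {"id": 1, "name": "ada", "payload": "x" * 1000},
    {"id": 2, "name": "bob", "payload": "y" * 1000},
]

def scan(rows, columns=None):
    """Return rows from the source; keep only `columns` if a projection was pushed down."""
    if columns is None:
        return [dict(r) for r in rows]
    return [{c: r[c] for c in columns} for r in rows]

def project(rows, columns):
    """Standalone projection operator applied after the scan."""
    return [{c: r[c] for c in columns} for r in rows]

# Plan A: scan every column, then project.
plan_a = project(scan(ROWS), ["id", "name"])

# Plan B: projection pushed into the scan, so `payload` is never materialized.
plan_b = scan(ROWS, columns=["id", "name"])

assert plan_a == plan_b
```

Both plans return the same rows; the second one simply never copies the wide `payload` column out of the source.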

Predicate pushdown

Apply filters as early as possible, ideally inside the data source.
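A small sketch of the difference, assuming a hypothetical `scan` function that can optionally evaluate a predicate inside the source. The table contents and the `is_swedish` predicate are made up for the example.

```python
# Hypothetical table of 100 rows; every fourth row has country "SE".
ROWS = [{"id": i, "country": "SE" if i % 4 == 0 else "NO"} for i in range(100)]

def scan(rows, predicate=None):
    """Return rows from the source, applying `predicate` inside the scan if given."""
    return [r for r in rows if predicate is None or predicate(r)]

def is_swedish(row):
    return row["country"] == "SE"

# Filter applied after the scan: the source hands over all 100 rows.
late_input = scan(ROWS)
late = [r for r in late_input if is_swedish(r)]

# Filter pushed into the scan: only matching rows leave the source.
early = scan(ROWS, predicate=is_swedish)

assert late == early
print(len(late_input), len(early))  # 100 25
```

The result is identical, but the pushed-down plan moves a quarter of the rows between operators.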

Constant folding

Precompute constant expressions such as 2 + 3, including constant boolean expressions, once at planning time rather than once per row during execution.
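As an illustration, here is a sketch of folding over a tiny expression tree. The tuple encoding `(op, left, right)`, the `OPS` table, and the convention that strings are column names are all assumptions made for this example.

```python
import operator

# Assumed operator table for this sketch; a real engine supports many more.
OPS = {
    "+": operator.add,
    "*": operator.mul,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}

def is_const(node):
    # Column references are strings here, so any bool/int/float is a literal.
    return isinstance(node, (bool, int, float))

def fold(expr):
    """Fold constant subexpressions in a tree of (op, left, right) tuples."""
    if not isinstance(expr, tuple):
        return expr  # literal or column name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if is_const(left) and is_const(right):
        return OPS[op](left, right)  # evaluated once, at planning time
    return (op, left, right)

print(fold(("*", "price", ("+", 2, 3))))                 # ('*', 'price', 5)
print(fold(("and", ("or", True, False), "is_active")))   # ('and', True, 'is_active')
```

Note that folding alone leaves `TRUE AND is_active` in place; removing the redundant `TRUE` is the job of expression simplification.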

Expression simplification

Rewrite expressions into simpler equivalent forms, for example x AND TRUE into x, or x + 0 into x.
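A sketch of a few such identities applied bottom-up. The tuple-based tree and the rule set are assumptions chosen for brevity, not a complete simplifier.

```python
def is_num(node, value):
    # Exclude bool explicitly: in Python, False == 0 and True == 1.
    return type(node) in (int, float) and node == value

def simplify(expr):
    """Apply a few identity rules bottom-up to a tuple-based expression tree."""
    if not isinstance(expr, tuple):
        return expr  # literal or column name
    op, *args = expr
    args = [simplify(a) for a in args]

    if op == "not" and isinstance(args[0], tuple) and args[0][0] == "not":
        return args[0][1]              # NOT NOT x -> x
    if op == "and":
        a, b = args
        if a is True:
            return b                   # TRUE AND x -> x
        if b is True:
            return a                   # x AND TRUE -> x
    if op == "+":
        a, b = args
        if is_num(a, 0):
            return b                   # 0 + x -> x
        if is_num(b, 0):
            return a                   # x + 0 -> x
    if op == "*":
        a, b = args
        if is_num(a, 1):
            return b                   # 1 * x -> x
        if is_num(b, 1):
            return a                   # x * 1 -> x
    return (op, *args)

print(simplify(("and", True, ("not", ("not", "is_active")))))  # is_active
print(simplify(("*", ("+", "qty", 0), 1)))                     # qty
```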

Join reordering

Change the order of joins to reduce intermediate result size.
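To make the effect concrete, the sketch below enumerates left-deep join orders for three tables and sums the estimated intermediate result sizes. The table sizes, the selectivities, and the "sum of intermediate rows" cost are invented for the example; real optimizers use richer models and smarter enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Assumed row counts and join selectivities (illustrative numbers only).
SIZES = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
SELECTIVITY = {
    frozenset({"orders", "customers"}): Fraction(1, 10_000),
    frozenset({"customers", "regions"}): Fraction(1, 10),
}  # pairs without an entry have no join predicate (cross product)

def intermediate_sizes(order):
    """Estimated row count after each join in a left-deep plan."""
    joined, rows, sizes = {order[0]}, SIZES[order[0]], []
    for table in order[1:]:
        rows *= SIZES[table]
        for other in joined:
            rows *= SELECTIVITY.get(frozenset({table, other}), 1)
        joined.add(table)
        sizes.append(rows)
    return sizes

def cost(order):
    return sum(intermediate_sizes(order))

best = min(permutations(SIZES), key=cost)
print(best, cost(best))  # ('customers', 'regions', 'orders') 1010000
```

Under these assumptions, joining the two small tables first keeps the first intermediate result at 10,000 rows, whereas starting with `orders` and `regions` produces a 10,000,000-row cross product.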

Limit pushdown

Push LIMIT closer to the source or to earlier stages when semantics allow it.
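The "when semantics allow it" condition matters. The sketch below, using hypothetical rows and a hand-written `project` operator, shows that LIMIT commutes with a row-by-row projection but not with a filter.

```python
from itertools import islice

ROWS = [{"id": i, "v": i * i} for i in range(1_000)]

def project(rows, columns):
    """Row-by-row projection: emits exactly one output row per input row."""
    for r in rows:
        yield {c: r[c] for c in columns}

def is_even(row):
    return row["id"] % 2 == 0

# Safe: LIMIT below a projection gives the same rows as LIMIT above it.
limit_above = list(islice(project(ROWS, ["id"]), 5))
limit_below = list(project(islice(ROWS, 5), ["id"]))
assert limit_above == limit_below

# Not safe: LIMIT below a filter truncates the input before filtering.
correct = [r["id"] for r in islice(filter(is_even, ROWS), 5)]
wrong = [r["id"] for r in filter(is_even, islice(ROWS, 5))]
print(correct)  # [0, 2, 4, 6, 8]
print(wrong)    # [0, 2, 4]
```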

Operator fusion

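Combine adjacent operators, such as a filter followed by a projection, into a single operator to avoid per-operator overhead and intermediate results.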
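A sketch of fusing a filter and a map into one pass. The operator constructors `filter_op`, `map_op`, and `fused_filter_map` are hypothetical names for this example.

```python
def filter_op(predicate):
    def run(rows):
        return [r for r in rows if predicate(r)]   # materializes a full list
    return run

def map_op(fn):
    def run(rows):
        return [fn(r) for r in rows]               # second pass over the data
    return run

def fused_filter_map(predicate, fn):
    """One operator doing the work of filter_op followed by map_op."""
    def run(rows):
        # Single pass, no intermediate list between the two steps.
        return [fn(r) for r in rows if predicate(r)]
    return run

rows = range(10)
unfused = map_op(lambda x: x * 2)(filter_op(lambda x: x % 3 == 0)(rows))
fused = fused_filter_map(lambda x: x % 3 == 0, lambda x: x * 2)(rows)
assert unfused == fused == [0, 6, 12, 18]
```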

Rule-based vs cost-based optimization

Rule-based optimization

This applies fixed rewrite rules such as:

push filters below projections
remove unused columns
simplify expressions

Strengths:

simple
predictable
easy to implement incrementally

Weaknesses:

limited when multiple legal alternatives exist, because fixed rules cannot weigh them against each other
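A rule-based rewriter can be sketched as a list of rewrite functions applied until nothing changes. The plan encoding below (`("scan", table)`, `("filter", predicate, child)`, `("project", columns, child)`) and the two rules are assumptions for illustration. The first rule also assumes the filter only references columns that the projection passes through unchanged.

```python
def push_filter_below_project(plan):
    # filter(project(x)) -> project(filter(x))
    if plan[0] == "filter" and plan[2][0] == "project":
        _, pred, (_, cols, child) = plan
        return ("project", cols, ("filter", pred, child))
    return plan

def merge_projects(plan):
    # project(project(x)) -> project(x), keeping the outer column list
    if plan[0] == "project" and plan[2][0] == "project":
        return ("project", plan[1], plan[2][2])
    return plan

RULES = [push_filter_below_project, merge_projects]

def rewrite(plan):
    """Rewrite children first, then apply rules until none changes the node."""
    if plan[0] != "scan":
        plan = (plan[0], plan[1], rewrite(plan[2]))
    for rule in RULES:
        new_plan = rule(plan)
        if new_plan != plan:
            return rewrite(new_plan)
    return plan

plan = ("filter", "age > 30", ("project", ["name", "age"], ("scan", "users")))
print(rewrite(plan))
# ('project', ['name', 'age'], ('filter', 'age > 30', ('scan', 'users')))
```

Each rule is local and unconditional, which is what makes this style predictable and easy to extend one rule at a time.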

Cost-based optimization

This estimates the cost of alternative plans and chooses the cheapest one according to a cost model.

It often depends on:

table sizes
value distributions
selectivity estimates
available indexes

Strengths:

can choose among many alternatives
important for complex join planning

Weaknesses:

depends on statistics quality
more complex to implement
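A toy cost model for choosing an access path, to show how the inputs listed above feed a decision. The cost formulas and constants (sequential cost 1 per row, random access 4 per row, fixed index overhead 50) are invented for this sketch and do not come from any particular engine.

```python
def full_scan_cost(rows):
    return rows * 1.0                        # assumed: 1 unit per sequential row

def index_scan_cost(rows, selectivity):
    return rows * selectivity * 4.0 + 50.0   # assumed: random access 4x + lookup overhead

def choose_access_path(rows, selectivity, has_index):
    """Return the name of the cheapest candidate under the assumed model."""
    candidates = {"full_scan": full_scan_cost(rows)}
    if has_index:
        candidates["index_scan"] = index_scan_cost(rows, selectivity)
    return min(candidates, key=candidates.get)

print(choose_access_path(1_000_000, 0.001, has_index=True))   # index_scan
print(choose_access_path(1_000_000, 0.5, has_index=True))     # full_scan
print(choose_access_path(1_000_000, 0.001, has_index=False))  # full_scan
```

The same query shape gets a different plan depending on the selectivity estimate, which is why poor statistics lead directly to poor plans.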

Most production engines combine both approaches.

Logical vs physical optimization

Optimization can happen at two levels.

Logical optimization

Rewrite the plan while staying in logical-operator space.

Examples:

pushdown rewrites
removing dead columns
simplifying expressions

Physical optimization

Choose concrete execution strategies.