# Query Optimization

A reference for how query engines make a plan cheaper without changing its meaning.

---

## Short answer

Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.

The key word is equivalent: the result must stay the same even though the execution strategy changes.

---

## Why optimization exists

There are usually many ways to compute the same query.

For example, an engine may be able to:

- read all columns or only the needed ones
- filter before or after another operator
- join tables in different orders
- pick different join algorithms

Optimization tries to choose a cheaper plan in terms of CPU, memory, I/O, and network cost.

---

## Common optimizations

### Projection pushdown

Read only the columns that are actually needed.
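As a minimal sketch, the rewrite can be shown on a toy plan tree. The `Scan` and `Project` classes and the `push_projection` function are hypothetical names chosen for illustration, not the API of any particular engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    columns: tuple  # columns the scan will read from storage


@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple  # columns the query actually outputs


def push_projection(plan):
    """If a Project sits directly on a Scan, narrow the scan to the projected columns."""
    if isinstance(plan, Project) and isinstance(plan.child, Scan):
        scan = plan.child
        needed = tuple(c for c in scan.columns if c in plan.columns)
        return Project(Scan(scan.table, needed), plan.columns)
    return plan


plan = Project(Scan("users", ("id", "name", "email", "bio")), ("id", "name"))
optimized = push_projection(plan)
print(optimized.child.columns)  # ('id', 'name')
```

The benefit is largest on columnar storage, where unread columns cost no I/O at all.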

### Predicate pushdown

Apply filters as early as possible, ideally inside the data source.
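The effect can be illustrated with a toy data source. The sketch below assumes a hypothetical `scan` function that optionally accepts a predicate; real sources expose pushdown in their own ways.

```python
rows = [{"id": i, "country": "DE" if i % 4 == 0 else "US"} for i in range(8)]


def scan(table, predicate=None):
    """Hypothetical source: yields rows, applying the predicate inside the scan if given."""
    for row in table:
        if predicate is None or predicate(row):
            yield row


def is_german(row):
    return row["country"] == "DE"


# Without pushdown: the scan emits every row and a later operator filters them.
emitted_late = list(scan(rows))
late = [r for r in emitted_late if is_german(r)]

# With pushdown: the scan itself drops non-matching rows.
early = list(scan(rows, predicate=is_german))

assert late == early  # same result either way
print(len(emitted_late), len(early))  # 8 2
```

Both variants return the same rows, but with pushdown only two rows ever leave the source instead of eight.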

### Constant folding

Evaluate constant subexpressions once at planning time instead of once per row, so `2 + 3` becomes `5` and a constant boolean expression such as `TRUE AND FALSE` becomes `FALSE`.
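A minimal sketch of folding over a small expression tree follows. The tuple encoding (`('op', left, right)`, `('col', name)`, or a plain literal) is an assumption made for this example only.

```python
import operator

# Supported operators for this sketch; a real engine would cover many more.
OPS = {"+": operator.add, "*": operator.mul, "and": lambda a, b: a and b}


def fold(expr):
    """Fold constant subexpressions bottom-up.

    Expressions are ('op', left, right), ('col', name), or a Python literal.
    """
    if not isinstance(expr, tuple) or expr[0] == "col":
        return expr  # literals and column references are already as simple as possible
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return OPS[op](left, right)  # both sides are literals: compute now
    return (op, left, right)


expr = ("*", ("col", "price"), ("+", 2, 3))
print(fold(expr))  # ('*', ('col', 'price'), 5)
```

The column reference blocks folding at the top level, but the constant subtree below it is still collapsed.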

### Expression simplification

Rewrite expressions into simpler equivalent forms, for example `x AND TRUE` to `x` or `NOT (NOT x)` to `x`.
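A few such identities can be sketched as a recursive rewrite. The tuple encoding (`('and', l, r)`, `('not', e)`, `('col', name)`, or a boolean literal) and the `simplify` function are illustrative assumptions, and only a handful of identities are covered.

```python
def simplify(expr):
    """Apply a few boolean identities bottom-up."""
    if not isinstance(expr, tuple) or expr[0] == "col":
        return expr
    if expr[0] == "not":
        inner = simplify(expr[1])
        if isinstance(inner, tuple) and inner[0] == "not":
            return inner[1]  # NOT (NOT x) -> x
        return ("not", inner)
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == "and":
        if left is True:
            return right  # TRUE AND x -> x
        if right is True:
            return left  # x AND TRUE -> x
        if left is False or right is False:
            return False  # x AND FALSE -> FALSE
    return (op, left, right)


expr = ("and", ("not", ("not", ("col", "active"))), True)
print(simplify(expr))  # ('col', 'active')
```

Unlike constant folding, these rewrites apply even when a column reference is involved.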

### Join reordering

Change the order of joins to reduce intermediate result size.
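To see why order matters, the sketch below brute-forces every left-deep order for three tables and sums the estimated intermediate result sizes. The table names, row counts, and selectivities are made up for illustration, and the size formula is a deliberately simple assumption.

```python
from itertools import permutations

# Hypothetical row counts.
rows = {"orders": 1_000_000, "customers": 10_000, "regions": 10}

# Hypothetical join selectivities; pairs not listed have no join predicate
# and therefore behave like a cross product (selectivity 1.0).
selectivity = {
    frozenset({"orders", "customers"}): 1 / 10_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def intermediate_rows(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined, size, total = [order[0]], rows[order[0]], 0
    for table in order[1:]:
        size *= rows[table]
        for other in joined:
            size *= selectivity.get(frozenset({table, other}), 1.0)
        joined.append(table)
        total += size
    return total


best = min(permutations(rows), key=intermediate_rows)
worst = max(permutations(rows), key=intermediate_rows)
print(best, round(intermediate_rows(best)))
print(worst, round(intermediate_rows(worst)))
```

Under these assumptions, the cheapest orders join the two small tables first and bring in `orders` last, while the most expensive ones start with a cross product between `orders` and `regions`.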

### Limit pushdown

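Push `LIMIT` closer to the source or to earlier stages when semantics allow it. A limit can usually move below a projection that emits one row per input row, but not below a filter or a sort, which may need more input rows to produce the same output.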
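A minimal sketch of the rewrite, assuming a toy plan tree in which `Project` emits exactly one row per input row. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str


@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple


@dataclass(frozen=True)
class Limit:
    child: object
    n: int


def push_limit(plan):
    """Move a Limit below a row-preserving Project; leave it above anything else."""
    if isinstance(plan, Limit) and isinstance(plan.child, Project):
        proj = plan.child
        return Project(push_limit(Limit(proj.child, plan.n)), proj.columns)
    return plan  # e.g. Limit over Filter: pushing would change the result


over_project = Limit(Project(Scan("t"), ("a",)), 10)
over_filter = Limit(Filter(Scan("t"), "a > 1"), 10)
print(push_limit(over_project))
print(push_limit(over_filter) == over_filter)  # True
```

The filter case is left alone because limiting the input to 10 rows first could leave fewer than 10 rows after filtering.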

### Operator fusion

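Combine adjacent operators into a single pass, for example a filter followed by a projection, to avoid materializing intermediate results and paying per-operator overhead.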
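As a small illustration, the sketch below compares two separate operators with a fused version. The function names are hypothetical, and Python lists stand in for row batches.

```python
def filter_op(rows, predicate):
    """Standalone filter: materializes its full output."""
    return [r for r in rows if predicate(r)]


def map_op(rows, fn):
    """Standalone map: materializes its full output."""
    return [fn(r) for r in rows]


def fused_filter_map(rows, predicate, fn):
    """Fused operator: one pass over the input, no intermediate list."""
    return [fn(r) for r in rows if predicate(r)]


def is_even(x):
    return x % 2 == 0


def square(x):
    return x * x


data = list(range(10))
unfused = map_op(filter_op(data, is_even), square)
fused = fused_filter_map(data, is_even, square)
assert unfused == fused
print(fused)  # [0, 4, 16, 36, 64]
```

The results are identical; the fused version simply skips the intermediate list that `filter_op` would have built.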

---

## Rule-based vs cost-based optimization

### Rule-based optimization

This applies fixed rewrite rules such as:

- push filters below projections
- remove unused columns
- simplify expressions

Strengths:

- simple
- predictable
- easy to implement incrementally

Weaknesses:

- cannot compare alternatives: when several legal rewrites exist, fixed rules have no cost information for picking the cheaper one
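A rule-based optimizer can be sketched as a list of rewrite functions applied until the plan stops changing. The tuple plan encoding (`('filter', pred, child)`, `('project', cols, child)`, `('scan', table)`) and both rules are assumptions for illustration; the filter rule is only valid here because the projections select columns without computing new ones.

```python
def push_filter_below_project(plan):
    """filter(project(x)) -> project(filter(x))."""
    if plan[0] == "filter" and plan[2][0] == "project":
        _, pred, (_, cols, child) = plan
        return ("project", cols, ("filter", pred, child))
    return plan


def merge_projects(plan):
    """project(project(x)) -> project(x), keeping the outer column list."""
    if plan[0] == "project" and plan[2][0] == "project":
        return ("project", plan[1], plan[2][2])
    return plan


RULES = [push_filter_below_project, merge_projects]


def rewrite_once(plan):
    """Apply every rule at this node, then recurse into the child."""
    for rule in RULES:
        plan = rule(plan)
    if plan[0] == "scan":
        return plan
    return (plan[0], plan[1], rewrite_once(plan[2]))


def optimize(plan):
    """Run rewrite passes until nothing changes (a fixed point)."""
    while True:
        new_plan = rewrite_once(plan)
        if new_plan == plan:
            return plan
        plan = new_plan


plan = ("filter", "age > 30",
        ("project", ("name", "age"),
         ("project", ("name", "age", "city"), ("scan", "users"))))
print(optimize(plan))
# ('project', ('name', 'age'), ('filter', 'age > 30', ('scan', 'users')))
```

New rules can be appended to `RULES` one at a time, which is what makes this style easy to grow incrementally.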

### Cost-based optimization

This estimates the cost of alternative plans and chooses the cheapest one according to a cost model.

It often depends on:

- table sizes
- value distributions
- selectivity estimates
- available indexes

Strengths:

- can choose among many alternatives
- important for complex join planning

Weaknesses:

- depends on statistics quality
- more complex to implement
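The idea can be sketched with a toy cost model that picks between a full scan and an index scan. The cost constants, the `TableStats` class, and the function names are all hypothetical; real cost models are considerably more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    row_count: int
    has_index: bool


# Hypothetical cost constants: sequential reads are cheap per row,
# index lookups are more expensive per row actually fetched.
SEQ_COST_PER_ROW = 1.0
INDEX_COST_PER_ROW = 4.0


def candidate_plans(stats, selectivity):
    """Return estimated cost for each plan that is legal given the statistics."""
    plans = {"full_scan": stats.row_count * SEQ_COST_PER_ROW}
    if stats.has_index:
        plans["index_scan"] = stats.row_count * selectivity * INDEX_COST_PER_ROW
    return plans


def choose_plan(stats, selectivity):
    """Pick the plan with the lowest estimated cost."""
    plans = candidate_plans(stats, selectivity)
    return min(plans, key=plans.get)


stats = TableStats(row_count=1_000_000, has_index=True)
print(choose_plan(stats, selectivity=0.01))  # index_scan
print(choose_plan(stats, selectivity=0.60))  # full_scan
```

The same query shape gets a different plan depending on the selectivity estimate, which is also why a bad estimate can lead to a bad plan.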

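Most production engines use both: rules for rewrites that are almost always beneficial, and cost estimates for choices such as join order.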

---

## Logical vs physical optimization

Optimization can happen at two levels.

### Logical optimization

Rewrite the plan while staying in logical-operator space.

Examples:

- pushdown rewrites
- removing dead columns
- simplifying expressions

### Physical optimization

Choose concrete execution strategies.

Examples:

- hash join vs sort-merge join
- vectorized filter vs generic filter
- index scan vs full scan
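To illustrate that physical choices change the algorithm but not the result, the sketch below implements the same inner join two ways. It assumes rows are tuples whose first element is the join key; the function names are illustrative.

```python
def hash_join(left, right, key=lambda r: r[0]):
    """Build a hash table on the left input, then probe it with the right input."""
    table = {}
    for l in left:
        table.setdefault(key(l), []).append(l)
    return [(l, r) for r in right for l in table.get(key(r), [])]


def sort_merge_join(left, right, key=lambda r: r[0]):
    """Sort both inputs on the key, then merge them with two cursors."""
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(2, "book"), (1, "pen"), (2, "lamp"), (4, "desk")]
assert sorted(hash_join(users, orders)) == sorted(sort_merge_join(users, orders))
print(sorted(hash_join(users, orders)))
```

Both functions implement the same logical join; which one is cheaper depends on factors such as input size, available memory, and whether the inputs are already sorted.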

This distinction matters because some improvements are about semantics-preserving algebra, while others are about operator implementation choices.

---

## Why optimization is hard

Optimization is difficult because:

- the search space can explode
- estimates are imperfect
- the cheapest local rewrite is not always globally best
- different workloads care about different costs

So optimizers search for a good plan under approximate estimates rather than proving that a plan is optimal.
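The search-space growth can be made concrete by counting join orders. The sketch below uses the standard counts of `n!` left-deep orders and `(2(n-1))! / (n-1)!` bushy trees for `n` tables, assuming cross products are allowed and that `A JOIN B` and `B JOIN A` count as different plans.

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join orders for n tables."""
    return factorial(n)


def bushy_orders(n):
    """Number of bushy join trees for n tables, counting mirrored joins separately."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 5, 10):
    print(n, left_deep_orders(n), bushy_orders(n))
# 3 6 12
# 5 120 1680
# 10 3628800 17643225600
```

At ten tables the bushy space already exceeds seventeen billion plans, before even considering the choice of join algorithm for each node.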

---

## Practical mental model

If planning answers "what operations are needed?", optimization answers "what is the cheapest equivalent way to arrange and implement them?"

That is the essential idea.

---

## Changelog

* **Mar 31, 2026** -- First version created.