# Query Optimization

A reference for how query engines make a plan cheaper without changing its meaning.

---

## Short answer

Query optimization is the process of rewriting a logical or physical plan into an equivalent but more efficient form.

The key word is *equivalent*: the result must stay the same even though the execution strategy changes.

---

## Why optimization exists

There are usually many ways to compute the same query. For example, an engine may be able to:

- read all columns or only the needed ones
- filter before or after another operator
- join tables in different orders
- pick different join algorithms

Optimization tries to choose a plan that is cheaper in terms of CPU, memory, I/O, and network cost.

---

## Common optimizations

### Projection pushdown

Read only the columns that are actually needed.

### Predicate pushdown

Apply filters as early as possible, ideally inside the data source.

### Constant folding

Evaluate constant expressions such as `2 + 3` once during planning, and simplify boolean expressions with constant operands, instead of recomputing them for every row.

### Expression simplification

Rewrite expressions into simpler equivalent forms; for example, `x AND TRUE` becomes `x`.

### Join reordering

Change the order of joins to reduce the size of intermediate results.

### Limit pushdown

Push `LIMIT` closer to the source or to earlier stages when semantics allow it. A `LIMIT` cannot move below a filter, for example, because that would change which rows survive.

### Operator fusion

Combine adjacent operators (for example, a filter followed by a projection) into a single pass to reduce per-row overhead.

---

## Rule-based vs cost-based optimization

### Rule-based optimization

This applies fixed rewrite rules such as:

- push filters below projections
- remove unused columns
- simplify expressions

Strengths:

- simple
- predictable
- easy to implement incrementally

Weaknesses:

- cannot compare alternatives when several legal plans exist, because rules do not measure cost

### Cost-based optimization

This estimates the cost of alternative plans and chooses the cheapest one according to a cost model.
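To make the idea concrete, here is a minimal sketch of cost-based join ordering. It enumerates every left-deep join order for three tables and picks the one with the smallest total intermediate result size. The table names, row counts, and selectivities are made-up assumptions, the independence assumption for combining predicates is a common simplification rather than a rule, and "sum of intermediate rows" is a deliberately simple stand-in for a real cost model.

```python
from itertools import permutations

# Hypothetical statistics: estimated row counts per table.
ROWS = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical join selectivities for each joinable pair
# (fraction of the cross product that survives the join predicate).
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nations"}): 1 / 25,
}


def join_selectivity(joined: frozenset, table: str) -> float:
    """Combined selectivity of joining `table` onto the tables in `joined`.

    A pair with no predicate is a cross product (selectivity 1.0).
    Assumes predicates are independent, so selectivities multiply.
    """
    sel = 1.0
    for other in joined:
        sel *= SELECTIVITY.get(frozenset({other, table}), 1.0)
    return sel


def plan_cost(order: tuple) -> float:
    """Cost = sum of estimated intermediate result sizes for a left-deep plan."""
    joined = frozenset({order[0]})
    rows = float(ROWS[order[0]])
    cost = 0.0
    for table in order[1:]:
        rows = rows * ROWS[table] * join_selectivity(joined, table)
        cost += rows
        joined = joined | {table}
    return cost


def best_join_order(tables) -> tuple:
    """Exhaustively compare every left-deep order and return the cheapest."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(" JOIN ".join(order), f"{plan_cost(order):,.0f}")
    print("chosen:", best_join_order(ROWS))
```

Even in this toy model, the chosen plan joins `customers` and `nations` first, while the most expensive plans start with `orders` and `nations`, which share no predicate and form a 25-million-row cross product. Exhaustive enumeration only works for a few tables; real optimizers typically switch to dynamic programming or heuristics as the table count grows.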
The cost estimate often depends on:

- table sizes
- value distributions
- selectivity estimates
- available indexes

Strengths:

- can choose among many alternatives
- important for complex join planning

Weaknesses:

- depends on statistics quality
- more implementation complexity

Most mature engines use both approaches, typically applying rules for rewrites that are almost always beneficial and cost-based search for decisions such as join order.

---

## Logical vs physical optimization

Optimization can happen at two levels.

### Logical optimization

Rewrite the plan while staying in logical-operator space. Examples:

- pushdown rewrites
- removing dead columns
- simplifying expressions

### Physical optimization

Choose concrete execution strategies. Examples:

- hash join vs sort-merge join
- vectorized filter vs generic filter
- index scan vs full scan

This distinction matters because some improvements are semantics-preserving algebraic rewrites, while others are choices about how an operator is implemented.

---

## Why optimization is hard

Optimization is difficult because:

- the search space can explode (the number of possible join orders grows factorially with the number of tables)
- estimates are imperfect
- the cheapest local rewrite is not always globally best
- different workloads care about different costs

So optimizers always work with approximations; they do not prove that a plan is optimal.

---

## Practical mental model

If planning answers "what operations are needed?", optimization answers "what is the cheapest equivalent way to arrange and implement them?"

That is the essential idea.

---

## Changelog

* **Mar 31, 2026** -- First version created.