useful-notes/hqew/004-query-planning.md

# Query Planning

A reference for how a query request becomes an internal plan.

---

## Short answer

Query planning is the stage where a query engine turns a user request into a structured representation of work to be done.

The main point is to separate:

- the syntax the user wrote
- the meaning of the query
- the later execution strategy

Without that separation, optimization and backend-independent execution become much harder.

---

## Typical pipeline

Planning usually sits between parsing and optimization:

1. parse query text or API calls
2. build an AST or similar syntax tree
3. resolve names and types
4. produce a logical plan
5. hand that plan to the optimizer

The exact boundaries differ across systems, but the general idea is stable.
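
The stages above can be sketched as a chain of small functions. This is a toy illustration, not any particular engine's design: `CATALOG`, `Ast`, `parse`, `resolve`, `to_logical_plan`, and `plan` are all hypothetical names, and the parser only accepts `SELECT a, b FROM t`.

```python
from dataclasses import dataclass

# Hypothetical catalog: table name -> {column name: type name}.
CATALOG = {"employees": {"name": "text", "age": "int"}}


@dataclass(frozen=True)
class Ast:
    """Syntax-shaped result of steps 1 and 2."""
    select: tuple
    table: str


def parse(text):
    # Steps 1-2: accept only "SELECT a, b FROM t" for illustration.
    head, table = text.strip().split(" FROM ")
    if not head.startswith("SELECT "):
        raise ValueError("expected a SELECT statement")
    columns = tuple(c.strip() for c in head[len("SELECT "):].split(","))
    return Ast(select=columns, table=table.strip())


def resolve(ast):
    # Step 3: look up the table and each column, returning column -> type.
    schema = CATALOG[ast.table]
    return {column: schema[column] for column in ast.select}


def to_logical_plan(ast, types):
    # Step 4: nested tuples stand in for logical plan nodes.
    return ("Projection", list(types), ("Scan", ast.table))


def plan(text):
    ast = parse(text)
    types = resolve(ast)
    # Step 5 would hand this result to an optimizer.
    return to_logical_plan(ast, types)


print(plan("SELECT name, age FROM employees"))
```

Keeping each stage as its own function mirrors the point that the boundaries can move between systems without changing the overall flow.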

---

## What planning does

### Translate structure into operations

The planner turns syntax such as `SELECT`, `WHERE`, `GROUP BY`, and `JOIN` into relational operators such as:

- scan
- projection
- filter
- join
- aggregate
- limit
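
One common way to represent these operators is as small immutable node types that form a tree. The sketch below assumes that shape; the class names, field names, and the `children` and `describe` helpers are illustrative, and predicates are kept as plain strings for brevity.

```python
from dataclasses import dataclass, is_dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicate: str  # kept as text here; a real plan would hold an expression tree
    child: object


@dataclass(frozen=True)
class Projection:
    columns: tuple
    child: object


@dataclass(frozen=True)
class Join:
    condition: str
    left: object
    right: object


@dataclass(frozen=True)
class Aggregate:
    group_by: tuple
    aggregates: tuple
    child: object


@dataclass(frozen=True)
class Limit:
    count: int
    child: object


def children(node):
    # Any field that holds another operator is an input of this node.
    return [value for value in vars(node).values() if is_dataclass(value)]


def describe(node, depth=0):
    # Render the operator tree, one node per line, indented by depth.
    line = "  " * depth + type(node).__name__
    return "\n".join([line] + [describe(c, depth + 1) for c in children(node)])


example = Limit(
    10,
    Aggregate(("dept",), ("count(*)",), Filter("age > 18", Scan("employees"))),
)
print(describe(example))
```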

### Resolve names

The planner figures out what table or source a name refers to and which columns expressions mention.
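
A minimal sketch of column resolution, assuming a hypothetical in-memory `CATALOG` and a `resolve_column` helper invented for this note. It handles qualified names (`table.column`) and rejects unqualified names that match more than one table.

```python
# Hypothetical catalog: table name -> list of column names.
CATALOG = {
    "employees": ["id", "name", "age", "dept_id"],
    "departments": ["id", "title"],
}


def resolve_column(name, tables):
    """Return (table, column) for a possibly qualified column name."""
    for table in tables:
        if table not in CATALOG:
            raise LookupError(f"unknown table: {table}")

    if "." in name:
        # Qualified name: the table part must be in scope and own the column.
        table, column = name.split(".", 1)
        if table not in tables or column not in CATALOG[table]:
            raise LookupError(f"unknown column: {name}")
        return table, column

    # Unqualified name: exactly one table in scope may own it.
    owners = [table for table in tables if name in CATALOG[table]]
    if not owners:
        raise LookupError(f"unknown column: {name}")
    if len(owners) > 1:
        raise LookupError(f"ambiguous column {name!r}: found in {owners}")
    return owners[0], name


print(resolve_column("name", ["employees", "departments"]))
```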

### Check types

The planner verifies that expressions are valid: for example, that comparisons involve compatible types and that aggregates appear only where they are allowed.
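
The two checks mentioned above might look like this in a toy planner. Expressions are nested tuples, and the type names, `SCHEMA`, and `type_of` are assumptions made for this sketch rather than rules from any real engine.

```python
# Hypothetical schema and type names.
SCHEMA = {"name": "text", "age": "int", "salary": "float"}
NUMERIC = {"int", "float"}
LITERAL_TYPES = {int: "int", float: "float", str: "text"}


def type_of(expr, schema, allow_aggregates=True):
    """Return the type name of an expression, or raise TypeError if invalid."""
    kind = expr[0]
    if kind == "col":  # ("col", name)
        return schema[expr[1]]
    if kind == "lit":  # ("lit", value)
        return LITERAL_TYPES[type(expr[1])]
    if kind == "cmp":  # ("cmp", op, left, right)
        left = type_of(expr[2], schema, allow_aggregates)
        right = type_of(expr[3], schema, allow_aggregates)
        # Equal types compare; so do any two numeric types.
        if left == right or {left, right} <= NUMERIC:
            return "bool"
        raise TypeError(f"cannot compare {left} with {right}")
    if kind == "agg":  # ("agg", function_name, argument)
        if not allow_aggregates:
            raise TypeError("aggregate not allowed in this position")
        # Nested aggregates are rejected by disallowing them in the argument.
        argument = type_of(expr[2], schema, allow_aggregates=False)
        if argument not in NUMERIC:
            raise TypeError(f"cannot aggregate values of type {argument}")
        return argument
    raise ValueError(f"unknown expression kind: {kind}")


predicate = ("cmp", ">", ("col", "age"), ("lit", 18))
print(type_of(predicate, SCHEMA))
```

A caller checking a `WHERE` predicate would pass `allow_aggregates=False`, which is how this sketch models "aggregates are used correctly."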

### Build expressions

Predicates and computed columns are turned into internal expression trees.
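
As a rough illustration, an expression tree can be built from three node kinds. `Column`, `Literal`, `BinaryOp`, `referenced_columns`, and `evaluate` are hypothetical names for this sketch; real engines usually have many more node types.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: object
    right: object


# Operator symbols supported by this sketch.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "+": operator.add,
    "*": operator.mul,
}


def referenced_columns(expr):
    # Collect the column names an expression mentions.
    if isinstance(expr, Column):
        return {expr.name}
    if isinstance(expr, BinaryOp):
        return referenced_columns(expr.left) | referenced_columns(expr.right)
    return set()


def evaluate(expr, row):
    # Evaluate the tree against one row given as a dict.
    if isinstance(expr, Column):
        return row[expr.name]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, BinaryOp):
        return OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row))
    raise TypeError(f"not an expression: {expr!r}")


predicate = BinaryOp(">", Column("age"), Literal(18))       # age > 18
computed = BinaryOp("*", Column("salary"), Literal(2))      # salary * 2
print(evaluate(predicate, {"age": 30}), referenced_columns(computed))
```

The same tree serves both a predicate and a computed column; only the place it is attached in the plan differs.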

### Attach schema information

The planner determines the shape of operator outputs so later stages know what columns and types flow through the plan.
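
Schema attachment can be sketched as a bottom-up walk that computes each operator's output columns from its input. The tuple-based plan encoding, `CATALOG`, and `output_schema` below are assumptions for illustration only.

```python
# Hypothetical catalog: table name -> ordered list of (column, type).
CATALOG = {
    "employees": [("name", "text"), ("age", "int"), ("dept", "text")],
}


def output_schema(node):
    """Return the ordered (column, type) pairs produced by a plan node."""
    kind = node[0]
    if kind == "scan":  # ("scan", table)
        return list(CATALOG[node[1]])
    if kind in ("filter", "limit"):  # ("filter", predicate, child)
        # Filters and limits drop rows, not columns.
        return output_schema(node[2])
    if kind == "project":  # ("project", columns, child)
        available = dict(output_schema(node[2]))
        missing = [c for c in node[1] if c not in available]
        if missing:
            raise LookupError(f"columns not produced by input: {missing}")
        return [(c, available[c]) for c in node[1]]
    raise ValueError(f"unknown operator: {kind}")


plan = ("project", ["name"], ("filter", "age > 18", ("scan", "employees")))
print(output_schema(plan))
```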

---

## AST vs logical plan

This distinction matters.

- the AST reflects the query language syntax
- the logical plan reflects the data operations implied by that syntax

For example, SQL syntax contains keywords, clause ordering, and aliases that matter while parsing and resolving names but carry no further meaning once the engine understands that the query means "scan, filter, then project."

So planning is partly a translation from language syntax into operation-oriented semantics, independent of how those operations will later be executed.

---

## A tiny example

Query:

```sql
SELECT name
FROM employees
WHERE age > 18
```

The parser may produce an AST containing nodes like:

- `SelectStatement`
- `FromClause`
- `WhereClause`

The planner turns that into a logical plan, listed here from the data source upward:

1. `Scan(employees)`
2. `Filter(age > 18)`
3. `Projection(name)`

That logical plan is what later stages optimize.
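
A minimal sketch of that translation, assuming hypothetical dataclasses for both the AST and the plan. The node names follow the example above; the field names and `plan_select` are invented, and the predicate is kept as a string rather than an expression tree.

```python
from dataclasses import dataclass
from typing import Optional


# --- AST nodes: shaped like the syntax ---
@dataclass(frozen=True)
class FromClause:
    table: str


@dataclass(frozen=True)
class WhereClause:
    predicate: str


@dataclass(frozen=True)
class SelectStatement:
    columns: tuple
    from_clause: FromClause
    where_clause: Optional[WhereClause] = None


# --- Logical plan nodes: shaped like the data operations ---
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicate: str
    child: object


@dataclass(frozen=True)
class Projection:
    columns: tuple
    child: object


def plan_select(stmt):
    # Build the plan bottom-up: source first, then filter, then projection.
    node = Scan(stmt.from_clause.table)
    if stmt.where_clause is not None:
        node = Filter(stmt.where_clause.predicate, node)
    return Projection(stmt.columns, node)


ast = SelectStatement(("name",), FromClause("employees"), WhereClause("age > 18"))
print(plan_select(ast))
```

Note that the AST lists `SELECT` first because the syntax does, while the plan puts `Scan` at the bottom because data flows from it.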

---

## Why planning matters

Planning is valuable because it creates the first stable representation of meaning inside the engine.

That gives the system a place to:

- validate the query
- reason about schemas
- rewrite plans
- compare equivalent formulations
- target different execution backends

In practice, planning is the bridge between the front-end language and the execution engine.
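
To make "rewrite plans" and "compare equivalent formulations" concrete, here is a small sketch of one rewrite: merging stacked filters so that two differently written plans normalize to the same tree. The tuple encoding and `normalize` are hypothetical, and predicates are stored as sets of strings so that their order does not matter.

```python
def normalize(node):
    """Merge directly nested filters into a single filter node."""
    kind = node[0]
    if kind == "scan":  # ("scan", table)
        return node
    if kind == "filter":  # ("filter", frozenset_of_predicates, child)
        _, predicates, child = node
        child = normalize(child)
        if child[0] == "filter":
            # Filter over filter: keep one node holding both predicate sets.
            return ("filter", predicates | child[1], child[2])
        return ("filter", predicates, child)
    raise ValueError(f"unknown operator: {kind}")


stacked = (
    "filter",
    frozenset({"age > 18"}),
    ("filter", frozenset({"dept = 'eng'"}), ("scan", "employees")),
)
combined = (
    "filter",
    frozenset({"dept = 'eng'", "age > 18"}),
    ("scan", "employees"),
)
print(normalize(stacked) == normalize(combined))
```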

---

## Common complications

Planning gets harder when the query language includes:

- nested queries
- correlated subqueries
- user-defined functions
- ambiguous names
- multiple source types
- non-relational operators

This is why planning is often a substantial subsystem, not just a parser post-processing step.
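
As one example of the added difficulty, nested queries need scoped name resolution: a name in a subquery may belong to the subquery itself or to an enclosing query, and the latter makes the subquery correlated. The sketch below assumes a hypothetical `Scope` class with a parent link; column names are invented.

```python
class Scope:
    """Columns visible in one query block, with a link to the enclosing block."""

    def __init__(self, columns, parent=None):
        self.columns = set(columns)
        self.parent = parent

    def resolve(self, name):
        # Return how many query levels up the name was found (0 = this block).
        depth, scope = 0, self
        while scope is not None:
            if name in scope.columns:
                return depth
            depth, scope = depth + 1, scope.parent
        raise LookupError(f"unknown column: {name}")


def is_correlated(names, scope):
    # A subquery is correlated if any name it uses resolves to an outer block.
    return any(scope.resolve(name) > 0 for name in names)


outer = Scope({"id", "name"})                      # e.g. the outer query's table
inner = Scope({"emp_id", "amount"}, parent=outer)  # e.g. the subquery's table
print(is_correlated(["emp_id", "id"], inner))
```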

---

## Practical mental model

If parsing answers "what syntax did the user write?", planning answers "what data operations does that syntax mean?"

That is the cleanest way to think about it.

---

## Changelog

* **Mar 31, 2026** -- First version created.