Add note files about query execution models and indexes

parent 2a33f8b483
commit 8ed8347380

hqew/006-query-execution-models.md
Normal file
@@ -0,0 +1,167 @@

# Query Execution Models

A reference for the main ways query operators run at runtime.

---

## Short answer

An execution model defines how operators consume input, produce output, and pass data through a plan.

The most important questions are:

- one row at a time or many values at once?
- pull-based or push-based?
- pipelined or materialized?

Those choices strongly affect latency, CPU efficiency, and implementation complexity.

---

## Row-at-a-time execution

In a row-at-a-time model, operators process one tuple at a time.

This is often implemented with an iterator interface (commonly called the Volcano model) where a parent asks a child for the next row.

Strengths:

- simple
- modular
- easy to debug

Weaknesses:

- high per-row overhead (at least one call per operator per row)
- worse cache behavior for analytics

This model is historically important and still useful in many systems.
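
The iterator interface described above can be sketched in a few lines. This is a minimal illustration, not any particular engine's API: the `Scan`, `Filter`, and `drain` names, the dict-based row format, and the convention that `None` means end of input are all assumptions made for the example.

```python
from typing import Callable, Iterator, Optional

Row = dict  # assumption: a row is modeled as a plain dict


class Scan:
    """Leaf operator: hands out rows from an in-memory list one at a time."""

    def __init__(self, rows: list) -> None:
        self._it: Iterator[Row] = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)  # None signals end of input


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


def drain(op) -> list:
    """Plan root: keep asking for the next row until the input is exhausted."""
    out = []
    while (row := op.next()) is not None:
        out.append(row)
    return out


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 70}]
plan = Filter(Scan(rows), lambda r: r["qty"] > 10)
result = drain(plan)
```

Every row costs one `next()` call per operator in the plan, which is the per-row overhead listed under the weaknesses.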

---

## Batch-oriented execution

In a batch model, operators process chunks of rows together.

The batch may be row-based or columnar, but the main idea is to amortize operator overhead across many values.

Strengths:

- better CPU efficiency
- lower dispatch overhead
- easier parallelism inside an operator

Weaknesses:

- more bookkeeping
- more complex control flow
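
As a rough sketch of the amortization idea, the operators below exchange lists of rows instead of single rows, so the operator-to-operator handoff happens once per batch. The function names and the `batch_size` parameter are hypothetical, chosen only for this example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Row = dict  # assumption: a row is a plain dict, a batch is a list of rows


def scan_batches(rows: Iterable[Row], batch_size: int) -> Iterator[list]:
    """Source operator: emits up to batch_size rows per handoff."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def filter_batches(batches: Iterable[list],
                   predicate: Callable[[Row], bool]) -> Iterator[list]:
    """Filters a whole batch in one call; drops batches that end up empty."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:  # the extra bookkeeping that batching introduces
            yield kept


rows = [{"id": i, "qty": i * 10} for i in range(1, 8)]
batches = list(
    filter_batches(scan_batches(rows, batch_size=3), lambda r: r["qty"] >= 30)
)
batch_ids = [[r["id"] for r in b] for b in batches]
```

Seven input rows cross the operator boundary in three handoffs rather than seven, at the price of handling partially filled and empty batches.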

---

## Vectorized execution

Vectorized execution is a batch-oriented style where operators often process column vectors rather than full row objects.

This fits well with columnar memory layouts and analytical workloads.

Strengths:

- excellent cache locality
- better SIMD opportunities
- good fit for scans, filters, joins, and aggregates

Weaknesses:

- some control-flow-heavy logic is less natural
- more careful null and type handling is needed
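
A simplified sketch of the column-vector style follows. It uses plain Python lists as stand-ins for typed column vectors and a list of positions as a selection vector; real engines use packed arrays and SIMD kernels, and the function names here are assumptions for illustration only.

```python
from typing import Optional


def select_greater_than(column: list, threshold: int) -> list:
    """Filter kernel: returns positions where value > threshold.

    NULL (None) never matches, mirroring SQL comparison semantics.
    """
    return [i for i, v in enumerate(column) if v is not None and v > threshold]


def sum_selected(column: list, selection: list) -> int:
    """Aggregate kernel: sums only the selected positions, skipping NULLs."""
    return sum(column[i] for i in selection if column[i] is not None)


# Two columns of the same four-row batch, stored separately.
qty: list = [5, 50, None, 70]
price: list = [2, 3, 4, None]

selection = select_greater_than(qty, 10)      # positions 1 and 3 qualify
total_price = sum_selected(price, selection)  # price[3] is NULL and is skipped
```

Each kernel is a tight loop over one column, which is what makes cache locality and SIMD possible, and the explicit `None` checks show why null handling needs care.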

---

## Pull vs push

### Pull-based execution

Parent operators ask children for data.

Strengths:

- natural operator trees
- straightforward control flow

Weaknesses:

- can introduce repeated dispatch overhead

### Push-based execution

Child operators push data to parents or downstream consumers.

Strengths:

- natural for streaming or event-driven systems
- can work well with pipeline fusion

Weaknesses:

- control flow can be harder to reason about

Many systems combine these ideas rather than choosing only one.
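
To make the direction of control concrete, here is the same filter written both ways. This is a sketch under assumed names (`pull_filter`, `push_filter`, `push_scan`); it only illustrates who drives the loop, not how a production engine schedules work.

```python
from typing import Callable, Iterable, Iterator

Row = dict
Consumer = Callable[[Row], None]


def pull_filter(child: Iterator[Row],
                predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pull style: the consumer drives, and each request reaches down to the child."""
    for row in child:
        if predicate(row):
            yield row


def push_filter(predicate: Callable[[Row], bool],
                downstream: Consumer) -> Consumer:
    """Push style: returns a callback that forwards matching rows downstream."""
    def on_row(row: Row) -> None:
        if predicate(row):
            downstream(row)
    return on_row


def push_scan(rows: Iterable[Row], downstream: Consumer) -> None:
    """Push style: the source drives and hands each row to its consumer."""
    for row in rows:
        downstream(row)


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 70}]


def is_large(row: Row) -> bool:
    return row["qty"] > 10


pulled = list(pull_filter(iter(rows), is_large))

pushed: list = []
push_scan(rows, push_filter(is_large, pushed.append))
```

Both variants produce the same rows. In the pull version the loop lives at the top of the plan; in the push version it lives in the scan, and the filter becomes a callback in the source's loop.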

---

## Pipelining vs materialization

### Pipelined execution

Operators pass intermediate results incrementally.

Strengths:

- low latency
- less temporary storage in favorable cases

Weaknesses:

- some operators still create barriers

### Materializing execution

An operator stores its entire output before the next operator consumes it.

Strengths:

- simpler boundaries
- easier reuse of intermediates

Weaknesses:

- more memory and I/O cost
- higher latency
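
The latency difference can be seen by counting how many input rows are read before the first result is available. The sketch below uses Python generators for the pipelined case and a fully built list for the materialized case; the operator names and the `log` list are assumptions for this example.

```python
from typing import Iterable, Iterator


def traced_scan(values: Iterable[int], log: list) -> Iterator[int]:
    """Source that records every value it hands out."""
    for v in values:
        log.append(v)
        yield v


def double_pipelined(child: Iterable[int]) -> Iterator[int]:
    """Emits each result as soon as its input value arrives."""
    for v in child:
        yield v * 2


def double_materialized(child: Iterable[int]) -> list:
    """Builds and stores the entire output before returning anything."""
    return [v * 2 for v in child]


log_pipelined: list = []
first_pipelined = next(double_pipelined(traced_scan([1, 2, 3], log_pipelined)))

log_materialized: list = []
first_materialized = double_materialized(traced_scan([1, 2, 3], log_materialized))[0]
```

Both calls return the same first value, but the pipelined plan has read one input row at that point while the materialized plan has read all three and holds the full intermediate result in memory.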

---

## Blocking operators

Some operators are naturally blocking.

Examples:

- sort
- some aggregates (for example, a hash aggregate that emits groups only after reading all input)
- some join strategies (for example, the build side of a hash join)

These operators shape the real execution behavior of the plan because they force buffering or full-input processing before useful output appears.
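
One way to observe the effect is to put a `LIMIT 1` on top of a non-blocking operator and on top of a sort, then count the rows the scan had to produce. The sketch below assumes generator-based operators with hypothetical names; the `stats` dict exists only to make the difference measurable.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def counting_scan(values: Iterable[int], stats: dict) -> Iterator[int]:
    """Source that counts how many values it has produced."""
    for v in values:
        stats["read"] += 1
        yield v


def filter_op(child: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Non-blocking: forwards each matching value immediately."""
    return (v for v in child if predicate(v))


def sort_op(child: Iterable[int]) -> Iterator[int]:
    """Blocking: must see all input before the smallest value is known."""
    yield from sorted(child)


def limit_op(child: Iterable[int], n: int) -> Iterator[int]:
    """Stops pulling from the child after n values."""
    return islice(child, n)


stats_filter = {"read": 0}
via_filter = list(
    limit_op(filter_op(counting_scan(range(100), stats_filter), lambda v: v >= 5), 1)
)

stats_sort = {"read": 0}
via_sort = list(limit_op(sort_op(counting_scan(range(100), stats_sort)), 1))
```

With the filter, the limit stops the scan after six rows. With the sort in between, all one hundred rows are read and buffered before the first one is emitted, even though only one is needed.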

---

## Practical mental model

Execution models are about runtime granularity and data flow.

If architecture asks "what kind of engine is this?", the execution model asks "how do operators actually run?"

---

## Changelog

* **April 1, 2026** -- First version created.

hqew/007-storage-and-indexes.md
Normal file
@@ -0,0 +1,153 @@

# Storage and Indexes

A reference for how storage layout and indexing shape query execution.

---

## Short answer

Storage is not just where data sits. It strongly influences which queries are cheap, which operators are natural, and what the optimizer can exploit.

Indexes matter because they trade extra write and storage cost for faster reads on selected access patterns.

---

## Row store vs column store

### Row store

Stores all fields of one row together.

Good for:

- point lookups
- updates of whole records
- transactional workloads

Weak for:

- scanning a few columns across many rows

### Column store

Stores values of the same column together.

Good for:

- analytical scans
- compression
- vectorized execution
- reading only selected columns

Weak for:

- reconstructing many full records repeatedly
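
The two layouts can be modeled with ordinary Python containers. This is only an in-memory analogy, with made-up sample data and function names, but it shows which access pattern each layout makes direct.

```python
# Row layout: all fields of one record sit together.
row_store = [
    (1, "ann", 30),
    (2, "bob", 25),
    (3, "cy", 41),
]

# Column layout: all values of one column sit together.
column_store = {
    "id": [1, 2, 3],
    "name": ["ann", "bob", "cy"],
    "age": [30, 25, 41],
}

AGE = 2  # position of the age field inside a row tuple


def total_age_rows(rows: list) -> int:
    """Walks every record even though only one field is needed."""
    return sum(row[AGE] for row in rows)


def total_age_columns(columns: dict) -> int:
    """Reads a single contiguous column and nothing else."""
    return sum(columns["age"])


def fetch_record_rows(rows: list, pos: int) -> tuple:
    """A full record is one direct access."""
    return rows[pos]


def fetch_record_columns(columns: dict, pos: int) -> tuple:
    """A full record must be stitched together from every column."""
    return tuple(columns[name][pos] for name in ("id", "name", "age"))
```

The aggregate is the natural operation on the column layout, and the record fetch is the natural operation on the row layout; each layout can still answer the other query, just with more work per result.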

---

## Why storage layout matters

The storage layout affects:

- I/O volume
- cache locality
- compression opportunities
- pushdown behavior
- operator implementation strategy

So storage is a first-order architecture decision, not just a persistence detail.

---

## Common index types

### B-tree

A classic ordered index, good for:

- point lookups
- range queries
- ordered scans
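
The three access patterns above all follow from keeping keys in sorted order. The sketch below is not a B-tree: it swaps in a sorted array searched with Python's `bisect` module as a stand-in, which gives the same ordered-access behavior without the page structure or cheap inserts of a real B-tree. The `OrderedIndex` class and its method names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class OrderedIndex:
    """Sorted-array stand-in for an ordered index (not an actual B-tree)."""

    def __init__(self, pairs: list) -> None:
        entries = sorted(pairs)  # (key, row_id) pairs in key order
        self.keys = [key for key, _ in entries]
        self.row_ids = [row_id for _, row_id in entries]

    def lookup(self, key) -> list:
        """Point lookup: every row id stored under exactly this key."""
        lo = bisect_left(self.keys, key)
        hi = bisect_right(self.keys, key)
        return self.row_ids[lo:hi]

    def range(self, low, high) -> list:
        """Range query: row ids with low <= key <= high, in key order."""
        lo = bisect_left(self.keys, low)
        hi = bisect_right(self.keys, high)
        return self.row_ids[lo:hi]

    def ordered_scan(self) -> list:
        """Ordered scan: all row ids in key order, with no sort step."""
        return list(self.row_ids)


index = OrderedIndex([(30, "r1"), (25, "r2"), (41, "r3"), (30, "r4")])
```

Here `index.lookup(30)` and `index.range(26, 41)` both locate their start position by binary search and then read neighbors sequentially, which is the property a B-tree provides on disk.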

### Hash index

Optimized for exact-match lookups.

Good for:

- equality predicates

Weak for:

- range queries
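
A Python `dict` is itself a hash table, so it makes a convenient stand-in for a hash index. The sketch below assumes rows are dicts and row ids are list positions; `build_hash_index` and `range_via_hash_index` are names invented for the example.

```python
from collections import defaultdict


def build_hash_index(rows: list, column: str) -> dict:
    """Maps each distinct value of `column` to the row ids that contain it."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return dict(index)


def range_via_hash_index(index: dict, low: str, high: str) -> list:
    """Range predicate: hashing destroys key order, so every key is inspected."""
    return sorted(
        row_id
        for key, row_ids in index.items()
        if low <= key < high
        for row_id in row_ids
    )


rows = [{"city": "Oslo"}, {"city": "Lima"}, {"city": "Oslo"}, {"city": "Rome"}]
city_index = build_hash_index(rows, "city")

oslo_rows = city_index.get("Oslo", [])                 # one hash probe
l_to_p_rows = range_via_hash_index(city_index, "L", "P")  # visits all keys
```

The equality lookup touches one bucket regardless of table size, while the range query degrades to a scan over every distinct key.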

### LSM-based indexing

Common in modern write-heavy systems: writes are buffered in memory, flushed as immutable sorted runs, and merged later by compaction.

Good for:

- high write throughput
- append-heavy workloads

Tradeoff:

- reads may need to check the in-memory table and several sorted runs, so read cost depends on how well compaction keeps the number of runs small
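
A toy log-structured merge tree makes the read-side tradeoff visible. This sketch keeps everything in memory and omits deletes (tombstones), write-ahead logging, and leveled compaction; the `TinyLSM` class and the `memtable_limit` parameter are assumptions for illustration.

```python
from bisect import bisect_left


class TinyLSM:
    """In-memory toy: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: dict = {}
        self.runs: list = []  # newest run first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: int) -> None:
        """Writes only touch the memtable, which is why they are cheap."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Freezes the memtable into a new sorted run."""
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str):
        """Reads check the memtable, then every run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        """Merges all runs into one, keeping the newest value per key."""
        merged: dict = {}
        for run in reversed(self.runs):  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.put(key, value)
runs_before = len(db.runs)
value_b_before = db.get("b")  # found only in the older of the two runs
db.compact()
runs_after = len(db.runs)
```

Before compaction, a lookup for `"b"` misses the memtable and the newest run before finding the key in the older run. After compaction there is a single run to search, which is the read-side benefit compaction buys.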

### Inverted index

Maps terms to documents or postings.

Good for:

- text search
- filtering over tokenized fields
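
The term-to-postings mapping can be sketched with a dict of sets. Tokenization here is just lowercasing and splitting on whitespace, which is a deliberate simplification; the function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Maps each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict, terms: list) -> list:
    """AND query: documents containing every term, via postings intersection."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = {
    1: "vectorized query execution",
    2: "query planning",
    3: "vectorized scans",
}
index = build_inverted_index(docs)
```

A call such as `search_all(index, ["vectorized", "query"])` never reads document text: it intersects two small postings sets, which is why this structure suits text search and filters over tokenized fields.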

### Vector index

Supports approximate nearest-neighbor search over embeddings.

Good for:

- semantic search
- similarity retrieval

Tradeoff:

- often approximate rather than exact
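
The approximation tradeoff can be shown with a very small coarse-bucket index: vectors are assigned to their nearest centroid, and a query searches only the bucket of its own nearest centroid. This is one simplified family of techniques, not how any specific library works; the fixed centroids, the `CoarseBucketIndex` class, and the sample vectors are all assumptions.

```python
import math


class CoarseBucketIndex:
    """Toy approximate index: search only the bucket nearest to the query."""

    def __init__(self, centroids: list) -> None:
        self.centroids = centroids  # assumed to be given, not learned
        self.buckets: list = [[] for _ in centroids]

    def _nearest_centroid(self, vector: tuple) -> int:
        return min(
            range(len(self.centroids)),
            key=lambda i: math.dist(vector, self.centroids[i]),
        )

    def add(self, item_id: str, vector: tuple) -> None:
        self.buckets[self._nearest_centroid(vector)].append((item_id, vector))

    def search(self, query: tuple, k: int = 1) -> list:
        """Approximate: candidates outside the probed bucket are never seen."""
        bucket = self.buckets[self._nearest_centroid(query)]
        ranked = sorted(bucket, key=lambda entry: math.dist(query, entry[1]))
        return [item_id for item_id, _ in ranked[:k]]


def exact_search(items: dict, query: tuple, k: int = 1) -> list:
    """Exact baseline: compares the query against every stored vector."""
    ranked = sorted(items, key=lambda item_id: math.dist(query, items[item_id]))
    return ranked[:k]


items = {"a": (4.9, 0.0), "b": (7.0, 0.0), "c": (1.0, 0.0)}
ann = CoarseBucketIndex(centroids=[(0.0, 0.0), (10.0, 0.0)])
for item_id, vector in items.items():
    ann.add(item_id, vector)

near_boundary = (5.1, 0.0)
approx_hit = ann.search(near_boundary)
exact_hit = exact_search(items, near_boundary)
```

For the query near the bucket boundary, the true nearest neighbor `"a"` lives in the other bucket, so the index returns `"b"` instead. Probing more buckets would recover it at the cost of more distance computations, which is the usual recall versus speed dial.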

---

## What indexes buy

Indexes can help the engine avoid full scans and reduce candidate sets before expensive operators run.

They are most valuable when:

- the predicate is selective
- the access pattern repeats often
- the engine can exploit the index directly

They are less valuable when:

- most rows are needed anyway
- the predicate is too broad
- maintaining the index is too expensive for the workload
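
The selectivity argument can be made concrete with a toy cost comparison. The cost constants below are invented purely for illustration (a sequential row read costs 1 unit, an index-driven random row fetch costs 4, and an index probe costs 10); real optimizers use calibrated, far more detailed models.

```python
def scan_cost(total_rows: int, seq_cost: float = 1.0) -> float:
    """Full scan: every row is read sequentially."""
    return total_rows * seq_cost


def index_cost(total_rows: int, selectivity: float,
               random_cost: float = 4.0, probe_cost: float = 10.0) -> float:
    """Index path: one probe, then a random fetch per matching row."""
    return probe_cost + selectivity * total_rows * random_cost


def prefer_index(total_rows: int, selectivity: float) -> bool:
    """True when the index path is cheaper under these assumed constants."""
    return index_cost(total_rows, selectivity) < scan_cost(total_rows)


selective = prefer_index(1_000_000, 0.001)  # predicate keeps 0.1% of rows
broad = prefer_index(1_000_000, 0.5)        # predicate keeps half the rows
```

Under these assumptions the index wins easily for the selective predicate and loses for the broad one, because each index-driven fetch is more expensive than a sequential read. Index maintenance cost on writes is not modeled here.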

---

## Practical mental model

Tables define what data exists.

Storage layout defines how that data is physically organized.

Indexes define shortcuts through that organization.

That is the simplest useful framing.

---

## Changelog

* **April 1, 2026** -- First version created.