# Query Engine Design Questions

A checklist note for thinking about what kind of query engine to build.

---

## Short answer

Before building a query engine, it helps to force explicit answers to a small set of design questions.

Most architecture disagreements are really disagreements about workload, execution granularity, storage boundary, or correctness model.

---

## Core questions

### Workload

- Is the workload mostly transactional, analytical, streaming, search, or logical inference?
- Are queries mostly point lookups, scans, aggregations, recursive rules, or top-k retrieval?

### Data model

- Is the data relational, document-oriented, graph-shaped, vector-based, or rule/fact based?
- What is the engine's core internal representation?

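The internal-representation question becomes more concrete when the same small relation is held in two layouts. The sketch below is illustrative only: the sample table and the helper names `to_columns`, `sum_amount_rows`, and `sum_amount_columns` are assumptions made for this note, not part of any particular engine.

```python
from typing import Any

# A tiny relation in row-oriented form: one tuple per record.
SCHEMA = ("id", "city", "amount")
ROWS = [
    (1, "Oslo", 10.0),
    (2, "Lima", 25.5),
    (3, "Oslo", 4.5),
]


def to_columns(schema: tuple[str, ...], rows: list[tuple]) -> dict[str, list[Any]]:
    """Pivot row tuples into a column-oriented layout: one list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(schema)}


def sum_amount_rows(rows: list[tuple]) -> float:
    """Row layout: every whole record is visited to read a single field."""
    return sum(row[2] for row in rows)


def sum_amount_columns(columns: dict[str, list[Any]]) -> float:
    """Column layout: only the one needed column is visited."""
    return sum(columns["amount"])


COLUMNS = to_columns(SCHEMA, ROWS)
```

Both functions return the same total. What differs is how much data each layout has to touch, which is usually what the row vs column decision turns on.
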
### Unit of execution

- Does the engine run row-at-a-time, batch-at-a-time, or over fully materialized relations?
- Are blocking operators common?

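The three execution granularities above can be compared on a single filter. This is a minimal sketch under stated assumptions: the `Row` shape, the batch size, and all function names are hypothetical, and a real vectorized engine would operate on typed column buffers rather than Python lists.

```python
from typing import Iterable, Iterator

Row = tuple[int, float]  # hypothetical (id, value) record


def filter_rows(rows: Iterable[Row], threshold: float) -> Iterator[Row]:
    """Row-at-a-time: pull one row, test it, yield it (iterator style)."""
    for row in rows:
        if row[1] > threshold:
            yield row


def batches(rows: list[Row], size: int) -> Iterator[list[Row]]:
    """Cut the input into fixed-size batches."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def filter_batches(batch_iter: Iterable[list[Row]], threshold: float) -> Iterator[list[Row]]:
    """Batch-at-a-time: one call handles a whole batch, amortizing per-call overhead."""
    for batch in batch_iter:
        kept = [row for row in batch if row[1] > threshold]
        if kept:
            yield kept


def sort_by_value(rows: Iterable[Row]) -> list[Row]:
    """Blocking operator: it must consume all input before emitting anything."""
    return sorted(rows, key=lambda row: row[1])


DATA: list[Row] = [(1, 5.0), (2, 1.0), (3, 9.0), (4, 7.0), (5, 0.5)]

row_result = list(filter_rows(DATA, 4.0))
batch_result = [row for batch in filter_batches(batches(DATA, 2), 4.0) for row in batch]
sorted_result = sort_by_value(filter_rows(DATA, 4.0))
```

The row and batch pipelines produce the same rows. The sort is the blocking case: nothing can be emitted until the whole input has been materialized.
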
### Storage boundary

- Is execution tightly coupled to one storage engine?
- Or is there a source interface with pushdown capabilities?

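A source interface with pushdown might look roughly like the following. This is a sketch, not a reference design: the `Source` protocol, its `scan` signature, and the `InMemorySource` backend are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterator, Optional, Protocol

Row = dict[str, object]
Predicate = Callable[[Row], bool]


class Source(Protocol):
    """Hypothetical contract between the executor and any storage backend."""

    def scan(self, columns: list[str], predicate: Optional[Predicate] = None) -> Iterator[Row]:
        ...


class InMemorySource:
    """Toy backend that applies the pushed-down filter and projection itself."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows
        self.rows_examined = 0  # bookkeeping for the demo only

    def scan(self, columns: list[str], predicate: Optional[Predicate] = None) -> Iterator[Row]:
        for row in self._rows:
            self.rows_examined += 1
            if predicate is None or predicate(row):
                # Projection pushdown: only the requested columns leave the source.
                yield {name: row[name] for name in columns}


def run_query(source: Source, columns: list[str], predicate: Predicate) -> list[Row]:
    """The executor hands filter and projection to the source instead of doing them itself."""
    return list(source.scan(columns, predicate))


source = InMemorySource([
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 25.5},
    {"id": 3, "city": "Oslo", "amount": 4.5},
])
result = run_query(source, ["id", "amount"], lambda row: row["city"] == "Oslo")
```

Because the executor depends only on the protocol, a second backend can be added without touching `run_query`. A tightly coupled design would instead reach into the storage engine's structures directly.
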
### Indexes

- What access patterns deserve dedicated indexes?
- Are indexes exact, approximate, ordered, inverted, or vector-oriented?

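Two of the index kinds listed above serve quite different access patterns. The sketch below pairs an inverted index (term lookup) with an ordered index (range lookup). The sample documents, keys, and function names are assumptions for illustration, and the ordered index is just a sorted list searched with the standard `bisect` module rather than a real B-tree.

```python
import bisect
from collections import defaultdict

DOCS = {
    1: "fast columnar scan",
    2: "fast point lookup",
    3: "recursive rule evaluation",
}


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: term -> set of document ids containing that term."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def range_lookup(sorted_keys: list[int], low: int, high: int) -> list[int]:
    """Ordered index: binary search finds all keys with low <= key <= high."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


inverted = build_inverted_index(DOCS)
ordered_keys = sorted([42, 7, 19, 88, 3])
```

For example, `inverted.get("fast")` answers a search-style question, while `range_lookup(ordered_keys, 7, 42)` answers a scan-style question. Neither structure helps with the other's access pattern.
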
### Optimization

- Is the optimizer mostly rule-based or cost-based?
- What statistics are available?

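The difference between the two optimizer styles shows up even in a single access-path decision. In this sketch the `TableStats` fields, the cost constants, and the uniform-distribution assumption are all invented for illustration. Real optimizers use richer statistics and cost models.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical statistics an optimizer might keep for one table and column."""
    row_count: int
    distinct_values: int  # number of distinct values in the filtered column


# Assumed relative costs, invented for this sketch.
SEQUENTIAL_ROW_COST = 1.0
INDEX_ROW_COST = 4.0  # random access through an index costs more per row


def estimate_matching_rows(stats: TableStats) -> float:
    """Equality predicate under a uniform-distribution assumption."""
    return stats.row_count / stats.distinct_values


def choose_access_path(stats: TableStats) -> str:
    """Cost-based choice: compare estimated costs and pick the cheaper plan."""
    scan_cost = stats.row_count * SEQUENTIAL_ROW_COST
    index_cost = estimate_matching_rows(stats) * INDEX_ROW_COST
    return "index_lookup" if index_cost < scan_cost else "full_scan"


def choose_access_path_rule_based(has_index: bool) -> str:
    """Rule-based choice: a fixed rule that ignores statistics."""
    return "index_lookup" if has_index else "full_scan"


selective = TableStats(row_count=1_000_000, distinct_values=50_000)
unselective = TableStats(row_count=1_000_000, distinct_values=2)
```

With these numbers the cost-based function picks the index for the selective column and a full scan for the unselective one, while the rule-based function picks the index in both cases. The cost-based answer is only as good as the statistics behind it, which is why the second question matters.
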
### Distribution

- Is one machine enough?
- If not, where are exchange boundaries, partitioning choices, and failure handling defined?

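An exchange boundary with hash partitioning can be sketched in a few lines. The function names and sample rows below are hypothetical, the "partitions" are plain in-process lists rather than separate machines, and failure handling is deliberately left out.

```python
import zlib
from collections import defaultdict

KeyedRow = tuple[str, float]  # hypothetical (key, amount) record


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32, not Python's salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def exchange(rows: list[KeyedRow], num_partitions: int) -> dict[int, list[KeyedRow]]:
    """Exchange boundary: route every row to the partition that owns its key."""
    partitions: dict[int, list[KeyedRow]] = defaultdict(list)
    for row in rows:
        partitions[partition_for(row[0], num_partitions)].append(row)
    return partitions


def local_sum(rows: list[KeyedRow]) -> dict[str, float]:
    """Per-partition aggregation; valid because each key lives in exactly one partition."""
    totals: dict[str, float] = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


ROWS = [("oslo", 10.0), ("lima", 25.5), ("oslo", 4.5), ("kyiv", 1.0)]
partitioned = exchange(ROWS, num_partitions=3)

# No key spans two partitions, so the partial results can simply be unioned.
merged: dict[str, float] = {}
for part_rows in partitioned.values():
    merged.update(local_sum(part_rows))
```

The partitioning key is the design choice that matters here. Partitioning on the grouping key lets each partition finish its aggregation locally, while any other key would force a second exchange.
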
### Semantics

- Is the system exact, approximate, eventually consistent, ranked, or fixpoint-based?
- Does it support recursion, inference, or witness generation?

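Fixpoint-based semantics can be illustrated with the classic reachability rules. The sketch below uses naive evaluation, which re-applies the rules until a round derives nothing new. The function name and sample edges are assumptions for illustration, and production rule engines typically use semi-naive evaluation to avoid rederiving known facts.

```python
def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Evaluate two recursive rules to a fixpoint:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)
    while True:
        # One round of rule application: extend every known path by one edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:  # fixpoint reached: another round would add nothing
            return path
        path |= new_facts


EDGES = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(EDGES)
```
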
---

## Why these questions matter

These questions determine most of the major architecture choices:

- row vs column
- iterator vs vectorized execution
- local vs distributed execution
- exact vs approximate search
- relational vs rule-based planning

If those answers are unclear, architecture discussions tend to stay vague.

---

## Practical use

This note is best used as a design checklist.

If a team can answer these questions cleanly, the likely engine shape becomes much easier to see.

If it cannot, the project is probably still mixing together several different engine ideas.

---

## Changelog

* **April 2, 2026** -- First version created.