Add note files about query engines for NoSQL and design questions
parent
40ccf7ae69
commit
d5bbc4886d
80
hqew/012-query-engines-for-non-sql-databases.md
Normal file
@ -0,0 +1,80 @@
# Query Engines for Non-SQL Databases

A reference for why query engines are broader than SQL.

---

## Short answer

Yes, a query engine can be built for non-SQL databases.

SQL is only one possible query language. The broader pattern is:

- a data model exists
- users need a declarative or structured way to ask for data
- the system needs planning and execution machinery to answer those requests

So query engines are not inherently relational.

---

## What changes

What changes is the underlying algebra and operator set.

For example:

- relational engines center on tables, joins, filters, and aggregates
- graph engines center on nodes, edges, traversals, and pattern matching
- document engines center on nested objects, arrays, and field-path predicates
- rule engines center on facts, unification, recursion, and fixpoint evaluation

The architecture may still look familiar, but the internal operators differ.
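As one illustration of an operator that has no direct counterpart in a basic join/filter/aggregate pipeline, here is a minimal sketch of naive fixpoint evaluation for a recursive rule. The function name `transitive_closure` and the `edge`/`path` rules are hypothetical, chosen only for this example; real rule engines typically use more efficient strategies.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of two illustrative rules:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)  # facts derived by the first rule
    while True:
        # Apply the recursive rule once to everything derived so far.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            # Nothing new was derived: the fixpoint has been reached.
            return path
        path |= new_facts


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
```

The loop stops when a full pass derives no new facts, which is the termination condition that "fixpoint evaluation" refers to.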

---

## Examples

You can meaningfully talk about query engines for:

- document databases such as MongoDB
- search systems such as Lucene or Vespa
- vector databases such as Qdrant or Weaviate
- graph databases
- Datalog or rule engines

What makes them query engines is not SQL syntax. It is that they accept structured requests and execute them efficiently over some data model.

---

## What stays the same

Even outside SQL, many systems still have:

1. a query language or API
2. an internal representation of the request
3. optimization or rewrite steps
4. execution against indexes or stored data

So the concept generalizes cleanly beyond relational databases.
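The four stages above can be sketched for a toy document store. Everything here is an assumption made for illustration: the `Eq` and `And` node types, the dotted field-path convention, and the shape of `indexes` (field path to value to list of document positions) do not correspond to any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    path: str      # dotted field path, e.g. "user.city"
    value: object


@dataclass(frozen=True)
class And:
    clauses: tuple


def parse(request: dict) -> And:
    """Stages 1 and 2: turn an API-style request into an internal tree."""
    return And(tuple(Eq(p, v) for p, v in request.items()))


def rewrite(plan: And, indexed_paths: set) -> And:
    """Stage 3: move clauses on indexed paths to the front."""
    ordered = sorted(plan.clauses, key=lambda c: c.path not in indexed_paths)
    return And(tuple(ordered))


def get_path(doc, path):
    """Follow a dotted path through nested dicts; None if it is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def execute(plan: And, docs: list, indexes: dict) -> list:
    """Stage 4: seek via an index on the first clause if possible, then filter."""
    first = plan.clauses[0] if plan.clauses else None
    if first is not None and first.path in indexes:
        candidates = [docs[i] for i in indexes[first.path].get(first.value, [])]
    else:
        candidates = docs
    return [d for d in candidates
            if all(get_path(d, c.path) == c.value for c in plan.clauses)]


docs = [
    {"name": "Ana", "user": {"city": "Lisbon"}},
    {"name": "Bo", "user": {"city": "Oslo"}},
    {"name": "Cy", "user": {"city": "Lisbon"}},
]
indexes = {"user.city": {"Lisbon": [0, 2], "Oslo": [1]}}

plan = rewrite(parse({"name": "Cy", "user.city": "Lisbon"}), set(indexes))
result = execute(plan, docs, indexes)
```

Nothing in this sketch is relational, yet the request still passes through a representation, a rewrite, and an index-aware execution step.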

---

## Practical mental model

SQL engines optimize relational algebra.

Non-SQL engines optimize some other access model:

- graph traversal
- text retrieval
- vector similarity
- logical derivation

That is the main difference.

---

## Changelog

* **April 2, 2026** -- First version created.
85
hqew/013-query-engine-design-questions.md
Normal file
@ -0,0 +1,85 @@
# Query Engine Design Questions

A checklist note for thinking about what kind of query engine to build.

---

## Short answer

Before building a query engine, it helps to force explicit answers to a small set of design questions.

Most architecture disagreements are really disagreements about workload, execution granularity, storage boundary, or correctness model.

---

## Core questions

### Workload

- Is the workload mostly transactional, analytical, streaming, search, or logical inference?
- Are queries mostly point lookups, scans, aggregations, recursive rules, or top-k retrieval?

### Data model

- Is the data relational, document-oriented, graph-shaped, vector-based, or rule/fact based?
- What is the engine's core internal representation?

### Unit of execution

- Does the engine run row-at-a-time, batch-at-a-time, or over fully materialized relations?
- Are blocking operators common?
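To make the granularity question concrete, here is a small sketch of the same filter written row-at-a-time and batch-at-a-time, plus a sort as an example of a blocking operator. All function names are hypothetical, and plain Python lists stand in for what would normally be columnar batches.

```python
def filter_rows(rows, pred):
    """Row-at-a-time: pull one row, test it, pass it on."""
    for row in rows:
        if pred(row):
            yield row


def batches(rows, size):
    """Group a row stream into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def filter_batches(batch_stream, pred):
    """Batch-at-a-time: per-call overhead is paid once per batch, not per row."""
    for batch in batch_stream:
        kept = [row for row in batch if pred(row)]
        if kept:
            yield kept


def sort_rows(rows, key):
    """Blocking operator: must consume all input before emitting anything."""
    return iter(sorted(rows, key=key))


rows = [{"id": i, "score": i % 3} for i in range(7)]


def has_score(row):
    return row["score"] > 0


row_result = list(filter_rows(rows, has_score))
batch_result = [r for b in filter_batches(batches(rows, 3), has_score) for r in b]
sorted_ids = [r["id"] for r in sort_rows(row_result, key=lambda r: r["score"])]
```

Both filters produce the same rows; what differs is how often control passes between operators. The sort cannot stream at all, which is why the frequency of blocking operators affects memory use and latency.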

### Storage boundary

- Is execution tightly coupled to one storage engine?
- Or is there a source interface with pushdown capabilities?
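One possible shape for such a source interface is sketched below. The `Source` protocol, its two methods, and both implementations are invented for illustration; they are not taken from any existing engine.

```python
from typing import Iterator, Optional, Protocol


class Source(Protocol):
    """Hypothetical boundary between the engine and a storage backend."""

    def supports_filter_pushdown(self) -> bool: ...

    def scan(self, equals: Optional[dict] = None) -> Iterator[dict]: ...


class ListSource:
    """A source with no pushdown: it can only return every row."""

    def __init__(self, rows):
        self.rows = rows

    def supports_filter_pushdown(self) -> bool:
        return False

    def scan(self, equals=None):
        return iter(self.rows)


class KeyedSource:
    """A source that can answer equality on one column from a hash map."""

    def __init__(self, rows, column):
        self.rows = rows
        self.column = column
        self.by_value = {}
        for row in rows:
            self.by_value.setdefault(row[column], []).append(row)

    def supports_filter_pushdown(self) -> bool:
        return True

    def scan(self, equals=None):
        if equals and self.column in equals:
            return iter(self.by_value.get(equals[self.column], []))
        return iter(self.rows)


def select(source: Source, equals: dict) -> list:
    """Engine side: push the filter down if possible, then re-check it."""
    rows = source.scan(equals) if source.supports_filter_pushdown() else source.scan()
    # The re-check keeps results correct when pushdown is absent or partial.
    return [r for r in rows if all(r.get(k) == v for k, v in equals.items())]


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lisbon"},
    {"id": 3, "city": "Oslo"},
]
plain = select(ListSource(rows), {"city": "Oslo"})
keyed = select(KeyedSource(rows, "city"), {"city": "Oslo"})
```

The engine code is identical for both sources. In this sketch, pushdown changes how many rows cross the boundary, not the answer.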

### Indexes

- What access patterns deserve dedicated indexes?
- Are indexes exact, approximate, ordered, inverted, or vector-oriented?

### Optimization

- Is the optimizer mostly rule-based or cost-based?
- What statistics are available?
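The two optimization questions are linked: a cost-based choice is only as good as the statistics behind it. The sketch below contrasts with a purely rule-based policy such as "always use an index if one exists". The `TableStats` fields, the uniform-distribution assumption, and the `seek_cost` constant are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    row_count: int
    distinct_values: dict  # column name -> number of distinct values


def estimate_matches(stats: TableStats, column: str) -> float:
    """Estimate rows matching `column = ?`, assuming values are uniform."""
    ndv = stats.distinct_values.get(column)
    if not ndv:
        # No statistic available: assume every row may match.
        return float(stats.row_count)
    return stats.row_count / ndv


def choose_access_path(stats, column, indexed_columns, seek_cost=4.0):
    """Pick "index" or "scan" by comparing two rough cost estimates."""
    if column not in indexed_columns:
        return "scan"
    scan_cost = float(stats.row_count)  # read every row once
    index_cost = estimate_matches(stats, column) * seek_cost
    return "index" if index_cost < scan_cost else "scan"


stats = TableStats(row_count=1000, distinct_values={"user_id": 500, "is_active": 2})
indexed = {"user_id", "is_active"}

selective_choice = choose_access_path(stats, "user_id", indexed)
unselective_choice = choose_access_path(stats, "is_active", indexed)
```

With these numbers the selective column favors the index, while the two-valued column favors a scan even though an index exists. A rule that ignores statistics cannot make that distinction.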

### Distribution

- Is one machine enough?
- If not, where are exchange boundaries, partitioning choices, and failure handling defined?

### Semantics

- Is the system exact, approximate, eventually consistent, ranked, or fixpoint-based?
- Does it support recursion, inference, or witness generation?

---

## Why these questions matter

These questions determine most of the major architecture choices:

- row vs column
- iterator vs vectorized execution
- local vs distributed execution
- exact vs approximate search
- relational vs rule-based planning

If those answers are unclear, architecture discussions tend to stay vague.

---

## Practical use

This note is best used as a design checklist.

If a team can answer these questions cleanly, the likely engine shape becomes much easier to see.

If it cannot, the project is probably still mixing together several different engine ideas.

---

## Changelog

* **April 2, 2026** -- First version created.