Add note files about query engines for NoSQL and design questions

Hassan Abedi 2026-04-02 09:15:43 +02:00
parent 40ccf7ae69
commit d5bbc4886d
2 changed files with 165 additions and 0 deletions


@ -0,0 +1,80 @@
# Query Engines for Non-SQL Databases
A reference note on why the idea of a query engine is broader than SQL.
---
## Short answer
Yes, a query engine can be built for non-SQL databases.
SQL is only one possible query language. The broader pattern is:
- a data model exists
- users need a declarative or structured way to ask for data
- the system needs planning and execution machinery to answer those requests
So query engines are not inherently relational.
---
## What changes
What changes is the underlying algebra and operator set.
For example:
- relational engines center on tables, joins, filters, and aggregates
- graph engines center on nodes, edges, traversals, and pattern matching
- document engines center on nested objects, arrays, and field-path predicates
- rule engines center on facts, unification, recursion, and fixpoint evaluation
The overall pipeline (request, internal representation, rewrites, execution) may still look familiar, but the operators inside it differ.
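To make the operator difference concrete, here is a minimal sketch of one document-engine operator: a field-path equality predicate over nested objects and arrays. It assumes a simplified matching rule (an array matches if any element matches), loosely modeled on document stores such as MongoDB but not their exact semantics. `resolve_path` and `field_eq` are hypothetical names.

```python
from typing import Any, Callable

Document = dict[str, Any]


def resolve_path(value: Any, path: list[str]) -> list[Any]:
    """Return every value reachable from `value` along a field path."""
    if isinstance(value, list):
        # Arrays fan out: each element is followed independently.
        # This is a simplified rule, loosely modeled on document stores.
        results: list[Any] = []
        for item in value:
            results.extend(resolve_path(item, path))
        return results
    if not path:
        return [value]
    if isinstance(value, dict) and path[0] in value:
        return resolve_path(value[path[0]], path[1:])
    return []  # missing field: no candidate values, so no match


def field_eq(path: str, expected: Any) -> Callable[[Document], bool]:
    """Build a predicate that is true if any value at a dotted path equals `expected`."""
    steps = path.split(".")
    return lambda doc: any(v == expected for v in resolve_path(doc, steps))


docs: list[Document] = [
    {"name": "a", "address": {"city": "Oslo"}, "tags": ["db", "sql"]},
    {"name": "b", "address": {"city": "Rome"}, "tags": ["graph"]},
    {"name": "c", "orders": [{"sku": "x1"}, {"sku": "y2"}]},
]
print([d["name"] for d in docs if field_eq("address.city", "Oslo")(d)])  # ['a']
print([d["name"] for d in docs if field_eq("tags", "graph")(d)])         # ['b']
print([d["name"] for d in docs if field_eq("orders.sku", "y2")(d)])      # ['c']
```

A relational engine has no need for the array fan-out step; that is exactly the kind of operator-level difference the list above describes.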
---
## Examples
You can meaningfully talk about query engines for:
- document databases such as MongoDB
- search libraries and engines such as Lucene or Vespa
- vector databases such as Qdrant or Weaviate
- graph databases
- Datalog or rule engines
What makes them query engines is not SQL syntax. It is that they accept structured requests and execute them efficiently over some data model.
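A search system's core access structure is typically an inverted index. The sketch below is a toy, not Lucene's or Vespa's API: it assumes whitespace tokenization and lowercasing, and answers only conjunctive (AND) queries by intersecting posting sets. `build_inverted_index` and `search_all` are hypothetical names.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> list[int]:
    """Conjunctive query: ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest posting set so the running intersection stays small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return sorted(result)


corpus = {
    1: "graph query engines",
    2: "vector search engines",
    3: "query planning for graph data",
}
index = build_inverted_index(corpus)
print(search_all(index, "graph query"))  # [1, 3]
print(search_all(index, "engines"))      # [1, 2]
```

The request is structured (a set of terms), and execution runs against an index rather than a scan, which is what qualifies it as a query engine in the sense used here.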
---
## What stays the same
Even outside SQL, many systems still have:
1. a query language or API
2. an internal representation of the request
3. optimization or rewrite steps
4. execution against indexes or stored data
So the concept generalizes cleanly beyond relational databases.
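The four steps above can be sketched end to end for a tiny document-style API. Everything here is hypothetical: the request is a dict of equality conditions, the internal representation is a tree of `Scan`, `Filter`, and `IndexLookup` nodes, the only rewrite replaces a filter on an indexed field with an index lookup, and execution reads from an in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Union

Row = dict[str, Any]


@dataclass(frozen=True)
class Scan:
    """Read every stored record."""


@dataclass(frozen=True)
class Filter:
    """Keep records whose `field` equals `value`."""
    child: "Plan"
    field: str
    value: Any


@dataclass(frozen=True)
class IndexLookup:
    """Fetch records through an equality index on `field`."""
    field: str
    value: Any


Plan = Union[Scan, Filter, IndexLookup]


def parse(query: dict[str, Any]) -> Plan:
    """Steps 1-2: turn a request such as {"city": "Oslo"} into an operator tree."""
    plan: Plan = Scan()
    for field, value in query.items():
        plan = Filter(plan, field, value)
    return plan


def rewrite(plan: Plan, indexed: set[str]) -> Plan:
    """Step 3: replace an equality filter sitting directly on a scan with an index lookup."""
    if isinstance(plan, Filter):
        child = rewrite(plan.child, indexed)
        if isinstance(child, Scan) and plan.field in indexed:
            return IndexLookup(plan.field, plan.value)
        return Filter(child, plan.field, plan.value)
    return plan


def execute(plan: Plan, rows: list[Row], indexes: dict[str, dict[Any, list[int]]]) -> list[Row]:
    """Step 4: evaluate the tree against stored rows and index entries."""
    if isinstance(plan, Scan):
        return list(rows)
    if isinstance(plan, IndexLookup):
        return [rows[i] for i in indexes[plan.field].get(plan.value, [])]
    return [r for r in execute(plan.child, rows, indexes) if r.get(plan.field) == plan.value]


rows = [{"city": "Oslo", "kind": "doc"}, {"city": "Rome", "kind": "graph"}, {"city": "Oslo", "kind": "graph"}]
indexes = {"city": {"Oslo": [0, 2], "Rome": [1]}}
plan = rewrite(parse({"city": "Oslo", "kind": "graph"}), indexed={"city"})
print(plan)  # Filter(child=IndexLookup(field='city', value='Oslo'), field='kind', value='graph')
print(execute(plan, rows, indexes))  # [{'city': 'Oslo', 'kind': 'graph'}]
```

The rewrite only fires when the indexed filter sits directly on the scan; a fuller optimizer would first reorder or merge predicates.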
---
## Practical mental model
SQL engines optimize plans built from relational algebra.
Non-SQL engines optimize some other access model:
- graph traversal
- text retrieval
- vector similarity
- logical derivation
That is the main difference.
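As an example of logical derivation, here is a sketch of semi-naive fixpoint evaluation for the classic Datalog transitive-closure program `path(X, Y) :- edge(X, Y).` and `path(X, Z) :- path(X, Y), edge(Y, Z).` The rules are hand-encoded in Python rather than parsed, and `transitive_closure` is a hypothetical name. Each round joins only the facts derived in the previous round with `edge`, and evaluation stops when a round derives nothing new.

```python
from collections import defaultdict


def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Derive all path(X, Y) facts from edge facts by semi-naive evaluation."""
    successors: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)  # index edge by its first column for the join

    path = set(edges)   # base rule: every edge is a path
    delta = set(edges)  # facts that are new in the latest round
    while delta:
        # Recursive rule, joined only against the delta rather than all of `path`.
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        delta = derived - path
        path |= delta
    return path  # fixpoint: the last round produced no new facts


print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Cycles are safe: once every reachable pair is known, the delta becomes empty and the loop ends.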
---
## Changelog
* **April 2, 2026** -- First version created.


@ -0,0 +1,85 @@
# Query Engine Design Questions
A checklist note for thinking about what kind of query engine to build.
---
## Short answer
Before building a query engine, it helps to force explicit answers to a small set of design questions.
Most architecture disagreements are really disagreements about workload, execution granularity, storage boundary, or correctness model.
---
## Core questions
### Workload
- Is the workload mostly transactional, analytical, streaming, search, or logical inference?
- Are queries mostly point lookups, scans, aggregations, recursive rules, or top-k retrieval?
### Data model
- Is the data relational, document-oriented, graph-shaped, vector-based, or expressed as rules and facts?
- What is the engine's core internal representation?
### Unit of execution
- Does the engine run row-at-a-time, batch-at-a-time, or over fully materialized relations?
- Are blocking operators (such as sorts and full aggregations, which consume their entire input before emitting anything) common?
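To illustrate execution granularity, the sketch below contrasts a row-at-a-time filter (Python generators standing in for Volcano-style iterators), a batch-at-a-time filter, and a blocking sort. The two-column row schema and the function names are hypothetical, and plain Python lists stand in for the columnar vectors a real vectorized engine would use.

```python
from typing import Iterable, Iterator

Row = tuple[int, str]  # (id, name): a hypothetical two-column schema


def scan_rows(rows: list[Row]) -> Iterator[Row]:
    """Row-at-a-time source: yields one tuple per call to next()."""
    yield from rows


def filter_rows(child: Iterable[Row], min_id: int) -> Iterator[Row]:
    """Streaming filter: each qualifying row is emitted as soon as it arrives."""
    for row in child:
        if row[0] >= min_id:
            yield row


def scan_batches(rows: list[Row], batch_size: int) -> Iterator[list[Row]]:
    """Batch-at-a-time source: yields chunks of up to `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_batches(child: Iterable[list[Row]], min_id: int) -> Iterator[list[Row]]:
    """One call processes a whole chunk, amortizing per-call overhead."""
    for batch in child:
        kept = [row for row in batch if row[0] >= min_id]
        if kept:
            yield kept


def sort_rows(child: Iterable[Row]) -> Iterator[Row]:
    """Blocking operator: must consume its entire input before emitting any row."""
    return iter(sorted(child, key=lambda row: row[1]))


rows = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
print(list(filter_rows(scan_rows(rows), 2)))                           # [(3, 'c'), (4, 'd'), (2, 'b')]
print([r for b in filter_batches(scan_batches(rows, 2), 2) for r in b])  # [(3, 'c'), (4, 'd'), (2, 'b')]
print(list(sort_rows(filter_rows(scan_rows(rows), 2))))                # [(2, 'b'), (3, 'c'), (4, 'd')]
```

Both filters produce the same rows; they differ in how much work each call does. The sort shows why blocking operators matter: nothing downstream can start until it has seen every input row.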
### Storage boundary
- Is execution tightly coupled to one storage engine?
- Or does it read through a source interface that can accept pushed-down filters, projections, or limits?
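A source interface with pushdown could look roughly like the following. The `Source` protocol, its `supports_filter_pushdown` flag, and the single equality-predicate shape are assumptions for illustration, not any particular system's API. The counter shows how many rows cross the storage boundary with and without filter pushdown.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Protocol

Row = dict[str, Any]
Equality = tuple[str, Any]  # (column, value): the only predicate shape in this sketch


class Source(Protocol):
    """Hypothetical storage-facing interface an engine could scan through."""
    supports_filter_pushdown: bool

    def scan(self, columns: list[str], where: Optional[Equality]) -> Iterator[Row]: ...


@dataclass
class InMemorySource:
    rows: list[Row]
    supports_filter_pushdown: bool = True
    rows_returned: int = 0  # rows handed across the storage boundary

    def scan(self, columns: list[str], where: Optional[Equality]) -> Iterator[Row]:
        for row in self.rows:
            if where is None or row[where[0]] == where[1]:
                self.rows_returned += 1
                yield {c: row[c] for c in columns}  # projection applied at the source


def run_query(source: Source, columns: list[str], where: Equality) -> list[Row]:
    col, val = where
    if source.supports_filter_pushdown:
        # Filter and projection both pushed down: only matching, narrowed rows come back.
        return list(source.scan(columns, where))
    # No filter pushdown: also fetch the filter column, filter in the engine, then project.
    fetched = source.scan(sorted(set(columns) | {col}), None)
    return [{c: row[c] for c in columns} for row in fetched if row[col] == val]


data = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}, {"id": 3, "city": "Oslo"}]
pushed = InMemorySource(data)
unpushed = InMemorySource(data, supports_filter_pushdown=False)
print(run_query(pushed, ["id"], ("city", "Oslo")), pushed.rows_returned)      # [{'id': 1}, {'id': 3}] 2
print(run_query(unpushed, ["id"], ("city", "Oslo")), unpushed.rows_returned)  # [{'id': 1}, {'id': 3}] 3
```

Both paths give the same answer; the capability flag only changes where the filtering happens and how much data moves.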
### Indexes
- What access patterns deserve dedicated indexes?
- Are indexes exact, approximate, ordered, inverted, or vector-oriented?
### Optimization
- Is the optimizer mostly rule-based or cost-based?
- What statistics are available?
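The rule-based versus cost-based distinction can be shown with an access-path choice for a `column = constant` filter. The statistics (`row_count`, `distinct_values`), the uniform-distribution selectivity estimate, and the cost constants are all hypothetical; the point is only that statistics can overturn a fixed "always use the index" rule.

```python
from dataclasses import dataclass

# Hypothetical cost constants, in abstract units: a random fetch through an
# index is assumed to cost 4x a sequential row read, plus a fixed probe cost.
SEQ_ROW_COST = 1.0
RANDOM_ROW_COST = 4.0
INDEX_PROBE_COST = 10.0


@dataclass(frozen=True)
class ColumnStats:
    row_count: int
    distinct_values: int


def rule_based_choice(has_index: bool) -> str:
    """Fixed heuristic: use an index whenever one exists."""
    return "index_scan" if has_index else "full_scan"


def cost_based_choice(stats: ColumnStats, has_index: bool) -> str:
    """Pick the cheaper plan for `column = constant` using table statistics."""
    full_cost = stats.row_count * SEQ_ROW_COST
    if not has_index:
        return "full_scan"
    # Uniformity assumption: each distinct value matches row_count / distinct_values rows.
    estimated_rows = stats.row_count / max(stats.distinct_values, 1)
    index_cost = INDEX_PROBE_COST + estimated_rows * RANDOM_ROW_COST
    return "index_scan" if index_cost < full_cost else "full_scan"


selective = ColumnStats(row_count=1_000_000, distinct_values=50_000)
low_cardinality = ColumnStats(row_count=1_000_000, distinct_values=2)
print(rule_based_choice(True), cost_based_choice(selective, True))        # index_scan index_scan
print(rule_based_choice(True), cost_based_choice(low_cardinality, True))  # index_scan full_scan
```

With only two distinct values, the estimate is 500,000 random fetches, which costs more than reading the whole table sequentially; the rule-based choice cannot see that.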
### Distribution
- Is one machine enough?
- If not, where are exchange boundaries, partitioning choices, and failure handling defined?
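An exchange boundary with hash partitioning might be sketched as follows for a grouped count. The function names are hypothetical, and the partitions stay in one process; in a distributed engine each partition would be shipped to a different worker. `zlib.crc32` is used because Python's built-in `hash()` for strings is randomized per process, so separate workers would not agree on routing.

```python
import zlib
from collections import Counter

Row = dict[str, str]


def partition_of(key: str, num_partitions: int) -> int:
    # crc32 gives the same answer in every process; Python's built-in hash()
    # for str is salted per process, so workers could disagree on routing.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_exchange(rows: list[Row], key: str, num_partitions: int) -> list[list[Row]]:
    """Route every row to exactly one partition based on its key."""
    partitions: list[list[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_of(row[key], num_partitions)].append(row)
    return partitions


def distributed_count(rows: list[Row], key: str, num_partitions: int) -> dict[str, int]:
    """Count rows per key: exchange first, then aggregate each partition locally."""
    result: dict[str, int] = {}
    for partition in hash_exchange(rows, key, num_partitions):
        # All rows sharing a key land in the same partition, so the per-partition
        # results cover disjoint keys and can simply be combined.
        result.update(Counter(row[key] for row in partition))
    return result


rows = [{"city": c} for c in ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"]]
print(sorted(distributed_count(rows, "city", 3).items()))  # [('Lima', 1), ('Oslo', 3), ('Rome', 2)]
```

Partitioning on the grouping key is what makes the final merge trivial; partitioning on anything else would require a second aggregation step after the exchange.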
### Semantics
- Is the system exact, approximate, eventually consistent, ranked, or fixpoint-based?
- Does it support recursion, inference, or witness generation?
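For ranked semantics, an exact top-k search by cosine similarity is the baseline that approximate vector indexes (for example HNSW) trade away for speed; exact results are commonly used as ground truth when measuring an approximate index's recall. The sketch below is brute force over an in-memory dict, with hypothetical names and data.

```python
import heapq
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Vector, vectors: dict[str, Vector], k: int) -> list[str]:
    """Exact ranked retrieval: score every vector, keep the k highest scores."""
    scored = ((cosine(query, vec), key) for key, vec in vectors.items())
    return [key for _, key in heapq.nlargest(k, scored)]


vectors = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(exact_top_k([1.0, 0.1], vectors, 2))  # ['a', 'b']
```

The cost is one similarity computation per stored vector per query, which is why large systems accept approximate answers instead.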
---
## Why these questions matter
These questions determine most of the major architecture choices:
- row vs column
- iterator vs vectorized execution
- local vs distributed execution
- exact vs approximate search
- relational vs rule-based planning
If those answers are unclear, architecture discussions tend to stay vague.
---
## Practical use
This note is best used as a design checklist.
If a team can answer these questions cleanly, the likely engine shape becomes much easier to see.
If it cannot, the project is probably still mixing together several different engine ideas.
---
## Changelog
* **April 2, 2026** -- First version created.