From d5bbc4886d145dec69584d5729d8c190d09fa41a Mon Sep 17 00:00:00 2001
From: Hassan Abedi
Date: Thu, 2 Apr 2026 09:15:43 +0200
Subject: [PATCH] Add note files about query engines for NoSQL and design questions

---
 ...012-query-engines-for-non-sql-databases.md | 80 +++++++++++++++++
 hqew/013-query-engine-design-questions.md     | 85 +++++++++++++++++++
 2 files changed, 165 insertions(+)
 create mode 100644 hqew/012-query-engines-for-non-sql-databases.md
 create mode 100644 hqew/013-query-engine-design-questions.md

diff --git a/hqew/012-query-engines-for-non-sql-databases.md b/hqew/012-query-engines-for-non-sql-databases.md
new file mode 100644
index 0000000..fd56400
--- /dev/null
+++ b/hqew/012-query-engines-for-non-sql-databases.md
@@ -0,0 +1,80 @@
+# Query Engines for Non-SQL Databases
+
+A reference for why query engines are broader than SQL.
+
+---
+
+## Short answer
+
+Yes, a query engine can be built for non-SQL databases.
+
+SQL is only one possible query language. The broader pattern is:
+
+- a data model exists
+- users need a declarative or structured way to ask for data
+- the system needs planning and execution machinery to answer those requests
+
+So query engines are not inherently relational.
+
+---
+
+## What changes
+
+What changes is the underlying algebra and operator set.
+
+For example:
+
+- relational engines center on tables, joins, filters, and aggregates
+- graph engines center on nodes, edges, traversals, and pattern matching
+- document engines center on nested objects, arrays, and field-path predicates
+- rule engines center on facts, unification, recursion, and fixpoint evaluation
+
+The architecture may still look familiar, but the internal operators differ.
+
+---
+
+## Examples
+
+You can meaningfully talk about query engines for:
+
+- document databases such as MongoDB
+- search systems such as Lucene or Vespa
+- vector databases such as Qdrant or Weaviate
+- graph databases
+- Datalog or rule engines
+
+What makes them query engines is not SQL syntax. It is that they accept structured requests and execute them efficiently over some data model.
+
+---
+
+## What stays the same
+
+Even outside SQL, many systems still have:
+
+1. a query language or API
+2. an internal representation of the request
+3. optimization or rewrite steps
+4. execution against indexes or stored data
+
+So the concept generalizes cleanly beyond relational databases.
+
+---
+
+## Practical mental model
+
+SQL engines optimize relational algebra.
+
+Non-SQL engines optimize some other access model:
+
+- graph traversal
+- text retrieval
+- vector similarity
+- logical derivation
+
+That is the main difference.
+
+---
+
+## Changelog
+
+* **April 2, 2026** -- First version created.
diff --git a/hqew/013-query-engine-design-questions.md b/hqew/013-query-engine-design-questions.md
new file mode 100644
index 0000000..4e8d39c
--- /dev/null
+++ b/hqew/013-query-engine-design-questions.md
@@ -0,0 +1,85 @@
+# Query Engine Design Questions
+
+A checklist note for thinking about what kind of query engine to build.
+
+---
+
+## Short answer
+
+Before building a query engine, it helps to force explicit answers to a small set of design questions.
+
+Most architecture disagreements are really disagreements about workload, execution granularity, storage boundary, or correctness model.
+
+---
+
+## Core questions
+
+### Workload
+
+- Is the workload mostly transactional, analytical, streaming, search, or logical inference?
+- Are queries mostly point lookups, scans, aggregations, recursive rules, or top-k retrieval?
+
+### Data model
+
+- Is the data relational, document-oriented, graph-shaped, vector-based, or rule/fact based?
+- What is the engine's core internal representation?
+
+### Unit of execution
+
+- Does the engine run row-at-a-time, batch-at-a-time, or over fully materialized relations?
+- Are blocking operators common?
+
+### Storage boundary
+
+- Is execution tightly coupled to one storage engine?
+- Or is there a source interface with pushdown capabilities?
+
+### Indexes
+
+- What access patterns deserve dedicated indexes?
+- Are indexes exact, approximate, ordered, inverted, or vector-oriented?
+
+### Optimization
+
+- Is the optimizer mostly rule-based or cost-based?
+- What statistics are available?
+
+### Distribution
+
+- Is one machine enough?
+- If not, where are exchange boundaries, partitioning choices, and failure handling defined?
+
+### Semantics
+
+- Is the system exact, approximate, eventually consistent, ranked, or fixpoint-based?
+- Does it support recursion, inference, or witness generation?
+
+---
+
+## Why these questions matter
+
+These questions determine most of the major architecture choices:
+
+- row vs column
+- iterator vs vectorized execution
+- local vs distributed execution
+- exact vs approximate search
+- relational vs rule-based planning
+
+If those answers are unclear, architecture discussions tend to stay vague.
+
+---
+
+## Practical use
+
+This note is best used as a design checklist.
+
+If a team can answer these questions cleanly, the likely engine shape becomes much easier to see.
+
+If it cannot, the project is probably still mixing together several different engine ideas.
+
+---
+
+## Changelog
+
+* **April 2, 2026** -- First version created.