Add notes on query engine architecture and placement

This commit is contained in:
Hassan Abedi 2026-03-30 10:45:39 +02:00
parent dc343be7ad
commit 4f1e247043
8 changed files with 512 additions and 82 deletions

View File

@ -9,33 +9,37 @@
### Algorithm Variants
| Variant | Output Size | Speed per Step | Satisfaction Check | Used By | Best For |
|:---------------|:------------|:---------------|:-------------------|:------------------|:----------------------------------------------|
| Oblivious | Largest | Fastest | None | — | Prototyping; when output size doesn't matter |
| Skolem | Medium | Fast | Implicit (Skolem terms) | VLog, Graal | Parallelism; data exchange; deterministic runs |
| Restricted | Smallest | Slower | Explicit (homomorphism check) | RDFox, VLog | Production use; query performance over large KGs |
| Core | Most compact | Slowest | Full core computation | — (theoretical) | Theoretical analysis; minimum representations |
| Variant | Output Size | Speed per Step | Satisfaction Check | Used By | Best For |
|:-----------|:-------------|:---------------|:------------------------------|:----------------|:-------------------------------------------------|
| Oblivious | Largest | Fastest | None | — | Prototyping; when output size doesn't matter |
| Skolem | Medium | Fast | Implicit (Skolem terms) | VLog, Graal | Parallelism; data exchange; deterministic runs |
| Restricted | Smallest | Slower | Explicit (homomorphism check) | RDFox, VLog | Production use; query performance over large KGs |
| Core | Most compact | Slowest | Full core computation | — (theoretical) | Theoretical analysis; minimum representations |
#### Variant Notes
**Oblivious Chase** fires every applicable rule without checking whether the conclusion already holds. Simple to implement but produces redundant facts and is the hardest to guarantee termination for.
**Oblivious Chase** fires every applicable rule without checking whether the conclusion already holds. Simple to implement but produces redundant
facts and is the hardest to guarantee termination for.
**Skolem Chase** uses deterministic Skolem functions to generate nulls — the same rule trigger always produces the same null. This makes it idempotent and easy to parallelize, with no need to track which rules have already fired.
**Skolem Chase** uses deterministic Skolem functions to generate nulls — the same rule trigger always produces the same null. This makes it idempotent
and easy to parallelize, with no need to track which rules have already fired.
**Restricted Chase** checks before firing whether the rule head is already satisfied via a homomorphism into the current database. Never adds redundant facts, producing the smallest universal solution. The homomorphism check is NP-hard in general but fast in practice with good indexing.
**Restricted Chase** checks before firing whether the rule head is already satisfied via a homomorphism into the current database. Never adds
redundant facts, producing the smallest universal solution. The homomorphism check is NP-hard in general but fast in practice with good indexing.
**Core Chase** goes further than the restricted chase, computing the logical core — the most compressed possible universal solution. Theoretically important but computationally expensive; not used in production engines.
**Core Chase** goes further than the restricted chase, computing the logical core — the most compressed possible universal solution. Theoretically
important but computationally expensive; not used in production engines.
### Chase Query Engines
| Engine | Organization | Chase Variant | Home Page |
|:---------|:-------------------------------|:-------------------|:-------------------------------------------------------------------------------------|
| RDFox | Oxford Semantic Technologies | Restricted | [oxfordsemantic.tech/rdfox](https://oxfordsemantic.tech/rdfox) |
| VLog | TU Dresden / VU Amsterdam | Restricted + Skolem | [github.com/karmaresearch/vlog](https://github.com/karmaresearch/vlog) |
| Graal | LIRMM, Montpellier | Skolem + Oblivious | [graphik-team.github.io/graal](https://graphik-team.github.io/graal) |
| Vadalog | University of Oxford / TU Wien | Warded (Restricted variant) | [vadalog-system.com](https://vadalog-system.com) |
| PDQ | University of Oxford | Skolem | [github.com/michaelbenedict/pdq](https://github.com/michaelbenedict/pdq) |
| Llunatic | University of Genova | Skolem | [github.com/donatellosantoro/Llunatic](https://github.com/donatellosantoro/Llunatic) |
| Engine | Organization | Chase Variant | Home Page |
|:---------|:-------------------------------|:----------------------------|:-------------------------------------------------------------------------------------|
| RDFox | Oxford Semantic Technologies | Restricted | [oxfordsemantic.tech/rdfox](https://oxfordsemantic.tech/rdfox) |
| VLog | TU Dresden / VU Amsterdam | Restricted + Skolem | [github.com/karmaresearch/vlog](https://github.com/karmaresearch/vlog) |
| Graal | LIRMM, Montpellier | Skolem + Oblivious | [graphik-team.github.io/graal](https://graphik-team.github.io/graal) |
| Vadalog | University of Oxford / TU Wien | Warded (Restricted variant) | [vadalog-system.com](https://vadalog-system.com) |
| PDQ | University of Oxford | Skolem | [github.com/michaelbenedict/pdq](https://github.com/michaelbenedict/pdq) |
| Llunatic | University of Genova | Skolem | [github.com/donatellosantoro/Llunatic](https://github.com/donatellosantoro/Llunatic) |
## Changelog

View File

@ -7,18 +7,23 @@ A reference for the core concepts of first-order logic and database theory.
## Building Blocks
### Constants
Specific known values — real things in your database.
```
Alice, Bob, Engineering, 42
```
### Variables
Placeholders for unknown values, usually written in uppercase or with `?`.
```
X, Y, ?person, ?dept
```
### Terms
A term is either a constant or a variable. Anything that can fill an argument slot in a predicate.
---
@ -26,6 +31,7 @@ A term is either a constant or a variable. Anything that can fill an argument sl
## Predicates and Atoms
A **predicate** is a named relation applied to a list of terms:
```
Employee(X, Y) -- predicate "Employee", arity 2
Likes(X, Y, Z) -- predicate "Likes", arity 3
@ -35,6 +41,7 @@ Person(Alice) -- predicate "Person", arity 1
The number of arguments a predicate takes is its **arity**.
An **atom** is a predicate applied to specific terms. When all terms are constants it is called a **ground atom** — a concrete fact in the database:
```
Employee(Alice, Engineering) -- ground atom
```
@ -49,45 +56,49 @@ A **rule** is an if-then statement built from atoms, split into two parts:
body → head
```
| Part | Also Called | Meaning |
|:-----|:------------|:--------|
| Body | Antecedent, LHS | Conditions that must hold |
| Part | Also Called | Meaning |
|:-----|:----------------|:--------------------------------|
| Body | Antecedent, LHS | Conditions that must hold |
| Head | Consequent, RHS | What must be true if body holds |
Example:
```
Employee(X, Y), Department(Y) → ∃Z. ManagedBy(Y, Z)
```
Variables in the head that do not appear in the body are **existential variables** — they represent unknown values that must exist, but whose identity is not known.
Variables in the head that do not appear in the body are **existential variables** — they represent unknown values that must exist, but whose identity
is not known.
---
## Quantifiers
| Symbol | Name | Meaning | Example |
|:-------|:-----|:--------|:--------|
| `∀` | Universal | "For all..." | `∀X. Person(X) → Mortal(X)` |
| `∃` | Existential | "There exists some..." | `∀X. Person(X) → ∃Y. MotherOf(X, Y)` |
| Symbol | Name | Meaning | Example |
|:-------|:------------|:-----------------------|:-------------------------------------|
| `∀` | Universal | "For all..." | `∀X. Person(X) → Mortal(X)` |
| `∃` | Existential | "There exists some..." | `∀X. Person(X) → ∃Y. MotherOf(X, Y)` |
---
## Connectives
| Symbol | Name | Meaning |
|:-------|:-----|:--------|
| `∧` | Conjunction | AND |
| `` | Disjunction | OR |
| `¬` | Negation | NOT |
| `→` | Implication | IF...THEN |
| `↔` | Biconditional | IF AND ONLY IF |
| Symbol | Name | Meaning |
|:-------|:--------------|:---------------|
| `∧` | Conjunction | AND |
| `` | Disjunction | OR |
| `¬` | Negation | NOT |
| `→` | Implication | IF...THEN |
| `↔` | Biconditional | IF AND ONLY IF |
---
## Key Concepts
### Substitution
A mapping from variables to terms. Written as `σ`:
```
σ = { X → Alice, Y → Engineering }
@ -95,55 +106,64 @@ Employee(X, Y) under σ = Employee(Alice, Engineering)
```
### Homomorphism
A substitution that maps one set of atoms into another, preserving structure. Used in the chase to check whether a rule head is already satisfied in the database.
A substitution that maps one set of atoms into another, preserving structure. Used in the chase to check whether a rule head is already satisfied in
the database.
### Ground Instance
An atom or rule with all variables replaced by constants — no variables remaining.
### Herbrand Universe
The set of all ground terms constructable from the constants and function symbols in a formula. Represents the "world" of all possible values.
### Skolem Term / Skolem Function
A named placeholder for an existentially quantified variable, constructed deterministically from the values that triggered it:
```
∀X. Person(X) → ∃Y. MotherOf(X, Y)
Skolemized: MotherOf(X, mother_of(X))
```
The Skolem term `mother_of(X)` means "the mother of X, whoever that is." Same input always produces the same term.
---
## Types of Dependencies (Rules)
| Type | Abbreviation | Form | Meaning |
|:-----|:-------------|:-----|:--------|
| Tuple-generating dependency | TGD | `body → ∃z. head` | If body holds, some new tuple must exist |
| Equality-generating dependency | EGD | `body → x = y` | If body holds, two values must be equal |
| Functional dependency | FD | Special EGD | A set of attributes uniquely determines another |
| Full dependency | — | TGD with no existentials | Head variables all appear in body |
| Type | Abbreviation | Form | Meaning |
|:-------------------------------|:-------------|:-------------------------|:------------------------------------------------|
| Tuple-generating dependency | TGD | `body → ∃z. head` | If body holds, some new tuple must exist |
| Equality-generating dependency | EGD | `body → x = y` | If body holds, two values must be equal |
| Functional dependency | FD | Special EGD | A set of attributes uniquely determines another |
| Full dependency | — | TGD with no existentials | Head variables all appear in body |
---
## Rule Classes and Termination
| Rule Class | Existentials | Termination Guarantee | Notes |
|:-----------|:-------------|:----------------------|:------|
| Datalog | No | Always | Core of logic programming |
| Weakly acyclic | Yes | Yes | No cyclic value propagation through existentials |
| Guarded | Yes | No (but decidable) | Existential vars "guarded" by a body atom |
| Frontier-one | Yes | No (but decidable) | At most one frontier variable per rule |
| General TGDs | Yes | Undecidable | No restrictions |
| Rule Class | Existentials | Termination Guarantee | Notes |
|:---------------|:-------------|:----------------------|:-------------------------------------------------|
| Datalog | No | Always | Core of logic programming |
| Weakly acyclic | Yes | Yes | No cyclic value propagation through existentials |
| Guarded | Yes | No (but decidable) | Existential vars "guarded" by a body atom |
| Frontier-one | Yes | No (but decidable) | At most one frontier variable per rule |
| General TGDs | Yes | Undecidable | No restrictions |
---
## Datalog
Datalog is a logic programming language and the most important rule class for the chase. It is a restriction of first-order logic with no function symbols and no existential variables in rule heads.
Datalog is a logic programming language and the most important rule class for the chase. It is a restriction of first-order logic with no function
symbols and no existential variables in rule heads.
### Syntax
A Datalog program consists of:
- **Facts** — ground atoms representing known data
- **Rules** — if-then statements deriving new facts from existing ones
@ -160,18 +180,19 @@ Colleague(X, Y) :- WorksIn(X, D), WorksIn(Y, D), X ≠ Y.
### Key Properties
| Property | Value |
|:---------|:------|
| Existentials in head | No |
| Function symbols | No |
| Negation | Stratified only |
| Termination | Always |
| Data complexity | PTIME |
| Evaluation strategy | Bottom-up (forward chaining) or top-down (backward chaining) |
| Property | Value |
|:---------------------|:-------------------------------------------------------------|
| Existentials in head | No |
| Function symbols | No |
| Negation | Stratified only |
| Termination | Always |
| Data complexity | PTIME |
| Evaluation strategy | Bottom-up (forward chaining) or top-down (backward chaining) |
### Evaluation
Datalog is evaluated by computing the **least fixed point** — repeatedly applying all rules until no new facts are derived. This is exactly the chase with no existentials:
Datalog is evaluated by computing the **least fixed point** — repeatedly applying all rules until no new facts are derived. This is exactly the chase
with no existentials:
```
Iteration 0: { Employee(alice, engineering), Employee(bob, marketing), Department(engineering) }
@ -182,7 +203,8 @@ Iteration 3: nothing new → stop
### Datalog vs SQL
Datalog can express **recursive queries** naturally, which SQL cannot without special extensions (like recursive CTEs). For example, computing transitive closure (all ancestors of a person) is trivial in Datalog:
Datalog can express **recursive queries** naturally, which SQL cannot without special extensions (like recursive CTEs). For example, computing
transitive closure (all ancestors of a person) is trivial in Datalog:
```
Ancestor(X, Y) :- Parent(X, Y).
@ -191,26 +213,29 @@ Ancestor(X, Y) :- Parent(X, Z), Ancestor(Z, Y).
### Datalog and the Chase
Pure Datalog evaluation is a restricted chase with no existential variables — all chase variants behave identically. The fixed point is always reached in at most O(n^k) steps.
Pure Datalog evaluation is a restricted chase with no existential variables — all chase variants behave identically. The fixed point is always reached
in at most O(n^k) steps.
---
## Geometric Logic
Geometric logic is a fragment of first-order logic that is particularly well-suited to the chase. It is the theoretical foundation for most modern chase-based reasoning engines.
Geometric logic is a fragment of first-order logic that is particularly well-suited to the chase. It is the theoretical foundation for most modern
chase-based reasoning engines.
### Definition
A formula is **geometric** if it uses only:
| Allowed | Not Allowed |
|:--------|:-----------|
| Conjunction `∧` | Universal quantifier `∀` in head |
| Disjunction `` | Negation `¬` |
| Existential quantifier `∃` | Implication `→` in head |
| Equality `=` | Infinite conjunctions in head |
| Allowed | Not Allowed |
|:---------------------------|:---------------------------------|
| Conjunction `∧` | Universal quantifier `∀` in head |
| Disjunction `` | Negation `¬` |
| Existential quantifier `∃` | Implication `→` in head |
| Equality `=` | Infinite conjunctions in head |
A **geometric sequent** (rule) has the form:
```
φ ⊢ ψ
@ -234,9 +259,11 @@ Employee(X) ∧ X = Y ⊢ Employee(Y)
### Why Geometric Logic Matters
Geometric logic has a special property: if a geometric formula is true in a model, it remains true in any **extension** of that model (adding more facts never makes it false). This is called **monotonicity** and is exactly what makes the chase work — the chase only adds facts, never removes them.
Geometric logic has a special property: if a geometric formula is true in a model, it remains true in any **extension** of that model (adding more
facts never makes it false). This is called **monotonicity** and is exactly what makes the chase work — the chase only adds facts, never removes them.
This property means:
- The chase directly computes models of geometric theories
- Every geometric theory has a **canonical model** — the one the chase builds
- Query answers over the chase result are guaranteed correct for any model of the theory
@ -260,7 +287,8 @@ Geometric sequent: Employee(X) ⊢ ∃Y. WorksIn(X, Y)
TGD: Employee(X) → ∃Y. WorksIn(X, Y)
```
The chase is the standard procedure for building models of geometric theories. Given a set of geometric axioms and an initial database, the chase constructs the minimal model satisfying all axioms — which is exactly the universal solution used for query answering.
The chase is the standard procedure for building models of geometric theories. Given a set of geometric axioms and an initial database, the chase
constructs the minimal model satisfying all axioms — which is exactly the universal solution used for query answering.
---

View File

@ -26,7 +26,8 @@ So the right mental model is:
### Why It Fits Part of the Problem
The notes describe Geolog as infrastructure with persistent, append-only storage and an interactive REPL, not just as a one-shot solver. Datomic's immutable fact model lines up well with that storage story.
The notes describe Geolog as infrastructure with persistent, append-only storage and an interactive REPL, not just as a one-shot solver. Datomic's
immutable fact model lines up well with that storage story.
Good candidates to store in Datomic:
@ -56,7 +57,8 @@ Geolog needs chase behavior that can:
* branch when a rule concludes a disjunction
* merge terms when equations force equality
Those are central to the chase-based model described in the notes, and they are not features Datomic gives you as a database. Datomic gives you a temporal fact store plus Datalog query tools; it does not give you a geometric-logic engine.
Those are central to the chase-based model described in the notes, and they are not features Datomic gives you as a database. Datomic gives you a
temporal fact store plus Datalog query tools; it does not give you a geometric-logic engine.
That means a Datomic-based Geolog would still need its own:
@ -72,7 +74,8 @@ Inference: Datomic is a reasonable substrate for storing models and derivation h
### Licensing and Stack Implications
As of March 26, 2026, Datomic's official docs say that all editions are free and that the binaries are licensed under Apache 2.0, while Datomic's own FAQ says it is not open source because the source is not published.
As of March 26, 2026, Datomic's official docs say that all editions are free and that the binaries are licensed under Apache 2.0, while Datomic's own
FAQ says it is not open source because the source is not published.
Official sources:
@ -96,12 +99,14 @@ That source-unavailable dependency has real engineering consequences:
* fully reproducible "build everything from source" workflows become weaker
* some contributors will reject the stack on principle because it is not fully open
So the licensing issue is not "can it legally coexist with open parts of the stack?" The answer there is mostly **yes**. The more important question is strategic:
So the licensing issue is not "can it legally coexist with open parts of the stack?" The answer there is mostly **yes**. The more important question
is strategic:
* does Geolog want a fully open, self-hostable, inspectable foundation
* or is it acceptable to place the storage layer on a free but closed-source binary
If Geolog wants the strongest possible open-source story, Datomic is a poor fit. If Geolog mainly wants a strong immutable fact store and accepts vendor dependence, Datomic remains viable.
If Geolog wants the strongest possible open-source story, Datomic is a poor fit. If Geolog mainly wants a strong immutable fact store and accepts
vendor dependence, Datomic remains viable.
---
@ -116,4 +121,4 @@ For Geolog as described in these notes:
## Changelog
* **Mar 26, 2026** -- Added note on whether Datomic could serve as a backend for Geolog, including licensing implications.
* **Mar 26, 2026** -- First version created.

View File

@ -3,7 +3,8 @@
This note summarizes the intermediate representation currently implemented in the Geolog repo on branch `ir-draft1`:
https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads
All file paths below are repo-relative, not local workspace paths. The main sources are `geolog-lang/src/Geolog/IR.hs` and `geolog-lang/src/Geolog/Lower.hs`.
All file paths below are repo-relative, not local workspace paths. The main sources are `geolog-lang/src/Geolog/IR.hs` and
`geolog-lang/src/Geolog/Lower.hs`.
## Where the IR sits
@ -40,7 +41,8 @@ The lowering entry point is `geolog-lang/src/Geolog/Lower.hs`, via `lowerTop`, w
The IR is intentionally relational. It turns theory structure into tables and rules.
- `Path = [QName]`: a hierarchical name like `["M","mul"]`, pretty-printed as `M.mul`.
- `ColType` is the type of a column. `EntityType Path` means the column refers to rows from another entity/table, `PrimType PrimInt | PrimString` covers builtin scalars, and `Tuple [(QName, ColType)]` supports structured values.
- `ColType` is the type of a column. `EntityType Path` means the column refers to rows from another entity/table, `PrimType PrimInt | PrimString`
covers builtin scalars, and `Tuple [(QName, ColType)]` supports structured values.
- `Table` contains `columns :: [ColType]` and `primaryKey :: Maybe [Int]`.
- `Term` supports literals, variables, projections like `x.field`, and record construction.
- `Atom` contains `table :: Path`, `rowId :: Maybe Term`, and `values :: Map Int Term`.
@ -86,7 +88,8 @@ This logic is in `finishTheory` in `geolog-lang/src/Geolog/Lower.hs`.
Lowering is not just flattening syntax. It also turns shared context into explicit columns.
`liftOnShared` copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like `D.mul` contain both `M.car` arguments and displayed-car columns.
`liftOnShared` copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like
`D.mul` contain both `M.car` arguments and displayed-car columns.
See `liftOnShared` in `geolog-lang/src/Geolog/Lower.hs`.
@ -158,7 +161,8 @@ The implemented IR is best understood as a relationalized, flattened theory:
- references become foreign-key-style laws
- total functions become tables plus totality laws
So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped form that later evaluation code can consume.
So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped form that later evaluation code can
consume.
## Status
@ -181,8 +185,14 @@ What still looks missing:
So the current IR is best seen as a lowered theory IR, not yet a full chase-runtime IR.
If the project keeps this design, the natural next step is probably a second layer above `FlatTheory` for runtime execution state. That layer would hold current facts, branch identity, fresh-name allocation, equality information, and evaluation metadata.
If the project keeps this design, the natural next step is probably a second layer above `FlatTheory` for runtime execution state. That layer would
hold current facts, branch identity, fresh-name allocation, equality information, and evaluation metadata.
## Note
`docs/lecture3.tex` has a `Query IR` section header but no content yet, so the code above is currently the most concrete description of the actual intermediate representation.
`docs/lecture3.tex` has a `Query IR` section header but no content yet, so the code above is currently the most concrete description of the actual
intermediate representation.
## Changelog
* **Mar 30, 2026** -- First version created.

12
scratches/make_figures.sh Normal file
View File

@ -0,0 +1,12 @@
#!/bin/bash
# You need to have Graphviz installed to run this script
# On Debian-based OSes, you can install it using: sudo apt-get install graphviz
# Directory containing .dot files (with default value)
ASSET_DIR=${1:-"."}
# Make figures from .dot files
for f in "${ASSET_DIR}"/*.dot; do
dot -Tsvg "$f" -o "${f%.dot}.svg"
done

View File

@ -0,0 +1,94 @@
# Query Engine in the Geolog Big Picture
This is a scratch note based on the architecture diagram and the current IR note.
## Short answer
The query engine fits below the IR, not above it.
The top part of the system is the language pipeline:
1. concrete syntax
2. parser
3. abstract syntax
4. elaborator
5. elaborated syntax
6. lowering
7. intermediate representation
After that point, the system stops being mostly front-end/compiler work and starts being backend/runtime work.
That means the query engine should be one of the first consumers of the IR.
## What that means in the diagram
The cleanest reading of the diagram is:
- parser, elaborator, and lowering produce the IR
- the IR is the contract between the front end and the execution backends
- the query engine sits under the IR
- backend-specific implementations then sit under that
So a useful architecture shape would be:
1. `Lowering`
2. `Intermediate Representation`
3. `IR Execution / Query Engine`
4. backend implementations:
- in-memory execution
- Postgres execution
- Rust storage/query execution
## Two ways to think about the query engine
There are really two possible meanings.
### 1. Narrow meaning
The query engine is just the thing that runs queries against a backend.
In that version:
- the in-memory backend has its own query engine
- the Postgres backend has its own query engine
- the Rust backend has its own query engine
The IR is compiled into whatever each backend needs.
### 2. Broad meaning
The query engine is the main execution layer for the IR.
In that version it is responsible for more than just user queries. It may also handle:
- law checking
- chase steps
- fixpoint iteration
- planning or rewriting
- backend-specific lowering of queries
This broader meaning is probably closer to what Geolog will eventually need.
## Practical conclusion
Right now, the best big-picture placement is:
- the query engine belongs after lowering
- it consumes IR
- it is part of the execution layer
- it may have multiple backend implementations
So the IR should be treated as the interface, and the query engine should be treated as the first major runtime component that uses that interface.
## Questions for the team
1. Is the query engine meant to be one shared execution layer over the IR, or a thin abstraction over several backend-specific engines?
2. Is `FlatTheory` supposed to be enough for execution, or do we expect a second runtime IR above it for chase state, branches, and equality?
3. Should law checking and user querying be handled by the same engine, or by separate components that both consume the IR?
4. Do we expect the first real executor to be in-memory, Postgres-backed, or Rust-native?
5. Is Postgres only a storage/query backend, or is it intended to become a serious execution target for law checking and chase steps too?
6. What parts of execution must be backend-independent: query planning, law translation, chase scheduling, provenance, branch management?
7. Do existential witness generation, disjunctive branching, and equality merging belong inside the query engine, or in a separate chase/runtime layer above it?
8. Should the query engine operate directly on relational tables, or should it also understand higher-level Geolog structure like paths, theories, and dependencies?
9. What is the expected contract between Haskell lowering and Rust execution: raw IR data only, or also precompiled queries/plans?
10. What is the earliest milestone for the engine: run queries, check laws, perform simple fixpoint evaluation, or support full chase behavior?

View File

@ -0,0 +1,130 @@
digraph GeologQueryEngineBroad {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = LR
ranksep = 0.9;
nodesep = 0.7;
splines = true;
compound = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
labeldistance = 2.0,
penwidth = 1.2
]
subgraph cluster_frontend {
label = "Geolog Front End"
style = "rounded"
color = "#666666"
fontcolor = "#333333"
bgcolor = "#FAFAFA"
margin = 20
subgraph cluster_language {
label = "Language Pipeline"
style = "dashed"
color = "#9C27B0"
fontcolor = "#7B1FA2"
margin = 14
concrete [label = "Concrete Syntax", fillcolor = "#F3E5F5", color = "#9C27B0"]
parser [label = "Parser", fillcolor = "#F3E5F5", color = "#9C27B0"]
abstract [label = "Abstract Syntax", fillcolor = "#F3E5F5", color = "#9C27B0"]
elaborator [label = "Elaborator", fillcolor = "#F3E5F5", color = "#9C27B0"]
typed [label = "Elaborated Syntax\n(with types)", fillcolor = "#F3E5F5", color = "#9C27B0"]
lowering [label = "Lowering", fillcolor = "#F3E5F5", color = "#9C27B0"]
ir [label = "Intermediate Representation\n(shared execution contract)", fillcolor = "#F3E5F5", color = "#9C27B0"]
}
subgraph cluster_tools {
label = "Developer Tooling"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 14
lsp [label = "LSP\n(parsing + elaboration diagnostics)", fillcolor = "#E8F5E9", color = "#4CAF50"]
}
}
subgraph cluster_runtime {
label = "Shared Runtime / Execution Layer"
style = "rounded"
color = "#666666"
fontcolor = "#333333"
bgcolor = "#FAFAFA"
margin = 20
subgraph cluster_engine {
label = "IR Execution Engine"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 14
engine [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Shared Query Engine</b></td></tr>
<tr><td align="left" balign="left">• user queries</td></tr>
<tr><td align="left" balign="left">• law checking</td></tr>
<tr><td align="left" balign="left">• chase / fixpoint steps</td></tr>
<tr><td align="left" balign="left">• planning / rewriting</td></tr>
</table>>, fillcolor = "#E8F5E9", color = "#4CAF50", shape = box]
runtime_ir [label = "Runtime State\n(branches, witnesses,\nequality, provenance)", fillcolor = "#E8F5E9", color = "#4CAF50"]
}
subgraph cluster_adapters {
label = "Backend Adapters"
style = "dashed"
color = "#FF9800"
fontcolor = "#F57C00"
margin = 14
inmem_adapter [label = "In-Memory Adapter", fillcolor = "#FFF3E0", color = "#FF9800"]
pg_adapter [label = "Postgres Adapter", fillcolor = "#FFF3E0", color = "#FF9800"]
rust_adapter [label = "Rust Adapter", fillcolor = "#FFF3E0", color = "#FF9800"]
ffi [label = "FFI Haskell to Rust\n(IR marshalling)", fillcolor = "#FFF3E0", color = "#FF9800"]
}
}
subgraph cluster_backends {
label = "Execution Targets"
style = "dashed"
color = "#888888"
fontcolor = "#555555"
margin = 18
inmem_db [label = "In-Memory DB", fillcolor = "#ECEFF1", color = "#607D8B"]
pg_db [label = "Postgres DB", fillcolor = "#ECEFF1", color = "#607D8B"]
rust_store [label = "Rust Storage / Query Layer", fillcolor = "#ECEFF1", color = "#607D8B"]
}
concrete -> parser [color = "#9C27B0"]
parser -> abstract [color = "#9C27B0"]
abstract -> elaborator [color = "#9C27B0"]
elaborator -> typed [color = "#9C27B0"]
typed -> lowering [color = "#9C27B0"]
lowering -> ir [color = "#9C27B0"]
parser -> lsp [style = "dashed", color = "#4CAF50", label = "syntax info"]
elaborator -> lsp [style = "dashed", color = "#4CAF50", label = "type info"]
ir -> engine [color = "#4CAF50", label = "execute IR"]
engine -> runtime_ir [style = "dashed", color = "#4CAF50", label = "maintains"]
engine -> inmem_adapter [color = "#FF9800"]
engine -> pg_adapter [color = "#FF9800"]
engine -> ffi [color = "#FF9800"]
ffi -> rust_adapter [color = "#FF9800"]
inmem_adapter -> inmem_db [color = "#607D8B"]
pg_adapter -> pg_db [color = "#607D8B"]
rust_adapter -> rust_store [color = "#607D8B"]
}

View File

@ -0,0 +1,147 @@
digraph GeologQueryEngineNarrow {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = LR
ranksep = 0.9;
nodesep = 0.7;
splines = true;
compound = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
labeldistance = 2.0,
penwidth = 1.2
]
subgraph cluster_frontend {
label = "Geolog Front End"
style = "rounded"
color = "#666666"
fontcolor = "#333333"
bgcolor = "#FAFAFA"
margin = 20
subgraph cluster_language {
label = "Language Pipeline"
style = "dashed"
color = "#9C27B0"
fontcolor = "#7B1FA2"
margin = 14
concrete [label = "Concrete Syntax", fillcolor = "#F3E5F5", color = "#9C27B0"]
parser [label = "Parser", fillcolor = "#F3E5F5", color = "#9C27B0"]
abstract [label = "Abstract Syntax", fillcolor = "#F3E5F5", color = "#9C27B0"]
elaborator [label = "Elaborator", fillcolor = "#F3E5F5", color = "#9C27B0"]
typed [label = "Elaborated Syntax\n(with types)", fillcolor = "#F3E5F5", color = "#9C27B0"]
lowering [label = "Lowering", fillcolor = "#F3E5F5", color = "#9C27B0"]
ir [label = "Intermediate Representation\n(backend-neutral contract)", fillcolor = "#F3E5F5", color = "#9C27B0"]
}
subgraph cluster_tools {
label = "Developer Tooling"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 14
lsp [label = "LSP\n(parsing + elaboration diagnostics)", fillcolor = "#E8F5E9", color = "#4CAF50"]
}
}
subgraph cluster_execution {
label = "Execution Layer"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 18
query_translation [label = "IR to Backend Translation", fillcolor = "#E8F5E9", color = "#4CAF50"]
}
subgraph cluster_backends {
label = "Backend Implementations"
style = "rounded"
color = "#666666"
fontcolor = "#333333"
bgcolor = "#FAFAFA"
margin = 20
subgraph cluster_inmemory {
label = "In-Memory Backend"
style = "dashed"
color = "#FF9800"
fontcolor = "#F57C00"
margin = 14
inmem_rel [label = "Relation Translation", fillcolor = "#FFF3E0", color = "#FF9800"]
inmem_laws [label = "Law / Chase Translation", fillcolor = "#FFF3E0", color = "#FF9800"]
inmem_qe [label = "In-Memory Query Engine", fillcolor = "#FFF3E0", color = "#FF9800"]
inmem_db [label = "In-Memory DB", fillcolor = "#ECEFF1", color = "#607D8B"]
}
subgraph cluster_postgres {
label = "Postgres Backend"
style = "dashed"
color = "#FF9800"
fontcolor = "#F57C00"
margin = 14
pg_schema [label = "IR to Schema", fillcolor = "#FFF3E0", color = "#FF9800"]
pg_queries [label = "IR to SQL Queries", fillcolor = "#FFF3E0", color = "#FF9800"]
pg_qe [label = "Postgres Query Engine", fillcolor = "#FFF3E0", color = "#FF9800"]
pg_db [label = "Postgres DB", fillcolor = "#ECEFF1", color = "#607D8B"]
}
subgraph cluster_rust {
label = "Rust Backend"
style = "dashed"
color = "#FF9800"
fontcolor = "#F57C00"
margin = 14
ffi [label = "FFI Haskell to Rust\n(IR marshalling)", fillcolor = "#FFF3E0", color = "#FF9800"]
rust_rel [label = "Relation Translation", fillcolor = "#FFF3E0", color = "#FF9800"]
rust_laws [label = "Law / Chase Translation", fillcolor = "#FFF3E0", color = "#FF9800"]
rust_qe [label = "Rust Query Engine", fillcolor = "#FFF3E0", color = "#FF9800"]
rust_store [label = "Rust Storage / Query Layer", fillcolor = "#ECEFF1", color = "#607D8B"]
}
}
concrete -> parser [color = "#9C27B0"]
parser -> abstract [color = "#9C27B0"]
abstract -> elaborator [color = "#9C27B0"]
elaborator -> typed [color = "#9C27B0"]
typed -> lowering [color = "#9C27B0"]
lowering -> ir [color = "#9C27B0"]
parser -> lsp [style = "dashed", color = "#4CAF50", label = "syntax info"]
elaborator -> lsp [style = "dashed", color = "#4CAF50", label = "type info"]
ir -> query_translation [color = "#4CAF50", label = "compile IR"]
query_translation -> inmem_rel [color = "#FF9800"]
query_translation -> inmem_laws [color = "#FF9800"]
query_translation -> pg_schema [color = "#FF9800"]
query_translation -> pg_queries [color = "#FF9800"]
query_translation -> ffi [color = "#FF9800"]
inmem_rel -> inmem_qe [color = "#FF9800"]
inmem_laws -> inmem_qe [color = "#FF9800"]
inmem_qe -> inmem_db [color = "#607D8B"]
pg_schema -> pg_db [color = "#607D8B"]
pg_queries -> pg_qe [color = "#FF9800"]
pg_qe -> pg_db [color = "#607D8B"]
ffi -> rust_rel [color = "#FF9800"]
ffi -> rust_laws [color = "#FF9800"]
rust_rel -> rust_qe [color = "#FF9800"]
rust_laws -> rust_qe [color = "#FF9800"]
rust_qe -> rust_store [color = "#607D8B"]
}