From 73394d7e9e95e585f3119395eba61807a3d88947 Mon Sep 17 00:00:00 2001 From: Hassan Abedi Date: Wed, 11 Mar 2026 13:08:26 +0100 Subject: [PATCH] Add summary of disscusion about DB baclend --- NOTES.md | 172 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 172 insertions(+) create mode 100644 NOTES.md diff --git a/NOTES.md b/NOTES.md new file mode 100644 index 0000000..c8eab29 --- /dev/null +++ b/NOTES.md @@ -0,0 +1,172 @@ +# Notes + +This file records working notes from the recent Geolog / backend design discussion. + +## Current State of `chase-rs` + +- `chase-rs` currently runs the minimal `.chase` frontend language, not the + richer `.geolog` example language. +- The current CLI supports `repl`, `gui`, and `script` over the minimal command + language. +- The project is best described as a chase engine for TGDs / existential rules, + not a narrow classical Datalog implementation. +- It can execute Datalog-like programs, but it also supports existential head + variables via labeled null generation. + +## Geolog and `geolog-lite` + +- The `.geolog` files in `examples/geolog/` appear to define a richer DSL that + is not currently wired into the executable frontend. +- A practical direction is to extract a smaller, well-defined core named + `geolog-lite`. +- `geolog-lite` should focus on the positive relational fragment: + - theories + - instances + - predicates + - conjunctive rule bodies + - conjunctive rule heads + - existential variables in heads + - conjunctive queries +- Surface features such as record arguments, qualified names, field projection, + and function-like syntax should be desugared into a flat relational IR. + +## Using `chase-rs` as a Processor + +- `chase-rs` is a good fit as a backend processor for `geolog-lite` once the + language is lowered to flat predicates, facts, and TGD-style rules. +- A compilation pipeline could be: + - parse `geolog-lite` + - elaborate names and parameters + - lower to relational IR + - compile to `Instance` + `Rule` + - run chase + - answer conjunctive queries over the materialized instance +- This works well for the positive existential fragment. +- It does **not** fully cover richer Geolog features such as equality reasoning, + solver-oriented unsatisfiability, disjunction, or other advanced semantics. + +## Most Critical Missing Capability in `chase-rs` + +- The most semantically important missing feature is equality support: + - EGDs + - congruence closure / equality saturation +- A close second is full query-answering support beyond the current materialized + instance matching behavior. + +## Relational Database as Backend + +- A relational database can be used as a backend for `geolog-lite`. +- However, a plain relational database is **not** a drop-in replacement for a + chase engine. +- If the language includes existential rules and repeated rule application to + fixpoint, the chase logic still needs to exist somewhere. +- Therefore, the preferred model is: + - database = storage + joins + dedup + persistence + - Rust engine = chase coordination + witness generation + trigger tracking + +## Preferred Architecture: DB-Backed Chase Engine + +- Strongest direction discussed: + - store facts in a relational database + - let SQL perform joins and candidate generation + - implement the chase loop in Rust + - let Rust handle restricted-chase triggers and existential witnesses +- This keeps semantics explicit while benefiting from database execution. + +## Recommended MVP Scope + +Start with a deliberately small fragment: + +- positive rules only +- flat predicates only after lowering +- conjunctive bodies and heads +- existential variables in heads +- conjunctive queries +- no equality +- no negation +- no disjunction +- no solver-style unsat features + +This should be enough for a useful first vertical slice. + +## Suggested Data Model + +- Use one database table per predicate. +- Do not use SQL `NULL` as labeled nulls. +- Generate labeled null identities in Rust. +- Qualified Geolog names should lower to stable SQL-safe predicate names. + +Example lowering ideas: + +- `Edge : [src: V, tgt: V] -> Prop` -> predicate table `edge(src, tgt)` +- `src : E -> V` -> relation `src(e, v)` after desugaring +- `R/data : R -> [x: A, y: B]` -> relation `r_data(r, x, y)` + +## Restricted Chase in a DB-Backed Engine + +- The key mechanism is trigger tracking. +- For each rule application, compute a canonical frontier binding. +- Store applied triggers in a dedicated table. +- Skip any body match whose frontier binding has already been applied. +- For existential heads, generate fresh labeled nulls in Rust and insert the + corresponding derived facts. + +High-level loop: + +1. Find body matches using SQL. +2. Project frontier bindings. +3. Filter out already-applied triggers. +4. Generate existential witnesses in Rust when needed. +5. Insert derived facts with dedup. +6. Record applied triggers. +7. Repeat until fixpoint. + +## Clean Backend Interface + +- A clean backend trait is preferable to embedding raw SQL everywhere. +- The backend abstraction should be chase-shaped rather than a generic + `execute_sql` wrapper. + +Suggested responsibilities: + +- ensure predicate storage exists +- load / append base facts +- evaluate a compiled rule body +- filter unseen triggers +- insert derived facts with dedup +- record applied triggers +- evaluate conjunctive queries over materialized facts + +Also keep an in-memory backend as the semantic reference implementation. + +## DuckDB as the First Database Backend + +- DuckDB is a strong candidate for the first backend: + - embedded / in-process + - good analytical join performance + - simple deployment model + - good fit for batched chase rounds +- Recommended usage model: + - one engine process owns the database connection + - work in batches rather than many tiny transactions + - keep chase semantics in Rust +- This makes DuckDB a good first backend behind a clean database trait. + +## Suggested Implementation Order + +1. Define a relational IR for predicates, rules, and queries. +2. Define a clean backend trait for fact storage and rule evaluation. +3. Keep an in-memory backend as the reference implementation. +4. Implement a `DuckDbBackend`. +5. Build a minimal `geolog-lite` parser and lowering pipeline. +6. Run a first end-to-end example such as transitive closure. + +## Good First Example + +- `examples/geolog/transitive_closure.geolog` is an ideal first target. +- It exercises: + - theory parsing + - predicate lowering + - basic chase materialization + - recursive closure + - simple conjunctive querying