From 50ae1473e43fc0b41955003c05ffb26fa6bdccdb Mon Sep 17 00:00:00 2001 From: Hassan Abedi Date: Mon, 23 Mar 2026 13:53:18 +0100 Subject: [PATCH] Add a comparison notes between Geolog and Datalog --- notes/007-geolog-and-datalog.md | 297 +++++++++++++++++++------------- 1 file changed, 179 insertions(+), 118 deletions(-) diff --git a/notes/007-geolog-and-datalog.md b/notes/007-geolog-and-datalog.md index 712748e..9f50a9c 100644 --- a/notes/007-geolog-and-datalog.md +++ b/notes/007-geolog-and-datalog.md @@ -1,53 +1,95 @@ # Geolog And Datalog -*How to map Geolog concepts onto Datalog, and where the analogy breaks down.* +*A simple explanation of how Geolog is similar to Datalog, and how it is different.* --- -## Summary +## What This Note Is About -Geolog is not just Datalog with different syntax, but a large and useful fragment of Geolog can be read that way. +Sometimes Geolog looks a lot like Datalog. -The best short summary is: +That is not an accident. -> **Geolog is like typed Datalog with native functions, existential rule heads, and stronger logical structure.** +Both systems let you: -The mapping works well for: +- describe facts, +- describe rules, +- and repeatedly apply those rules until no new facts appear. -- relations, -- Horn-style axioms, -- instance facts, -- and fixpoint/chase-style derivation. +But Geolog is not just Datalog with different syntax. -The mapping breaks down when Geolog uses: +Geolog also has features that go beyond basic Datalog, especially: -- existential conclusions, -- equality reasoning, -- and native function symbols with total/single-valued semantics. +- built-in types, called **sorts**, +- built-in **functions**, +- rules that can say “something exists”, +- and stronger support for equality. + +This note explains the overlap in simple terms. + +--- + +## First: What Is Datalog? + +Datalog is a rule language. + +It is often written with facts like this: + +```prolog +edge(a, b). +edge(b, c). +``` + +And rules like this: + +```prolog +reachable(X, Y) :- edge(X, Y). +reachable(X, Z) :- reachable(X, Y), reachable(Y, Z). +``` + +You can read those rules as: + +- if `edge(X, Y)` is true, then `reachable(X, Y)` is true, +- if `reachable(X, Y)` and `reachable(Y, Z)` are true, then `reachable(X, Z)` is true. + +Then the system keeps applying rules until it cannot derive anything new. + +--- + +## Very Short Summary + +The best simple summary is: + +> Geolog contains a part that behaves a lot like typed Datalog, but full Geolog can do more. + +Another good summary is: + +> If you ignore some of the more advanced features, you can often read Geolog as “Datalog with types and functions”. --- ## Side-By-Side Mapping -| Concept | Geolog | Datalog | -|---------|--------|---------| -| Domain/type | `V : Sort;` | usually implicit, or unary predicate like `V(x)` | +| Idea | In Geolog | In Datalog | +|------|-----------|------------| +| Type of thing | `V : Sort;` | often left implicit, or written as a one-argument fact like `V(x)` | | Relation | `edge : [from: V, to: V] -> Prop;` | `edge(X, Y)` | -| Function | `src : E -> V;` | usually encoded as relation `src(E, V)` | +| Function | `src : E -> V;` | usually encoded as a relation like `src(E, V)` | | Fact | `[from: a, to: b] edge;` | `edge(a, b).` | -| Function fact | `e1 src = a;` | `src(e1, a).` | -| Axiom/rule | `A, B |- C` | `C :- A, B.` | -| Derived predicate | relation + axioms | IDB predicate + rules | -| Instance | concrete elements/facts | EDB facts | -| Chase/fixpoint | repeated rule firing to saturation | bottom-up least fixpoint | -| Existential conclusion | `|- exists y : B. ...` | not plain Datalog | -| Equality reasoning | native/logical | usually extension or encoding | +| Function value | `e1 src = a;` | `src(e1, a).` | +| Rule | `A, B |- C` | `C :- A, B.` | +| Starting data | an instance | a set of starting facts | +| Repeated rule application | chase | repeated rule evaluation until no new facts appear | --- -## 1. Sorts And Types +## 1. Sorts In Geolog Are Like Types -Geolog has first-class sorts: +Geolog has **sorts**. + +A sort is just a kind of thing. + +Example: ```geolog theory Graph { @@ -55,7 +97,11 @@ theory Graph { } ``` -Plain Datalog often treats types as implicit, or simulates them with unary predicates: +This says that `V` is a sort. You can think of it as the type “vertex”. + +Basic Datalog usually does not have this built in. + +If someone wants types in Datalog, they often simulate them with ordinary facts such as: ```prolog V(a). @@ -63,13 +109,16 @@ V(b). V(c). ``` -So the closest Datalog mental model is: +So the rough translation is: -- **Geolog sort** ≈ **type discipline or unary predicate**. +- **Geolog sort** = **type**, +- and in Datalog that type is often represented using an ordinary relation. --- -## 2. Relations Map Directly +## 2. Relations Match Very Well + +This part is simple. Geolog relation declaration: @@ -77,25 +126,30 @@ Geolog relation declaration: edge : [from: V, to: V] -> Prop; ``` -Fact in a Geolog instance: +Geolog fact: ```geolog [from: a, to: b] edge; ``` -Datalog equivalent: +Datalog version: ```prolog edge(a, b). ``` -This is the cleanest and most direct part of the mapping. +So: + +- a Geolog relation is very much like a Datalog predicate, +- and a Geolog relation fact is very much like a Datalog fact. --- -## 3. Functions Become Functional Relations +## 3. Functions Need More Care -Geolog has native function symbols: +Geolog has real functions. + +Example: ```geolog E : Sort; @@ -107,47 +161,46 @@ a : V; e1 src = a; ``` -In Datalog, the closest encoding is relational: +This means: + +- `src` takes an edge and gives back one vertex, +- `e1` has source `a`. + +In Datalog, people usually write the same information like this: ```prolog src(e1, a). ``` -But there is an important semantic gap: +But there is a difference. -- in **Geolog**, `src` is a genuine function, -- in **Datalog**, `src(e1, a)` is just a relation unless extra constraints are added. +In Geolog, `src` is a true function, so each edge should have one source. -So the right translation rule is: +In plain Datalog, `src(e1, a)` is just a normal fact in a normal relation. Datalog does not automatically know that `src` is supposed to behave like a function. -- **Geolog function** ≈ **relation that is intended to be total and single-valued**. +So the simple rule is: -To fully mimic a function, you also need functional constraints such as: - -```text -if src(e, v1) and src(e, v2), then v1 = v2 -``` - -Plain Datalog does not natively enforce that on its own. +- **Geolog function** becomes **a relation in Datalog**, +- but some meaning is lost unless you also add extra rules or constraints. --- -## 4. Axioms Map To Rules +## 4. Axioms In Geolog Look Like Rules In Datalog -Geolog Horn-style axiom: +Geolog rule: ```geolog ax/base : forall x,y : V. [from: x, to: y] edge |- [from: x, to: y] reachable; ``` -Datalog: +Datalog rule: ```prolog reachable(X, Y) :- edge(X, Y). ``` -Another Geolog axiom: +Geolog rule: ```geolog ax/trans : forall x,y,z : V. @@ -155,19 +208,21 @@ ax/trans : forall x,y,z : V. |- [from: x, to: z] reachable; ``` -Datalog: +Datalog rule: ```prolog reachable(X, Z) :- reachable(X, Y), reachable(Y, Z). ``` -This is the main sense in which Geolog resembles Datalog. +So in the simplest cases: + +- **Geolog axiom** behaves like a **Datalog rule**. --- -## 5. Full Example +## 5. A Full Example -Geolog: +Here is a Geolog theory and instance: ```geolog theory Graph { @@ -190,7 +245,7 @@ instance G : Graph = { } ``` -Datalog: +Here is the matching Datalog program: ```prolog edge(a, b). @@ -200,7 +255,7 @@ reachable(X, Y) :- edge(X, Y). reachable(X, Z) :- reachable(X, Y), reachable(Y, Z). ``` -After saturation/fixpoint, both derive: +After rule application, both systems derive: ```prolog reachable(a, b). @@ -208,103 +263,109 @@ reachable(b, c). reachable(a, c). ``` ---- - -## 6. Instances Are Like EDB Facts - -Geolog theory + instance splits naturally into: - -- **theory** = vocabulary + laws -- **instance** = starting facts - -That matches the Datalog split between: - -- **rules/schema**, and -- **EDB facts**. - -So: - -- **Geolog instance** ≈ **extensional database**. +This is the part of Geolog that feels the most like Datalog. --- -## 7. Chase Is Like Bottom-Up Evaluation +## 6. An Instance Is Just The Starting Data -Geolog chase: +In Geolog, a theory gives: -- repeatedly apply axioms, -- add newly implied facts, -- stop when nothing new appears. +- the names of sorts, +- the names of functions, +- the names of relations, +- and the axioms. -Datalog bottom-up evaluation: +Then an instance gives: -- repeatedly apply rules, -- add newly derived facts, -- stop at the least fixpoint. +- the actual elements, +- function values, +- and relation facts. -So in the Horn fragment: +So a Geolog instance is just the starting data for rule application. -- **Geolog chase** ≈ **Datalog least-fixpoint evaluation**. +That is exactly the role that a set of starting facts plays in Datalog. --- -## 8. Where The Mapping Breaks: Existentials +## 7. Chase In Geolog Is Like Repeated Rule Evaluation In Datalog -Geolog can say things like: +In Geolog, the **chase** means: + +1. look at the current facts, +2. find an axiom that applies, +3. add the fact that the axiom says must follow, +4. repeat until nothing new appears. + +In Datalog, the standard evaluation idea is very similar: + +1. look at the current facts, +2. find a rule whose body is true, +3. add the fact in the head of the rule, +4. repeat until nothing new appears. + +So for simple rule systems: + +- **Geolog chase** and **Datalog rule evaluation** are very close. + +--- + +## 8. Where The Analogy Stops Working: “There Exists ...” + +Geolog can write rules like this: ```geolog forall p : Person. |- exists c : Person. [parent: p, kid: c] child; ``` -Meaning: +In plain language, this says: -- for every person `p`, some child must exist, -- and chase may create a fresh witness if needed. +- every person has some child. -Plain Datalog cannot do this, because plain Datalog does not invent fresh elements. +If the system does not already know such a child, Geolog may create a fresh witness during chase. -This is not standard Datalog anymore. It is closer to: +Basic Datalog does not do that. -- existential rules, -- tuple-generating dependencies, -- or Datalog±. +Basic Datalog works with facts and constants that already exist. It does not normally invent brand new objects just because a rule says something must exist. -So: - -- **Geolog with existential conclusions** is strictly more expressive than plain Datalog. +So this is one major way in which Geolog is more expressive than ordinary Datalog. --- -## 9. Where The Mapping Breaks: Equality +## 9. Where The Analogy Stops Working: Equality -Geolog also supports equality-oriented reasoning in ways that go beyond plain Datalog. +Geolog can also reason about equality in richer ways than plain Datalog normally does. -If equalities are inferred or propagated as part of semantic saturation, then the system is doing more than ordinary Datalog rule evaluation. +That means Geolog can sometimes merge or identify things based on logical consequences, while ordinary Datalog is usually just adding new facts. -So: - -- **Geolog equality reasoning** is not captured by plain Datalog without extensions or encodings. +So equality is another place where Geolog is doing more than basic Datalog. --- -## 10. Best Practical Translation Rules +## 10. A Safe Mental Model -When reading Geolog in Datalog terms, use this mental model: +If you want a simple way to think about the connection, use this: -- `Sort` → type or unary predicate -- function `f : A -> B` → relation `f(a, b)` plus intended functionality -- relation `R : [...] -> Prop` → predicate `R(...)` -- axiom `premises |- conclusion` → rule `conclusion :- premises` -- instance facts → EDB facts -- chase → bottom-up fixpoint -- existential conclusion → no longer plain Datalog +- a **sort** in Geolog is like a type, +- a **relation** in Geolog is like a predicate in Datalog, +- a **fact** in a Geolog instance is like a Datalog fact, +- an **axiom** in Geolog is often like a Datalog rule, +- and **chase** is like repeatedly applying rules until no new facts appear. + +That mental model is very useful. + +Just remember that full Geolog is richer than plain Datalog. --- -## 11. Bottom Line +## Final Summary -The safest final statement is: +Here is the simplest correct conclusion: -> **A Datalog-like fragment of Geolog maps very naturally to typed Datalog, but full Geolog is richer because it includes native functions, existential rule heads, and stronger logical machinery.** +> A large part of Geolog can be understood as Datalog-like rule evaluation over typed data. + +But also: + +> Full Geolog goes beyond basic Datalog because it has real functions, richer equality reasoning, and rules that can say that new things must exist.