diff --git a/.editorconfig b/.editorconfig new file mode 100644 index 0000000..d793038 --- /dev/null +++ b/.editorconfig @@ -0,0 +1,24 @@ +# https://EditorConfig.org + +root = true + +[*] +charset = utf-8 +end_of_line = lf +indent_style = space +indent_size = 4 +insert_final_newline = true +trim_trailing_whitespace = true + +[*.rs] +max_line_length = 100 + +[*.md] +max_line_length = 150 +trim_trailing_whitespace = false + +[*.sh] +indent_size = 2 + +[*.{yml,yaml,json}] +indent_size = 2 diff --git a/notes/004-key-concepts.md b/notes/004-key-concepts.md index 5d4fab0..6bb525b 100644 --- a/notes/004-key-concepts.md +++ b/notes/004-key-concepts.md @@ -2,6 +2,13 @@ This document explains the main ideas in Geolog without assuming math background. +## Related Notes + +If you want to compare Geolog with Datalog, or think about automatic translation, see: + +- `notes/007-geolog-and-datalog.md` — a simple explanation of what maps well and what does not. +- `notes/008-geolog-to-datalog-subset.md` — a practical subset of Geolog that could be translated automatically into plain Datalog. + --- ## Theories and Instances diff --git a/notes/008-geolog-to-datalog-subset.md b/notes/008-geolog-to-datalog-subset.md new file mode 100644 index 0000000..2965580 --- /dev/null +++ b/notes/008-geolog-to-datalog-subset.md @@ -0,0 +1,454 @@ +# Geolog To Datalog Subset + +*A practical subset of Geolog that can be translated automatically into plain Datalog.* + +--- + +## Goal + +The goal of this note is to describe a part of Geolog that can be translated into plain Datalog without too much trouble. + +This note is about a **safe, useful subset**. + +It is **not** about translating all of Geolog. + +That matters because full Geolog has features that go beyond what plain Datalog can express. + +--- + +## Short Answer + +Automatic translation is quite reasonable if we only allow: + +- sorts, +- relations, +- functions, +- instances with facts, +- and simple Horn-style axioms. + +Automatic translation becomes much harder if we also allow: + +- existential conclusions, +- equality-generating behavior, +- or any rule that needs the system to create fresh objects. + +--- + +## What Subset We Want + +We want a subset where Geolog rules behave like ordinary Datalog rules. + +That means we allow the parts of Geolog that fit this pattern: + +- some facts are already true, +- a rule sees those facts, +- the rule derives one new fact, +- repeat until nothing new can be derived. + +--- + +## Allowed Features + +### 1. Sort declarations + +Allowed: + +```geolog +V : Sort; +E : Sort; +``` + +These are Geolog types. + +In Datalog, they can be handled in either of two ways: + +- keep them only as compiler information, +- or emit unary predicates such as `V(x)` and `E(x)`. + +--- + +### 2. Relation declarations + +Allowed: + +```geolog +edge : [from: V, to: V] -> Prop; +reachable : [from: V, to: V] -> Prop; +``` + +Translation idea: + +```prolog +edge(X, Y) +reachable(X, Y) +``` + +The field names such as `from` and `to` are useful in Geolog, but plain Datalog usually uses only argument position. + +--- + +### 3. Function declarations + +Allowed: + +```geolog +src : E -> V; +tgt : E -> V; +``` + +Translation idea: + +```prolog +src(E, V) +tgt(E, V) +``` + +This means we turn Geolog functions into ordinary Datalog relations. + +That is a practical encoding, but it is important to remember that a Datalog relation does not automatically behave like a true function. + +--- + +### 4. Instance facts + +Allowed element declarations: + +```geolog +a, b, c : V; +e1, e2 : E; +``` + +Possible Datalog output: + +```prolog +V(a). +V(b). +V(c). +E(e1). +E(e2). +``` + +Allowed relation facts: + +```geolog +[from: a, to: b] edge; +[from: b, to: c] edge; +``` + +Translation: + +```prolog +edge(a, b). +edge(b, c). +``` + +Allowed function facts: + +```geolog +e1 src = a; +e1 tgt = b; +``` + +Translation: + +```prolog +src(e1, a). +tgt(e1, b). +``` + +--- + +### 5. Simple axioms + +Allowed form: + +```geolog +name : forall vars. + premise1, premise2, ..., premisen |- conclusion; +``` + +Where: + +- all variables are universal variables, +- each premise is a simple atomic fact, +- the conclusion is exactly one simple atomic fact, +- there is no `exists` in the conclusion, +- there is no disjunction, +- there is no equality-generating step. + +This is the part of Geolog that lines up best with plain Datalog. + +--- + +## What Is Not Allowed + +This subset should reject programs that use: + +- existential conclusions, +- equality-generating rules, +- rules that require creation of fresh objects, +- disjunction, +- nested instances in places where they affect rule translation, +- or advanced features that do not flatten cleanly into ordinary Datalog facts and rules. + +--- + +## Translation Rules + +This section gives the basic mechanical mapping. + +### Sort declaration + +Geolog: + +```geolog +V : Sort; +``` + +Possible Datalog representation: + +```prolog +V(x) +``` + +In practice, you usually emit only the facts like `V(a).`, not a schema declaration. + +--- + +### Relation declaration + +Geolog: + +```geolog +reachable : [from: V, to: V] -> Prop; +``` + +Datalog predicate shape: + +```prolog +reachable(X, Y) +``` + +--- + +### Function declaration + +Geolog: + +```geolog +src : E -> V; +``` + +Datalog predicate shape: + +```prolog +src(E, V) +``` + +This is called **relationalizing** the function: we represent the function as a relation. + +--- + +### Element declaration + +Geolog: + +```geolog +a : V; +``` + +Datalog: + +```prolog +V(a). +``` + +--- + +### Relation fact + +Geolog: + +```geolog +[from: a, to: b] edge; +``` + +Datalog: + +```prolog +edge(a, b). +``` + +--- + +### Function fact + +Geolog: + +```geolog +e1 src = a; +``` + +Datalog: + +```prolog +src(e1, a). +``` + +--- + +### Axiom + +Geolog: + +```geolog +ax/base : forall x,y : V. + [from: x, to: y] edge |- [from: x, to: y] reachable; +``` + +Datalog: + +```prolog +reachable(X, Y) :- edge(X, Y). +``` + +Geolog: + +```geolog +ax/trans : forall x,y,z : V. + [from: x, to: y] reachable, [from: y, to: z] reachable + |- [from: x, to: z] reachable; +``` + +Datalog: + +```prolog +reachable(X, Z) :- reachable(X, Y), reachable(Y, Z). +``` + +--- + +## Full Example + +### Geolog input + +```geolog +theory Graph { + V : Sort; + E : Sort; + + src : E -> V; + tgt : E -> V; + + edge : [from: V, to: V] -> Prop; + reachable : [from: V, to: V] -> Prop; + + ax/base : forall x,y : V. + [from: x, to: y] edge |- [from: x, to: y] reachable; + + ax/trans : forall x,y,z : V. + [from: x, to: y] reachable, [from: y, to: z] reachable + |- [from: x, to: z] reachable; +} + +instance G : Graph = { + a, b, c : V; + e1, e2 : E; + + e1 src = a; + e1 tgt = b; + e2 src = b; + e2 tgt = c; + + [from: a, to: b] edge; + [from: b, to: c] edge; +} +``` + +### Datalog output + +```prolog +V(a). +V(b). +V(c). +E(e1). +E(e2). + +src(e1, a). +tgt(e1, b). +src(e2, b). +tgt(e2, c). + +edge(a, b). +edge(b, c). + +reachable(X, Y) :- edge(X, Y). +reachable(X, Z) :- reachable(X, Y), reachable(Y, Z). +``` + +--- + +## What This Translation Loses + +Even in this safe subset, some Geolog meaning may be weakened. + +### Function meaning is weaker + +In Geolog, `src : E -> V` really means “each edge has one source”. + +In plain Datalog, once we translate it to a relation, this bad situation is possible unless something else prevents it: + +```prolog +src(e1, a). +src(e1, b). +``` + +So the translator is practical, but not fully faithful to all of Geolog’s function meaning. + +### Type checking may become weaker + +If the translator does not emit unary sort predicates like `V(a).`, then the output Datalog program may lose explicit type information. + +### Richer Geolog behavior is excluded + +This subset does not include the most powerful parts of Geolog. + +That is intentional. + +--- + +## Best Implementation Strategy + +If someone wanted to build this translator, the safest pipeline would be: + +1. parse the Geolog source, +2. elaborate or type-check it, +3. reject anything outside the supported subset, +4. turn functions into relations, +5. flatten named relation fields into positional arguments, +6. print Datalog facts and rules. + +That is much safer than trying to translate directly from raw surface syntax. + +--- + +## A Good Simple Definition Of The Subset + +Here is a useful one-sentence definition: + +> The translatable subset is the part of Geolog where theories can be expressed as universally quantified Horn-style rules over relations and function-as-relation facts, with no existential conclusions and no equality-generating behavior. + +--- + +## Final Summary + +If you only want a practical translator, the right plan is: + +- support sorts, +- support relations, +- support functions by turning them into relations, +- support simple axioms with one conclusion fact, +- reject existential and equality-heavy features. + +That gives a subset that is useful, understandable, and realistic to translate automatically. +