geolog-zeta-fork/notes/008-geolog-to-datalog-subset.md
Hassan Abedi 86e7669e51 WIP
2026-03-24 09:47:25 +01:00

8.4 KiB
Raw Permalink Blame History

Geolog To Datalog Subset

A practical subset of Geolog that can be translated automatically into plain Datalog.


Goal

The goal of this note is to describe a part of Geolog that can be translated into plain Datalog without too much trouble.

This note is about a safe, useful subset.

It is not about translating all of Geolog.

That matters because full Geolog has features that go beyond what plain Datalog can express.


Short Answer

Automatic translation is quite reasonable if we only allow:

  • sorts,
  • relations,
  • functions,
  • instances with facts,
  • and simple Horn-style axioms.

Automatic translation becomes much harder if we also allow:

  • existential conclusions,
  • equality-generating behavior,
  • or any rule that needs the system to create fresh objects.

What Subset We Want

We want a subset where Geolog rules behave like ordinary Datalog rules.

That means we allow the parts of Geolog that fit this pattern:

  • some facts are already true,
  • a rule sees those facts,
  • the rule derives one new fact,
  • repeat until nothing new can be derived.

Allowed Features

1. Sort declarations

Allowed:

V : Sort;
E : Sort;

These are Geolog types.

In Datalog, they can be handled in either of two ways:

  • keep them only as compiler information,
  • or emit unary predicates such as V(x) and E(x).

2. Relation declarations

Allowed:

edge : [from: V, to: V] -> Prop;
reachable : [from: V, to: V] -> Prop;

Translation idea:

edge(X, Y)
reachable(X, Y)

The field names such as from and to are useful in Geolog, but plain Datalog usually uses only argument position.


3. Function declarations

Allowed:

src : E -> V;
tgt : E -> V;

Translation idea:

src(E, V)
tgt(E, V)

This means we turn Geolog functions into ordinary Datalog relations.

That is a practical encoding, but it is important to remember that a Datalog relation does not automatically behave like a true function.


4. Instance facts

Allowed element declarations:

a, b, c : V;
e1, e2 : E;

Possible Datalog output:

V(a).
V(b).
V(c).
E(e1).
E(e2).

Allowed relation facts:

[from: a, to: b] edge;
[from: b, to: c] edge;

Translation:

edge(a, b).
edge(b, c).

Allowed function facts:

e1 src = a;
e1 tgt = b;

Translation:

src(e1, a).
tgt(e1, b).

5. Simple axioms

Allowed form:

name : forall vars.
  premise1, premise2, ..., premisen |- conclusion;

Where:

  • all variables are universal variables,
  • each premise is a simple atomic fact,
  • the conclusion is exactly one simple atomic fact,
  • there is no exists in the conclusion,
  • there is no disjunction,
  • there is no equality-generating step.

This is the part of Geolog that lines up best with plain Datalog.


What Is Not Allowed

This subset should reject programs that use:

  • existential conclusions,
  • equality-generating rules,
  • rules that require creation of fresh objects,
  • disjunction,
  • nested instances in places where they affect rule translation,
  • or advanced features that do not flatten cleanly into ordinary Datalog facts and rules.

Translation Rules

This section gives the basic mechanical mapping.

Sort declaration

Geolog:

V : Sort;

Possible Datalog representation:

V(x)

In practice, you usually emit only the facts like V(a)., not a schema declaration.


Relation declaration

Geolog:

reachable : [from: V, to: V] -> Prop;

Datalog predicate shape:

reachable(X, Y)

Function declaration

Geolog:

src : E -> V;

Datalog predicate shape:

src(E, V)

This is called relationalizing the function: we represent the function as a relation.


Element declaration

Geolog:

a : V;

Datalog:

V(a).

Relation fact

Geolog:

[from: a, to: b] edge;

Datalog:

edge(a, b).

Function fact

Geolog:

e1 src = a;

Datalog:

src(e1, a).

Axiom

Geolog:

ax/base : forall x,y : V.
  [from: x, to: y] edge |- [from: x, to: y] reachable;

Datalog:

reachable(X, Y) :- edge(X, Y).

Geolog:

ax/trans : forall x,y,z : V.
  [from: x, to: y] reachable, [from: y, to: z] reachable
  |- [from: x, to: z] reachable;

Datalog:

reachable(X, Z) :- reachable(X, Y), reachable(Y, Z).

Full Example

Geolog input

theory Graph {
  V : Sort;
  E : Sort;

  src : E -> V;
  tgt : E -> V;

  edge : [from: V, to: V] -> Prop;
  reachable : [from: V, to: V] -> Prop;

  ax/base : forall x,y : V.
    [from: x, to: y] edge |- [from: x, to: y] reachable;

  ax/trans : forall x,y,z : V.
    [from: x, to: y] reachable, [from: y, to: z] reachable
    |- [from: x, to: z] reachable;
}

instance G : Graph = {
  a, b, c : V;
  e1, e2 : E;

  e1 src = a;
  e1 tgt = b;
  e2 src = b;
  e2 tgt = c;

  [from: a, to: b] edge;
  [from: b, to: c] edge;
}

Datalog output

V(a).
V(b).
V(c).
E(e1).
E(e2).

src(e1, a).
tgt(e1, b).
src(e2, b).
tgt(e2, c).

edge(a, b).
edge(b, c).

reachable(X, Y) :- edge(X, Y).
reachable(X, Z) :- reachable(X, Y), reachable(Y, Z).

What This Translation Loses

Even in this safe subset, some Geolog meaning may be weakened.

Function meaning is weaker

In Geolog, src : E -> V really means “each edge has one source”.

In plain Datalog, once we translate it to a relation, this bad situation is possible unless something else prevents it:

src(e1, a).
src(e1, b).

So the translator is practical, but not fully faithful to all of Geologs function meaning.

Type checking may become weaker

If the translator does not emit unary sort predicates like V(a)., then the output Datalog program may lose explicit type information.

Richer Geolog behavior is excluded

This subset does not include the most powerful parts of Geolog.

That is intentional.


Best Implementation Strategy

If someone wanted to build this translator, the safest pipeline would be:

  1. parse the Geolog source,
  2. elaborate or type-check it,
  3. reject anything outside the supported subset,
  4. turn functions into relations,
  5. flatten named relation fields into positional arguments,
  6. print Datalog facts and rules.

That is much safer than trying to translate directly from raw surface syntax.


A Good Simple Definition Of The Subset

Here is a useful one-sentence definition:

The translatable subset is the part of Geolog where theories can be expressed as universally quantified Horn-style rules over relations and function-as-relation facts, with no existential conclusions and no equality-generating behavior.


How Useful This Subset Would Be

This subset would be genuinely useful, but only for the right jobs.

It would be useful for:

  • understanding a Geolog theory in simpler rule form,
  • debugging the rule part of a theory,
  • exporting some Geolog programs to Datalog tools,
  • comparing Geolog evaluation with Datalog evaluation,
  • and teaching the parts of Geolog that are close to rule-based database logic.

It would be especially useful for theories that are mostly:

  • relation-based,
  • rule-based,
  • and not dependent on creating fresh objects during chase.

Good examples include:

  • graph reachability,
  • transitive closure,
  • propagation rules,
  • dependency rules,
  • and many database-style logic programs.

But this subset would not cover the full point of Geolog.

It becomes less useful when a theory depends on:

  • existential conclusions,
  • equality-heavy reasoning,
  • or the full meaning of native functions.

So the best way to think about it is this:

  • it is a very useful bridge,
  • a useful debugging view,
  • and a useful restricted compilation target,
  • but it is not a replacement for full Geolog.

If built well, the ideal behavior would be:

  • if a Geolog program fits inside the subset, export it cleanly,
  • if it does not fit, explain clearly why not.

That alone would make the subset valuable.


Final Summary

If you only want a practical translator, the right plan is:

  • support sorts,
  • support relations,
  • support functions by turning them into relations,
  • support simple axioms with one conclusion fact,
  • reject existential and equality-heavy features.

That gives a subset that is useful, understandable, and realistic to translate automatically.