geolog-zeta-fork/notes/004-key-concepts.md

7.2 KiB

Key Concepts

This document explains the main ideas in Geolog without assuming math background.

If you want to compare Geolog with Datalog, or think about automatic translation, see:

  • notes/007-geolog-and-datalog.md — a simple explanation of what maps well and what does not.
  • notes/008-geolog-to-datalog-subset.md — a practical subset of Geolog that could be translated automatically into plain Datalog.

Theories and Instances

A theory is a template — it defines what kinds of things exist and what rules they follow.

An instance is a concrete example — actual things that follow the theory's rules.

Example: Graphs

// The theory: "A graph has vertices and edges"
theory Graph {
  V : Sort;           // There's a kind of thing called "vertex"
  E : Sort;           // There's a kind of thing called "edge"
  src : E -> V;       // Every edge has a source vertex
  tgt : E -> V;       // Every edge has a target vertex
}

// An instance: A specific triangle graph
instance Triangle : Graph = {
  // Three vertices
  A : V;
  B : V;
  C : V;

  // Three edges forming a triangle
  ab : E;
  ab src = A;    // Edge ab goes from A...
  ab tgt = B;    // ...to B

  bc : E;
  bc src = B;
  bc tgt = C;

  ca : E;
  ca src = C;
  ca tgt = A;
}

Think of it like:

  • Theory = "A spreadsheet template with certain columns"
  • Instance = "A filled-in spreadsheet"

Axioms (Rules)

An axiom tells Geolog: "Whenever the left side is true, make the right side true too."

forall x, y, z. friends(x,y), friends(y,z) |- knows(x,z)
        ↑              ↑                         ↑
   "for any x,y,z"  "if these are true"    "then this must be true"

Simple Example: Transitivity

"If A ≤ B and B ≤ C, then A ≤ C"

theory Preorder {
  X : Sort;
  leq : [x: X, y: X] -> Prop;  // "x ≤ y" relation

  // Transitivity rule
  ax/trans : forall x: X, y: X, z: X.
    [x: x, y: y] leq, [x: y, y: z] leq |- [x: x, y: z] leq;
}

What [x: x, y: y] leq Means

This is Geolog's way of writing leq(x, y) or "x ≤ y":

[x: x, y: y] leq
    ↑           ↑
 a record     applied to the "leq" relation
 with fields
 x and y

It's like calling a function with named parameters: leq(x=x, y=y).


The Chase Algorithm

The chase is what makes Geolog useful. It automatically applies your rules to derive new facts.

Step-by-Step Example

Setup:

theory Friends {
  Person : Sort;
  friends : [a: Person, b: Person] -> Prop;
  knows : [a: Person, b: Person] -> Prop;

  // Rule: friends-of-friends know each other
  ax/fof : forall x, y, z : Person.
    [a: x, b: y] friends, [a: y, b: z] friends |- [a: x, b: z] knows;
}

instance Group : Friends = {
  alice : Person;
  bob : Person;
  charlie : Person;

  [a: alice, b: bob] friends;      // Alice and Bob are friends
  [a: bob, b: charlie] friends;    // Bob and Charlie are friends
}

Running the chase:

Initial state:
  friends = {(alice, bob), (bob, charlie)}
  knows = {}

Chase iteration 1:
  Looking at axiom ax/fof...
  Found match: x=alice, y=bob, z=charlie
    - [a:alice, b:bob] friends? YES
    - [a:bob, b:charlie] friends? YES
    - [a:alice, b:charlie] knows? NO ← violation!
  Fire conclusion: add [a:alice, b:charlie] knows

  knows = {(alice, charlie)}  ← NEW FACT

Chase iteration 2:
  Looking at axiom ax/fof...
  No new violations found.
  DONE (fixpoint reached)

Final state:
  friends = {(alice, bob), (bob, charlie)}
  knows = {(alice, charlie)}

The Key Insight

You don't write a program to derive facts. You write rules, and Geolog figures out everything that follows from them.


Union-Find (Equality Tracking)

When a rule concludes that two things are equal (x = y), Geolog needs to merge them. It uses a data structure called union-find.

The Problem

// Rule: If two people have the same ID, they're the same person
ax/same_id : forall p, q : Person, i : ID.
  [p: p] id = i, [p: q] id = i |- p = q;

If this rule fires with p = alice and q = alicia, Geolog learns that alice = alicia. But now every reference to alicia should really mean alice.

How Union-Find Works

Before:
  alice → alice (points to itself)
  alicia → alicia (points to itself)

After union(alice, alicia):
  alice → alice (the "representative")
  alicia → alice (now points to alice)

Later, find(alicia) returns alice

Why not just use a dictionary?

Approach Cost of merging N elements
Dictionary + Sets O(N) — must update every element
Union-Find O(1) — just change one pointer

When rules derive thousands of equalities, this matters.


Existentials (Creating New Things)

Rules can say "there must exist something":

// Every person has a best friend
ax/has_bf : forall p : Person. |- exists f : Person. [p: p, f: f] best_friend;

When the chase processes this:

  1. Check: Does person p already have a best friend?
  2. If yes: Do nothing
  3. If no: Create a new person and make them p's best friend
Before chase:
  Persons = {alice, bob}
  best_friend = {}

After chase:
  Persons = {alice, bob, fresh_1, fresh_2}
  best_friend = {(alice, fresh_1), (bob, fresh_2)}

The chase created fresh_1 and fresh_2 because the axiom demanded witnesses.


Disjunctions (Or)

Rules can have "or" in conclusions:

// Every task is either done or pending
ax/status : forall t : Task. |- [t: t] done \/ [t: t] pending;

Current behavior: Geolog fires both branches. So every task becomes both done AND pending.

Why? Proper disjunction handling requires backtracking search, which is complex. The current approach is "sound" (never wrong) but "incomplete" (may add more facts than necessary).


Tensor Algebra (How Chase Is Fast)

Checking "which rules have violations" naively requires checking every possible variable assignment. For 100 people and 3 variables, that's 100³ = 1,000,000 checks.

Geolog uses sparse tensors (think: sparse matrices, but multi-dimensional) to do this efficiently:

Relations are stored as sparse tensors:
  friends[i][j] = 1 if person i and person j are friends

Finding violations becomes tensor operations:
  violations = friends ⊗ friends ⊗ (1 - knows)
                  ↑          ↑           ↑
              "x,y friends" "y,z friends" "x,z not known"

This is much faster than nested loops because:

  • Sparse storage skips empty cells
  • Tensor operations are highly optimized
  • Results are computed in bulk, not one-by-one

What Geolog Can't Express

Geometric logic intentionally excludes some things:

Can't Write Why Not
not R(x) No negation — can't say "x is NOT red"
if not A then B No negation in premises
x + 1 = y No arithmetic — this isn't a calculator
forall x. R(x) with no conclusion Axioms must have conclusions

This is by design. These restrictions make the logic "geometric," which has nice theoretical properties (preserved by geometric morphisms, has models in any topos, etc.).

For practical purposes: Geolog is good for "if-then" rules and "there exists" statements, not for arithmetic or negation.