# Key Concepts This document explains the main ideas in Geolog without assuming math background. ## Related Notes If you want to compare Geolog with Datalog, or think about automatic translation, see: - `notes/007-geolog-and-datalog.md` — a simple explanation of what maps well and what does not. - `notes/008-geolog-to-datalog-subset.md` — a practical subset of Geolog that could be translated automatically into plain Datalog. --- ## Theories and Instances A **theory** is a template — it defines what *kinds* of things exist and what rules they follow. An **instance** is a concrete example — actual things that follow the theory's rules. ### Example: Graphs ```geolog // The theory: "A graph has vertices and edges" theory Graph { V : Sort; // There's a kind of thing called "vertex" E : Sort; // There's a kind of thing called "edge" src : E -> V; // Every edge has a source vertex tgt : E -> V; // Every edge has a target vertex } // An instance: A specific triangle graph instance Triangle : Graph = { // Three vertices A : V; B : V; C : V; // Three edges forming a triangle ab : E; ab src = A; // Edge ab goes from A... ab tgt = B; // ...to B bc : E; bc src = B; bc tgt = C; ca : E; ca src = C; ca tgt = A; } ``` Think of it like: - **Theory** = "A spreadsheet template with certain columns" - **Instance** = "A filled-in spreadsheet" --- ## Axioms (Rules) An **axiom** tells Geolog: "Whenever the left side is true, make the right side true too." ``` forall x, y, z. friends(x,y), friends(y,z) |- knows(x,z) ↑ ↑ ↑ "for any x,y,z" "if these are true" "then this must be true" ``` ### Simple Example: Transitivity "If A ≤ B and B ≤ C, then A ≤ C" ```geolog theory Preorder { X : Sort; leq : [x: X, y: X] -> Prop; // "x ≤ y" relation // Transitivity rule ax/trans : forall x: X, y: X, z: X. [x: x, y: y] leq, [x: y, y: z] leq |- [x: x, y: z] leq; } ``` ### What `[x: x, y: y] leq` Means This is Geolog's way of writing `leq(x, y)` or "x ≤ y": ``` [x: x, y: y] leq ↑ ↑ a record applied to the "leq" relation with fields x and y ``` It's like calling a function with named parameters: `leq(x=x, y=y)`. --- ## The Chase Algorithm The **chase** is what makes Geolog useful. It automatically applies your rules to derive new facts. ### Step-by-Step Example **Setup:** ```geolog theory Friends { Person : Sort; friends : [a: Person, b: Person] -> Prop; knows : [a: Person, b: Person] -> Prop; // Rule: friends-of-friends know each other ax/fof : forall x, y, z : Person. [a: x, b: y] friends, [a: y, b: z] friends |- [a: x, b: z] knows; } instance Group : Friends = { alice : Person; bob : Person; charlie : Person; [a: alice, b: bob] friends; // Alice and Bob are friends [a: bob, b: charlie] friends; // Bob and Charlie are friends } ``` **Running the chase:** ``` Initial state: friends = {(alice, bob), (bob, charlie)} knows = {} Chase iteration 1: Looking at axiom ax/fof... Found match: x=alice, y=bob, z=charlie - [a:alice, b:bob] friends? YES - [a:bob, b:charlie] friends? YES - [a:alice, b:charlie] knows? NO ← violation! Fire conclusion: add [a:alice, b:charlie] knows knows = {(alice, charlie)} ← NEW FACT Chase iteration 2: Looking at axiom ax/fof... No new violations found. DONE (fixpoint reached) Final state: friends = {(alice, bob), (bob, charlie)} knows = {(alice, charlie)} ``` ### The Key Insight You don't write a program to derive facts. You write *rules*, and Geolog figures out everything that follows from them. --- ## Union-Find (Equality Tracking) When a rule concludes that two things are equal (`x = y`), Geolog needs to merge them. It uses a data structure called **union-find**. ### The Problem ```geolog // Rule: If two people have the same ID, they're the same person ax/same_id : forall p, q : Person, i : ID. [p: p] id = i, [p: q] id = i |- p = q; ``` If this rule fires with `p = alice` and `q = alicia`, Geolog learns that `alice = alicia`. But now every reference to `alicia` should really mean `alice`. ### How Union-Find Works ``` Before: alice → alice (points to itself) alicia → alicia (points to itself) After union(alice, alicia): alice → alice (the "representative") alicia → alice (now points to alice) Later, find(alicia) returns alice ``` **Why not just use a dictionary?** | Approach | Cost of merging N elements | |----------|---------------------------| | Dictionary + Sets | O(N) — must update every element | | Union-Find | O(1) — just change one pointer | When rules derive thousands of equalities, this matters. --- ## Existentials (Creating New Things) Rules can say "there must exist something": ```geolog // Every person has a best friend ax/has_bf : forall p : Person. |- exists f : Person. [p: p, f: f] best_friend; ``` When the chase processes this: 1. **Check:** Does person `p` already have a best friend? 2. **If yes:** Do nothing 3. **If no:** Create a new person and make them `p`'s best friend ``` Before chase: Persons = {alice, bob} best_friend = {} After chase: Persons = {alice, bob, fresh_1, fresh_2} best_friend = {(alice, fresh_1), (bob, fresh_2)} ``` The chase created `fresh_1` and `fresh_2` because the axiom demanded witnesses. --- ## Disjunctions (Or) Rules can have "or" in conclusions: ```geolog // Every task is either done or pending ax/status : forall t : Task. |- [t: t] done \/ [t: t] pending; ``` **Current behavior:** Geolog fires *both* branches. So every task becomes both done AND pending. **Why?** Proper disjunction handling requires backtracking search, which is complex. The current approach is "sound" (never wrong) but "incomplete" (may add more facts than necessary). --- ## Tensor Algebra (How Chase Is Fast) Checking "which rules have violations" naively requires checking every possible variable assignment. For 100 people and 3 variables, that's 100³ = 1,000,000 checks. Geolog uses **sparse tensors** (think: sparse matrices, but multi-dimensional) to do this efficiently: ``` Relations are stored as sparse tensors: friends[i][j] = 1 if person i and person j are friends Finding violations becomes tensor operations: violations = friends ⊗ friends ⊗ (1 - knows) ↑ ↑ ↑ "x,y friends" "y,z friends" "x,z not known" ``` This is much faster than nested loops because: - Sparse storage skips empty cells - Tensor operations are highly optimized - Results are computed in bulk, not one-by-one --- ## What Geolog Can't Express Geometric logic intentionally excludes some things: | Can't Write | Why Not | |-------------|---------| | `not R(x)` | No negation — can't say "x is NOT red" | | `if not A then B` | No negation in premises | | `x + 1 = y` | No arithmetic — this isn't a calculator | | `forall x. R(x)` with no conclusion | Axioms must have conclusions | **This is by design.** These restrictions make the logic "geometric," which has nice theoretical properties (preserved by geometric morphisms, has models in any topos, etc.). For practical purposes: Geolog is good for "if-then" rules and "there exists" statements, not for arithmetic or negation.