From 15821e53ce1298bb307413b7c2772caeef36840a Mon Sep 17 00:00:00 2001 From: Hassan Abedi Date: Thu, 19 Mar 2026 14:40:44 +0100 Subject: [PATCH] Expand notes with examples and more explanations --- notes/001-architecture.md | 171 ++++++++++++------- notes/002-design-decisions.md | 217 ++++++++++++++++++++---- notes/003-project-status.md | 170 +++++++++++++------ notes/004-key-concepts.md | 283 +++++++++++++++++++++++++------ notes/005-critical-assessment.md | 186 +++++++++++++++----- 5 files changed, 789 insertions(+), 238 deletions(-) diff --git a/notes/001-architecture.md b/notes/001-architecture.md index 6c137bb..e9499af 100644 --- a/notes/001-architecture.md +++ b/notes/001-architecture.md @@ -1,53 +1,71 @@ # Architecture +## What Geolog Is + +Geolog is a command-line tool for defining rules and automatically deriving facts from them. Think of it like a spreadsheet that can figure out new cells based on formulas you define, except the "formulas" are logical rules and the "cells" are facts about things. + +**Example:** You tell Geolog "if A is friends with B, and B is friends with C, then A knows C." Then you say "Alice is friends with Bob" and "Bob is friends with Charlie." Geolog automatically figures out "Alice knows Charlie." + +--- + ## Tech Stack **Language:** Rust (2021 edition) ### Key Dependencies -| Crate | Purpose | -|---------------------|-------------------------------------| -| `chumsky` | Parser combinator library | -| `ariadne` | Error reporting with source spans | -| `rkyv` | Zero-copy serialization | -| `rustyline` | REPL readline interface | -| `egglog-union-find` | Union-find for congruence closure | -| `roaring` | Bitmap library for sparse relations | -| `indexmap` | Order-preserving hash maps | -| `memmap2` | Memory-mapped file I/O | +| Crate | What It Does | +|---------------------|-----------------------------------------------------------------| +| `chumsky` | Reads `.geolog` files and understands their structure | +| `ariadne` | Shows helpful error messages with line numbers | +| `rkyv` | Saves data to disk efficiently | +| `rustyline` | Makes the interactive prompt work (arrow keys, history, etc.) | +| `egglog-union-find` | Tracks when two things are actually the same thing | +| `roaring` | Stores large sets of numbers efficiently | +| `indexmap` | Hash maps that remember insertion order | +| `memmap2` | Reads files by mapping them directly into memory | -### Testing +### Testing Tools -- `insta` — snapshot testing -- `proptest` — property-based testing -- `tempfile` — temporary directory management +| Tool | Purpose | +|------|---------| +| `insta` | Compares output against saved "snapshots" to catch regressions | +| `proptest` | Generates random inputs to find edge cases | +| `tempfile` | Creates temporary directories for tests | --- -## System Layers +## How Data Flows Through the System ``` ┌─────────────────────────────────────────────────────┐ │ USER INTERFACE │ -│ REPL (interactive CLI) | Batch file loading │ +│ You type commands or load .geolog files here │ ├─────────────────────────────────────────────────────┤ -│ PARSING LAYER (Lexer → Parser → AST) │ +│ PARSING LAYER │ +│ Turns text into a structured tree (like HTML DOM) │ +│ Example: "P : Sort;" becomes {name: "P", kind: Sort}│ ├─────────────────────────────────────────────────────┤ -│ ELABORATION LAYER (AST → Core IR) │ -│ Type checking, name resolution, theory/instance │ +│ ELABORATION LAYER │ +│ Checks types and resolves names │ +│ Example: Verifies "src : E -> V" refers to real │ +│ sorts E and V that were declared │ ├─────────────────────────────────────────────────────┤ -│ CORE LAYER (Typed Representation) │ -│ Signature, Term, Formula, Structure │ +│ CORE LAYER │ +│ The verified, type-checked representation │ +│ Everything here is guaranteed to be well-formed │ ├─────────────────────────────────────────────────────┤ -│ STORAGE LAYER (Persistence) │ -│ Append-only store with version control │ +│ STORAGE LAYER │ +│ Saves everything to disk, tracks history │ +│ Like git for your data │ ├─────────────────────────────────────────────────────┤ -│ QUERY & SOLVER LAYER (Execution) │ -│ Chase algorithm, congruence closure, model search │ +│ QUERY & SOLVER LAYER │ +│ Runs the chase algorithm to derive new facts │ +│ Searches for models that satisfy your rules │ ├─────────────────────────────────────────────────────┤ -│ TENSOR ALGEBRA (Axiom Checking) │ -│ Sparse tensor evaluation for axiom validation │ +│ TENSOR ALGEBRA │ +│ Efficiently checks which rules need to fire │ +│ Uses sparse matrices for performance │ └─────────────────────────────────────────────────────┘ ``` @@ -55,50 +73,79 @@ ## Directory Structure -| Path | Purpose | -|---------------------|----------------------------------------------| -| `src/bin/geolog.rs` | CLI entry point | -| `src/lib.rs` | Library root | -| `src/repl.rs` | Interactive REPL state machine | -| `src/lexer.rs` | Tokenization | -| `src/parser.rs` | Token stream → AST | -| `src/ast.rs` | Abstract syntax tree types | -| `src/core.rs` | Core IR: Signature, Term, Formula, Structure | -| `src/elaborate/` | AST → Core elaboration | -| `src/store/` | Persistence layer | -| `src/query/` | Chase algorithm, relational algebra | -| `src/solver/` | SMT-style model enumeration | -| `src/tensor/` | Sparse tensor algebra | -| `src/cc.rs` | Congruence closure (union-find) | -| `examples/geolog/` | 30+ example `.geolog` files | -| `tests/` | Test files | -| `docs/` | ARCHITECTURE.md, SYNTAX.md | -| `proofs/` | Lean4 formalization | +| Path | What's There | +|---------------------|--------------| +| `src/bin/geolog.rs` | The main program entry point — run `cargo run` and this executes | +| `src/lib.rs` | Library code that other Rust programs could use | +| `src/repl.rs` | The interactive prompt (handles `:chase`, `:inspect`, etc.) | +| `src/lexer.rs` | Breaks text into tokens like `theory`, `{`, `Sort`, `;` | +| `src/parser.rs` | Builds a tree structure from tokens | +| `src/ast.rs` | Defines what the tree structure looks like | +| `src/core.rs` | The "verified" internal representation | +| `src/elaborate/` | Type checking and name resolution | +| `src/store/` | Saving/loading data, version control | +| `src/query/` | The chase algorithm lives here | +| `src/solver/` | Model search (finding valid instances) | +| `src/tensor/` | Efficient rule checking with sparse matrices | +| `src/cc.rs` | Congruence closure — tracks equalities | +| `examples/geolog/` | 30+ example files you can learn from | +| `tests/` | Automated tests | +| `docs/` | Documentation (ARCHITECTURE.md, SYNTAX.md) | --- ## Key Entry Points -| Entry Point | Location | -|-------------|----------| -| CLI | `src/bin/geolog.rs` | -| Parse | `src/lib.rs::parse()` | -| REPL | `src/repl.rs::ReplState::process_line()` | -| Theory elaboration | `elaborate/theory.rs::elaborate_theory()` | -| Instance elaboration | `elaborate/instance.rs::elaborate_instance_ctx()` | -| Chase | `query/chase.rs::chase_fixpoint_with_cc()` | -| Model enumeration | `solver/mod.rs::enumerate_models()` | +If you're reading the code, start here: + +| What You Want | Where To Look | +|---------------|---------------| +| How the CLI starts | `src/bin/geolog.rs` | +| How files are parsed | `src/lib.rs::parse()` | +| How REPL commands work | `src/repl.rs::ReplState::process_line()` | +| How theories are checked | `src/elaborate/theory.rs::elaborate_theory()` | +| How instances are built | `src/elaborate/instance.rs::elaborate_instance_ctx()` | +| How facts are derived | `src/query/chase.rs::chase_fixpoint_with_cc()` | +| How models are searched | `src/solver/mod.rs::enumerate_models()` | --- ## REPL Commands +When you run `geolog` interactively, these commands are available: + ``` -:list, :inspect - Introspection -:add, :assert, :retract - Mutations -:query, :explain, :compile - Query analysis -:chase, :solve, :extend - Inference -:commit, :history - Version control -:source - Load programs -:help - Show help +:list - Show all theories and instances +:inspect Triangle - Show details of "Triangle" instance + +:add Triangle v : V; - Add element v to Triangle +:assert [x:a, y:b] R; - Assert a relation holds +:retract [x:a, y:b] R; - Remove a relation + +:chase Triangle - Derive all consequences from axioms +:solve MyInstance - Search for a valid model + +:commit "message" - Save current state (like git commit) +:history - Show past commits + +:source file.geolog - Load and run a file +:help - Show help +``` + +### Example Session + +``` +$ cargo run -- examples/geolog/graph.geolog +> :list +Theories: Graph +Instances: Triangle + +> :inspect Triangle +Instance Triangle : Graph + Carriers: + V: {A, B, C} + E: {ab, bc, ca} + Functions: + src: {ab -> A, bc -> B, ca -> C} + tgt: {ab -> B, bc -> C, ca -> A} ``` diff --git a/notes/002-design-decisions.md b/notes/002-design-decisions.md index 58ce331..1e30b88 100644 --- a/notes/002-design-decisions.md +++ b/notes/002-design-decisions.md @@ -1,66 +1,215 @@ # Design Decisions -## Geometric Logic Foundation - -- **Axioms as Sequents:** `forall vars. premises |- conclusion` -- **Positive Conclusions:** Existentials and disjunctions allowed, but never negation -- **Geometric Morphisms:** Preserved by design for category-theoretic semantics +This document explains *why* Geolog works the way it does. --- -## Identity System +## Geometric Logic: What It Means -- **Luid** ("Local Universe ID"): Globally unique across all structures -- **Slid** ("Structure-Local ID"): Index within a single structure -- Bidirectional mapping enables persistent identity despite structure changes +Geolog uses "geometric logic" — a restricted form of logic that can express: +- "If X then Y" rules +- "There exists something such that..." +- "A or B" (but you can't say "not A") + +**What you CAN write:** +``` +// If x and y are both connected to z, they know each other +forall x, y, z. connected(x,z), connected(y,z) |- knows(x,y) + +// Every person has a best friend +forall p : Person. |- exists f : Person. best_friend(p, f) + +// Every task is either done or pending +forall t : Task. |- done(t) \/ pending(t) +``` + +**What you CANNOT write:** +``` +// No negation allowed +forall x. |- not blocked(x) // INVALID + +// No "if not" patterns +forall x. not done(x) |- pending(x) // INVALID +``` + +**Why this restriction?** Geometric logic has nice mathematical properties — it's preserved by "geometric morphisms" between topoi. This isn't just theory; it means the reasoning is compositional and predictable. --- -## Storage +## Postfix Function Application -- **Append-Only:** Elements only tombstoned, never deleted -- **Patch-based Versioning:** Each commit is a delta from parent -- **GeologMeta:** Single homoiconic theory instance storing all data +Most languages write `f(x)` to apply function `f` to `x`. Geolog writes `x f`. + +``` +// Traditional notation +src(ab) // "apply src to ab" + +// Geolog notation +ab src // "ab, then src" +``` + +**Why?** It matches how you read pipelines left-to-right: + +``` +// Geolog: read left to right +ab src tgt // "take ab, get its src, get that's tgt" + +// Traditional: read inside out +tgt(src(ab)) // "tgt of src of ab" +``` + +This is called "categorical" style because it matches how mathematicians compose morphisms in category theory. + +--- + +## Identity System: Luids and Slids + +Geolog needs to track elements across different operations. It uses two kinds of IDs: + +| ID Type | Scope | Example | +|---------|-------|---------| +| **Luid** (Local Universe ID) | Global — unique across everything | "Element #4827" | +| **Slid** (Structure-Local ID) | Local — index within one structure | "The 3rd vertex" | + +**Why both?** +- Slids are small integers (0, 1, 2...) — fast for computation +- Luids persist across saves/loads — you can reference "that specific vertex" forever + +``` +Structure "Triangle": + Vertex A has Slid=0, Luid=1001 + Vertex B has Slid=1, Luid=1002 + Vertex C has Slid=2, Luid=1003 + +After merging A and B (because an axiom said A = B): + Vertex A has Slid=0, Luid=1001 (canonical) + Vertex B → now points to A +``` + +--- + +## Append-Only Storage + +Geolog never deletes data. When you "remove" something, it's marked as removed but still stored. + +**Why?** +- Perfect audit trail — you can see exactly what changed and when +- Safe undo — nothing is ever lost +- Git-like history — you can go back to any previous state + +``` +Commit 1: Added vertices A, B, C +Commit 2: Added edge ab +Commit 3: Marked edge ab as "removed" ← still stored, just flagged +``` --- ## Type System -- **Postfix Application:** `x f` not `f(x)` — categorical style -- **Derived Sorts:** Products of base sorts for record domains -- **Product Domains:** Functions can take record arguments: `[x: M, y: M] -> M` -- **Relations → Prop:** Relations are functions to `Prop` +### Sorts (Types) + +A **sort** is a type of thing. Vertices and Edges are different sorts: + +```geolog +theory Graph { + V : Sort; // V is a type (the type of vertices) + E : Sort; // E is a type (the type of edges) +} +``` + +### Functions + +Functions map elements to elements: + +```geolog +src : E -> V; // Every edge has a source vertex +tgt : E -> V; // Every edge has a target vertex +mul : [x: M, y: M] -> M; // Binary function taking a pair +``` + +The `[x: M, y: M]` is a **product domain** — the function takes a record with two fields. + +### Relations + +Relations are yes/no questions about tuples: + +```geolog +leq : [x: X, y: X] -> Prop; // "Is x ≤ y?" +connected : [a: V, b: V] -> Prop; +``` + +`Prop` means "proposition" — it's either true or false for each input. --- -## Chase Algorithm +## Chase Algorithm Design -- **Fixpoint Iteration:** Derives consequences until closure -- **Congruence Closure:** Merges elements when axioms conclude `x = y` -- **Tensor Algebra:** Efficient axiom checking via sparse tensors -- **Termination:** Unit laws no longer cause infinite loops +The chase repeatedly applies rules until nothing new can be derived: + +``` +Initial facts: friends(alice, bob), friends(bob, charlie) +Rule: friends(x,y), friends(y,z) |- knows(x,z) + +Iteration 1: + - Rule matches with x=alice, y=bob, z=charlie + - Conclusion knows(alice, charlie) is missing + - Add: knows(alice, charlie) + +Iteration 2: + - No rule has missing conclusions + - Stop (fixpoint reached) + +Final facts: friends(alice, bob), friends(bob, charlie), knows(alice, charlie) +``` + +**Key features:** +- Uses **tensor algebra** to efficiently find which rules need to fire +- Integrates **congruence closure** so `x = y` conclusions actually merge elements +- **Terminates** for theories with identity elements (earlier versions could loop forever) --- ## Solver Architecture -- **Explicit Search Tree:** Not implicit in call stack -- **Refinement Preorder:** Structures can grow (carriers, functions, relations) -- **Obligations vs Unsat:** Axiom obligation = need to witness conclusion (not failure) -- **Tactics-based:** AutoTactic composes CheckTactic, ForwardChainingTactic, PropagateEquationsTactic +When you need to *find* a model (not just derive facts), the solver searches: + +``` +Goal: Find an instance of Graph with a cycle + +Search tree: + Node 1: Empty graph + → Obligation: need at least one edge (from some axiom) + → Branch: try adding edge e1 + + Node 2: Graph with e1 + → Obligation: e1 needs src and tgt + → Branch: try src(e1) = v1, tgt(e1) = v1 (self-loop!) + + Node 3: Graph with e1, v1, self-loop + → All axioms satisfied + → SUCCESS: return this model +``` + +**Key features:** +- **Explicit search tree** — the solver builds a tree structure, not recursive function calls +- **Tactics** — different strategies (forward chaining, equation propagation) can be combined +- **Obligations** — "this axiom requires a witness" is tracked explicitly --- ## Query Compilation -- **QueryOp Intermediate:** SQL-like operators (Scan, Filter, Join, Project) -- **Optimization Passes:** Filter fusion, projection pushdown -- **Store-aware:** Compiled directly to GeologMeta queries with indexing +Queries are compiled to relational algebra (like SQL internally): ---- +``` +Query: Find all x,y where connected(x,y) and connected(y,x) -## Tensor Algebra +Compiled plan: + 1. Scan relation "connected" → get all (x,y) pairs + 2. Scan relation "connected" → get all (a,b) pairs + 3. Join where y=a and x=b + 4. Project out x,y +``` -- **Sparse Representation:** Roaring Bitmaps for efficient membership -- **Lazy Expression Trees:** Tensor products fused with contractions -- **Boolean Semiring:** AND for product, OR for sum +This is optimized with standard database techniques (filter pushdown, etc.). diff --git a/notes/003-project-status.md b/notes/003-project-status.md index 6050f9c..e7bc9d8 100644 --- a/notes/003-project-status.md +++ b/notes/003-project-status.md @@ -1,82 +1,152 @@ # Project Status -**Version:** 0.1.0 +**Version:** 0.1.0 (Early stage, but functional) --- -## Metrics +## Codebase Size -| Metric | Count | -|--------|-------| -| Rust source code | 30,284 lines | -| Test code | 7,872 lines | -| Example `.geolog` files | 2,346 lines | -| Example files | 30 | -| Rust source files | 84 | +| What | Count | +|------|-------| +| Rust source code | ~30,000 lines | +| Test code | ~8,000 lines | +| Example `.geolog` files | ~2,300 lines across 30 files | +| Source files | 84 | | Unit tests | 137 (all passing) | | Integration tests | 31 (all passing) | --- -## What's Complete +## What Works Well -- Parser, lexer, syntax support -- Type system and elaboration -- Chase algorithm with equality saturation -- Tensor algebra for axiom checking -- SMT-style model enumeration -- Persistence and version control -- Interactive REPL -- Comprehensive test suite +These features are complete and reliable: + +### Core Engine +- **Parser** — reads `.geolog` files, gives helpful error messages +- **Type checker** — catches mistakes like using undefined sorts +- **Chase algorithm** — derives facts from rules correctly +- **Equality handling** — when rules say `x = y`, elements are properly merged + +### Examples That Run +```bash +cargo run -- examples/geolog/graph.geolog # Simple directed graph +cargo run -- examples/geolog/preorder.geolog # Reflexive, transitive relation +cargo run -- examples/geolog/transitive_closure.geolog # Shows chase in action +cargo run -- examples/geolog/category.geolog # Category theory basics +``` + +### Interactive Features +- REPL with command history +- Inspect any theory or instance +- Run chase interactively +- Save/load with version history --- -## Mostly Complete +## What Mostly Works -- Parameterized theories -- Nested instances (parsing done, elaboration partial) -- Query optimization -- Relational algebra compilation -- Congruence closure integration +These features work but have limitations: + +### Parameterized Theories +You can define theories that depend on other instances: + +```geolog +theory (N : PetriNet instance) Marking { + token : Sort; + token/of : token -> N/P; // References N's places +} +``` + +**Limitation:** Nested instances (instances inside instances) only partially work. + +### Model Search +The solver can find models satisfying axioms, but: +- Works best for small models +- Some search strategies are incomplete +- No timeout handling --- ## Known Gaps -### TODOs in Code +### Things That Don't Work Yet -| Location | Description | -|----------|-------------| -| `src/repl.rs:84` | Store full theory_type in GeologMeta | -| `src/overlay.rs:380` | Copy functions (deferred) | -| `src/store/query.rs:61` | Columnar batch loading | -| `src/store/instance.rs` | External references, product domains | -| `src/query/to_relalg.rs` | Column path navigation, ScanRelationOp | +| Feature | Status | +|---------|--------| +| Product types in conclusions | Functions returning records don't fully work | +| Existentials over products | `exists p : [x:M, y:M]. ...` fails | +| Field projections in equations | `x.field = y` doesn't compile | -### Incomplete Features +### Example of What Fails -- Product types in conclusions don't fully work -- Existentials over product sorts unsupported -- Field projections in equations don't compile -- Disjunction fires both branches (correct but inefficient) -- Tensor compilation failures are silent +```geolog +// This works: +forall x : M. |- exists y : M. f(x) = y -### In Progress +// This doesn't work (product sort in existential): +forall x : M. |- exists p : [a:M, b:M]. g(x) = p +``` -- Lean4 formalization of monotonic submodel proofs -- Homoiconic query plan representation -- Disjunction variable alignment for tensor builder +### Silent Failures + +When the tensor compiler can't handle an axiom, it **silently skips it**. This can be confusing — your axiom looks fine but nothing happens. + +```geolog +// This axiom might be silently skipped if it has unsupported patterns: +ax/complex : forall x. premise |- [a: x, b: x] some_relation; +// No error, but also no effect +``` + +--- + +## TODOs in the Code + +These are marked in the source with `// TODO`: + +| Location | What's Missing | +|----------|----------------| +| `src/repl.rs:84` | Theory type not fully stored in metadata | +| `src/overlay.rs:380` | Copying functions between structures | +| `src/store/query.rs:61` | Batch loading for better performance | +| `src/store/instance.rs` | External references between instances | +| `src/query/to_relalg.rs` | Some relational operators | --- ## Test Coverage -| File | Coverage Area | -|------|---------------| -| `tests/unit_elaborate.rs` | Type checking (14 tests) | -| `tests/unit_chase.rs` | Chase algorithm | -| `tests/unit_parsing.rs` | Parser (4 tests) | -| `tests/examples_integration.rs` | 31 integration tests | -| `tests/proptest_*.rs` | 11 property-based test suites | +### What's Tested -Fuzzing targets exist in `fuzz/` directory. +| Area | Tests | +|------|-------| +| Type checking | 14 unit tests | +| Chase algorithm | Comprehensive unit tests | +| Parser | 4 unit tests + snapshot tests | +| Examples | 31 integration tests (one per example file) | +| Random inputs | 11 property-based test suites | + +### Running Tests + +```bash +cargo test # Run all tests +cargo test unit_chase # Run only chase tests +cargo test --release # Run with optimizations (faster) +``` + +### Fuzzing + +The `fuzz/` directory has fuzzing targets that generate random inputs to find crashes: + +```bash +cargo +nightly fuzz run fuzz_parser +``` + +--- + +## What's Being Worked On + +According to code comments and structure: + +1. **Lean4 formalization** — Mathematical proofs about correctness (in `proofs/`) +2. **Better disjunction handling** — Currently fires both branches, should choose one +3. **Query plan improvements** — More efficient execution diff --git a/notes/004-key-concepts.md b/notes/004-key-concepts.md index b9814e9..5d4fab0 100644 --- a/notes/004-key-concepts.md +++ b/notes/004-key-concepts.md @@ -1,88 +1,273 @@ # Key Concepts -## Axioms +This document explains the main ideas in Geolog without assuming math background. -An axiom is an "if-then" rule: +--- -> "If premises are true, then conclusion must be true." +## Theories and Instances + +A **theory** is a template — it defines what *kinds* of things exist and what rules they follow. + +An **instance** is a concrete example — actual things that follow the theory's rules. + +### Example: Graphs + +```geolog +// The theory: "A graph has vertices and edges" +theory Graph { + V : Sort; // There's a kind of thing called "vertex" + E : Sort; // There's a kind of thing called "edge" + src : E -> V; // Every edge has a source vertex + tgt : E -> V; // Every edge has a target vertex +} + +// An instance: A specific triangle graph +instance Triangle : Graph = { + // Three vertices + A : V; + B : V; + C : V; + + // Three edges forming a triangle + ab : E; + ab src = A; // Edge ab goes from A... + ab tgt = B; // ...to B + + bc : E; + bc src = B; + bc tgt = C; + + ca : E; + ca src = C; + ca tgt = A; +} +``` + +Think of it like: +- **Theory** = "A spreadsheet template with certain columns" +- **Instance** = "A filled-in spreadsheet" + +--- + +## Axioms (Rules) + +An **axiom** tells Geolog: "Whenever the left side is true, make the right side true too." ``` forall x, y, z. friends(x,y), friends(y,z) |- knows(x,z) + ↑ ↑ ↑ + "for any x,y,z" "if these are true" "then this must be true" ``` -Similar to Datalog rules, but extended with: -- Existentials in conclusions: `|- exists x. R(x)` -- Disjunctions: `|- A \/ B` -- Equality conclusions: `|- x = y` +### Simple Example: Transitivity -Datalog is limited to Horn clauses (single atom in head). Geolog supports geometric sequents. +"If A ≤ B and B ≤ C, then A ≤ C" + +```geolog +theory Preorder { + X : Sort; + leq : [x: X, y: X] -> Prop; // "x ≤ y" relation + + // Transitivity rule + ax/trans : forall x: X, y: X, z: X. + [x: x, y: y] leq, [x: y, y: z] leq |- [x: x, y: z] leq; +} +``` + +### What `[x: x, y: y] leq` Means + +This is Geolog's way of writing `leq(x, y)` or "x ≤ y": + +``` +[x: x, y: y] leq + ↑ ↑ + a record applied to the "leq" relation + with fields + x and y +``` + +It's like calling a function with named parameters: `leq(x=x, y=y)`. --- -## Chase Algorithm +## The Chase Algorithm -Repeatedly applies axioms until no new facts can be derived (fixpoint). +The **chase** is what makes Geolog useful. It automatically applies your rules to derive new facts. -### Building Blocks +### Step-by-Step Example -1. **Axioms** — rules to apply -2. **Structure** — carriers (element sets), functions, relations -3. **Bindings** — variable-to-element mappings -4. **Tensor system** — finds violations (premises true, conclusion false) -5. **Fire conclusion** — adds relations, defines functions, creates elements -6. **Congruence closure** — tracks element equality via union-find +**Setup:** +```geolog +theory Friends { + Person : Sort; + friends : [a: Person, b: Person] -> Prop; + knows : [a: Person, b: Person] -> Prop; -### Loop + // Rule: friends-of-friends know each other + ax/fof : forall x, y, z : Person. + [a: x, b: y] friends, [a: y, b: z] friends |- [a: x, b: z] knows; +} + +instance Group : Friends = { + alice : Person; + bob : Person; + charlie : Person; + + [a: alice, b: bob] friends; // Alice and Bob are friends + [a: bob, b: charlie] friends; // Bob and Charlie are friends +} +``` + +**Running the chase:** ``` -repeat until no changes: - for each axiom: - find violations (tensor system) - fire conclusion for each violation - propagate equations (congruence closure) +Initial state: + friends = {(alice, bob), (bob, charlie)} + knows = {} + +Chase iteration 1: + Looking at axiom ax/fof... + Found match: x=alice, y=bob, z=charlie + - [a:alice, b:bob] friends? YES + - [a:bob, b:charlie] friends? YES + - [a:alice, b:charlie] knows? NO ← violation! + Fire conclusion: add [a:alice, b:charlie] knows + + knows = {(alice, charlie)} ← NEW FACT + +Chase iteration 2: + Looking at axiom ax/fof... + No new violations found. + DONE (fixpoint reached) + +Final state: + friends = {(alice, bob), (bob, charlie)} + knows = {(alice, charlie)} ``` +### The Key Insight + +You don't write a program to derive facts. You write *rules*, and Geolog figures out everything that follows from them. + --- -## Union-Find +## Union-Find (Equality Tracking) -Data structure for tracking equivalence classes. Inputs are elements (IDs), not sets. +When a rule concludes that two things are equal (`x = y`), Geolog needs to merge them. It uses a data structure called **union-find**. -``` -union(3, 7) // "3 and 7 are now equivalent" -union(7, 12) // "7 and 12 are now equivalent" -find(3) // Returns same representative as find(12) +### The Problem + +```geolog +// Rule: If two people have the same ID, they're the same person +ax/same_id : forall p, q : Person, i : ID. + [p: p] id = i, [p: q] id = i |- p = q; ``` -### Why Not Dictionary + Sets? +If this rule fires with `p = alice` and `q = alicia`, Geolog learns that `alice = alicia`. But now every reference to `alicia` should really mean `alice`. -| Approach | Union Cost | -|----------|------------| -| Dict + Sets | O(n) — must update all elements in merged set | -| Union-Find | O(1) amortized — just one pointer update | +### How Union-Find Works -Used for congruence closure when axioms conclude `x = y`. +``` +Before: + alice → alice (points to itself) + alicia → alicia (points to itself) + +After union(alice, alicia): + alice → alice (the "representative") + alicia → alice (now points to alice) + +Later, find(alicia) returns alice +``` + +**Why not just use a dictionary?** + +| Approach | Cost of merging N elements | +|----------|---------------------------| +| Dictionary + Sets | O(N) — must update every element | +| Union-Find | O(1) — just change one pointer | + +When rules derive thousands of equalities, this matters. --- -## Language Syntax +## Existentials (Creating New Things) -Not based on any single existing language. Influences: +Rules can say "there must exist something": -| Feature | Source | -|---------|--------| -| Postfix application (`x f` not `f(x)`) | Category theory | -| Sequent notation (`premises \|- conclusion`) | Proof theory | -| Path separators (`/` not `.`) | Unique to Geolog | -| Theory/Instance structure | Algebraic specification languages | +```geolog +// Every person has a best friend +ax/has_bf : forall p : Person. |- exists f : Person. [p: p, f: f] best_friend; +``` + +When the chase processes this: + +1. **Check:** Does person `p` already have a best friend? +2. **If yes:** Do nothing +3. **If no:** Create a new person and make them `p`'s best friend + +``` +Before chase: + Persons = {alice, bob} + best_friend = {} + +After chase: + Persons = {alice, bob, fresh_1, fresh_2} + best_friend = {(alice, fresh_1), (bob, fresh_2)} +``` + +The chase created `fresh_1` and `fresh_2` because the axiom demanded witnesses. --- -## Geometric Logic Restrictions +## Disjunctions (Or) -By design, Geolog excludes: -- Negation (no `not R(x)`) -- Classical implication (no `if A then B`) -- Arithmetic (no `x + 1 = y`) +Rules can have "or" in conclusions: -These are intentional limits for topos-theoretic semantics, not missing features. +```geolog +// Every task is either done or pending +ax/status : forall t : Task. |- [t: t] done \/ [t: t] pending; +``` + +**Current behavior:** Geolog fires *both* branches. So every task becomes both done AND pending. + +**Why?** Proper disjunction handling requires backtracking search, which is complex. The current approach is "sound" (never wrong) but "incomplete" (may add more facts than necessary). + +--- + +## Tensor Algebra (How Chase Is Fast) + +Checking "which rules have violations" naively requires checking every possible variable assignment. For 100 people and 3 variables, that's 100³ = 1,000,000 checks. + +Geolog uses **sparse tensors** (think: sparse matrices, but multi-dimensional) to do this efficiently: + +``` +Relations are stored as sparse tensors: + friends[i][j] = 1 if person i and person j are friends + +Finding violations becomes tensor operations: + violations = friends ⊗ friends ⊗ (1 - knows) + ↑ ↑ ↑ + "x,y friends" "y,z friends" "x,z not known" +``` + +This is much faster than nested loops because: +- Sparse storage skips empty cells +- Tensor operations are highly optimized +- Results are computed in bulk, not one-by-one + +--- + +## What Geolog Can't Express + +Geometric logic intentionally excludes some things: + +| Can't Write | Why Not | +|-------------|---------| +| `not R(x)` | No negation — can't say "x is NOT red" | +| `if not A then B` | No negation in premises | +| `x + 1 = y` | No arithmetic — this isn't a calculator | +| `forall x. R(x)` with no conclusion | Axioms must have conclusions | + +**This is by design.** These restrictions make the logic "geometric," which has nice theoretical properties (preserved by geometric morphisms, has models in any topos, etc.). + +For practical purposes: Geolog is good for "if-then" rules and "there exists" statements, not for arithmetic or negation. diff --git a/notes/005-critical-assessment.md b/notes/005-critical-assessment.md index a918c88..47b5a21 100644 --- a/notes/005-critical-assessment.md +++ b/notes/005-critical-assessment.md @@ -1,79 +1,179 @@ # Critical Assessment +This document takes a skeptical look at what Geolog actually is versus what the documentation suggests. + +--- + ## Origin -AI-generated project by Claude Opus 4.5. From README: +Geolog was generated by an AI (Claude Opus 4.5). From the README: > "This README was synthesized automatically by Claude Opus 4.5. As was this entire project, really." ---- +This is unusual — most software is written by humans over time. Geolog was generated in a relatively short period, which has implications: -## Claimed Use Cases +**Positives:** +- Consistent code style throughout +- Good test coverage from the start +- Clean architecture -- Business process workflow orchestration -- Formal verification -- Database query design -- Petri net reachability +**Concerns:** +- No organic evolution based on real user feedback +- No battle-testing in production environments +- Design decisions may be theoretically elegant but practically awkward --- -## Reality Check +## Claimed vs. Actual Use Cases -| Claim | Assessment | -|-------|------------| -| **Workflow orchestration** | No APIs, no integrations, no distributed execution. Just a REPL. Real workflow engines (Temporal, Camunda) handle failure recovery, monitoring, external systems. | -| **Formal verification** | Lean4 proofs "in progress." Not comparable to Coq, Lean, or TLA+ which have decades of development and industrial use. | -| **Database query design** | Not a database. No persistence at scale, no SQL, no transactions, no concurrency. | -| **Petri net reachability** | Toy examples with 3-4 places. Not industrial-scale process modeling. | +### Claim: "Business Process Workflow Orchestration" + +**What this suggests:** You could use Geolog to manage business workflows — approve orders, route documents, handle exceptions. + +**Reality:** Geolog is a command-line REPL. It has: +- No REST API +- No integration with external systems +- No notification system +- No user interface +- No distributed execution +- No failure recovery + +**What real workflow tools have:** Temporal, Camunda, and Airflow handle retries, timeouts, external service calls, monitoring dashboards, and run across multiple servers. + +**Verdict:** Geolog can *model* a workflow on paper. It cannot *run* a workflow in production. --- -## What It Actually Is +### Claim: "Formal Verification" -A **research/educational tool** for: -- Learning geometric logic -- Understanding chase algorithms -- Experimenting with category-theoretic ideas -- Playing with Petri net examples -- Prototyping constraint systems +**What this suggests:** You could use Geolog to prove your software is correct. + +**Reality:** +- The Lean4 proofs in `proofs/` are "in progress" +- There's no connection between Geolog models and actual code +- No proof certificates are generated + +**What real verification tools have:** Coq, Lean, and TLA+ have decades of development, extensive libraries, extraction to executable code, and industrial use (CompCert, seL4, Amazon's use of TLA+). + +**Verdict:** Geolog has some formal foundations but is not a verification tool. --- -## What It's Not +### Claim: "Database Query Design" -- Production infrastructure -- Real workflow orchestration platform -- Industrial formal verification tool -- Database replacement -- Anything at scale +**What this suggests:** You could use Geolog for database work. + +**Reality:** +- Data lives in memory (with optional file persistence) +- No SQL interface +- No transactions +- No concurrent access +- No indexing for large datasets +- No connection to actual databases + +**What real database tools have:** PostgreSQL, SQLite, and even embedded databases handle concurrent writes, ACID transactions, and gigabytes of data. + +**Verdict:** Geolog has relational algebra internally for a different purpose. It's not a database. + +--- + +### Claim: "Petri Net Reachability" + +**What this suggests:** You could analyze Petri nets with Geolog. + +**Reality:** This is the most honest claim. The examples show Petri net modeling: + +```geolog +theory PetriNet { + P : Sort; // Places + T : Sort; // Transitions + pre : [t: T, p: P] -> Prop; // Input arcs + post : [t: T, p: P] -> Prop; // Output arcs +} +``` + +**But:** The examples have 3-4 places. Industrial Petri net tools handle thousands of places with specialized algorithms (state space reduction, symbolic model checking). + +**Verdict:** Geolog can model small Petri nets. It's educational, not industrial. + +--- + +## What Geolog Actually Is + +### Good For + +| Use | Why | +|-----|-----| +| **Learning geometric logic** | Clean implementation with good examples | +| **Understanding chase algorithms** | The code is readable and well-commented | +| **Experimenting with rule systems** | Quick to try ideas in the REPL | +| **Teaching category theory concepts** | Postfix notation matches categorical thinking | +| **Prototyping constraint systems** | Faster than writing from scratch | + +### Not Good For + +| Use | Why Not | +|-----|---------| +| **Production systems** | No APIs, no monitoring, no error handling | +| **Large datasets** | Everything in memory, no optimization for scale | +| **Real workflows** | No external integrations | +| **Serious verification** | Incomplete proofs, no connection to real code | +| **Anything user-facing** | It's a developer REPL, not an application | --- ## Technical Influences -| Concept | Source | -|---------|--------| -| Geometric logic | Mathematical logic | -| Chase algorithm | Database theory | -| Tensor algebra | Linear algebra / sparse computation | -| Equality saturation | Program optimization (egg/egglog) | -| Congruence closure | Automated theorem proving | +Geolog combines ideas from several fields, which is interesting but also means it's not the best tool for any single field: -Uses `egglog-union-find` crate from the egg/egglog project. +| Concept | From | Better Specialized Tool | +|---------|------|------------------------| +| Geometric logic | Mathematical logic | Lean, Coq (for proofs) | +| Chase algorithm | Database theory | Datalog engines (for scale) | +| Equality saturation | Program optimization | egg/egglog (more mature) | +| Tensor operations | Linear algebra | NumPy, sparse matrix libraries | +| Model search | SMT solving | Z3, CVC5 (much more powerful) | --- -## Related Tools +## Code Quality Assessment -Similar tools in adjacent spaces (not cited as direct inspirations): -- **Alloy** — relational logic model finder -- **Datalog** — logic programming for databases -- **CQL** — categorical query language -- **Z3** — SMT solver -- **Egg** — equality saturation toolkit +Despite the skepticism above, the code itself is well-done: + +**Positives:** +- Comprehensive tests (168 passing) +- Good error messages with source locations +- Clean module separation +- Documented design decisions + +**Negatives:** +- Some silent failures (tensor compilation) +- Incomplete features (product types) +- No performance benchmarks +- Limited real-world testing + +--- + +## Who Should Use This? + +**Yes:** +- Students learning about logic and type systems +- Researchers exploring geometric logic +- Developers curious about chase algorithms +- Anyone wanting to understand equality saturation + +**No:** +- Teams building production systems +- Anyone needing a database +- Anyone needing workflow automation +- Anyone needing formal guarantees --- ## Bottom Line -Well-crafted educational project, not production infrastructure. The code quality is high and tests pass, but there's no evidence of real-world usage or battle-testing. +Geolog is a **well-crafted educational project**, not production infrastructure. + +The code is clean, tests pass, and the examples work. But the claimed "use cases" are aspirational — describing what geometric logic *could theoretically* be used for, not what this specific tool is ready to do. + +If you want to learn about geometric logic and chase algorithms, Geolog is excellent. If you want to solve real problems, use established tools in each domain.