# Geolog Project Notes ## Overview **Geolog** is a **Geometric Logic REPL** — a type theory with semantics in topoi, designed for formal specifications using geometric logic. ### Core Capabilities - **Geometric logic programming** — encode mathematical structures, relationships, and constraints - **Database schema definition** — define sorts, functions, relations, and axioms - **Model/instance creation** — create concrete finite models satisfying theory axioms - **Automated inference** — chase algorithm for automatic fact derivation - **Version control** — git-like commits and tracking for instances - **Persistence** — append-only storage with optional disk persistence ### Use Cases - Business process workflow orchestration - Formal verification via diagrammatic rewriting - Database query design - Petri net reachability and process modeling --- ## Tech Stack **Primary Language**: Rust (2021 edition, Cargo-based) ### Key Dependencies | Crate | Version | Purpose | |-------|---------|---------| | `chumsky` | 0.9 | Parser combinator library | | `ariadne` | 0.4 | Error reporting with source spans | | `rkyv` | 0.7 | Zero-copy serialization | | `rustyline` | 15 | REPL readline interface | | `egglog-union-find` | 1.0 | Union-find for congruence closure | | `roaring` | 0.10 | Bitmap library for sparse relations | | `indexmap` | 2.0 | Order-preserving hash maps | | `uuid` | 1 | UUID generation | | `memmap2` | 0.9 | Memory-mapped file I/O | ### Testing Frameworks - `insta` — snapshot testing - `proptest` — property-based testing - `tempfile` — temporary directory management --- ## Architecture ``` ┌─────────────────────────────────────────────────────┐ │ USER INTERFACE │ │ REPL (interactive CLI) | Batch file loading │ ├─────────────────────────────────────────────────────┤ │ PARSING LAYER (Lexer → Parser → AST) │ │ chumsky-based lexer & parser, source error reporting│ ├─────────────────────────────────────────────────────┤ │ ELABORATION LAYER (AST → Core IR) │ │ Type checking, name resolution, theory/instance │ ├─────────────────────────────────────────────────────┤ │ CORE LAYER (Typed Representation) │ │ Signature, Term, Formula, Structure, ElaboratedTheory│ ├─────────────────────────────────────────────────────┤ │ STORAGE LAYER (Persistence) │ │ Append-only GeologMeta store with version control │ ├─────────────────────────────────────────────────────┤ │ QUERY & SOLVER LAYER (Execution) │ │ Chase algorithm, congruence closure, relational │ │ algebra compiler, SMT-style model enumeration │ ├─────────────────────────────────────────────────────┤ │ TENSOR ALGEBRA (Axiom Checking) │ │ Sparse tensor evaluation for axiom validation │ └─────────────────────────────────────────────────────┘ ``` --- ## Directory Structure | Path | Purpose | |------|---------| | `src/bin/geolog.rs` | CLI entry point | | `src/lib.rs` | Library root, exports `parse()` | | `src/repl.rs` | Interactive REPL state machine | | `src/lexer.rs` | Tokenization using chumsky | | `src/parser.rs` | Token stream → AST | | `src/ast.rs` | Abstract syntax tree types | | `src/core.rs` | Core IR: Signature, Term, Formula, Structure | | `src/elaborate/` | AST → Core elaboration | | `src/store/` | Persistence layer (append-only) | | `src/query/` | Chase algorithm, relational algebra | | `src/solver/` | SMT-style model enumeration | | `src/tensor/` | Sparse tensor algebra for axiom checking | | `src/cc.rs` | Congruence closure (union-find) | | `src/id.rs` | Luid/Slid identity system | | `src/universe.rs` | Global element registry | | `examples/geolog/` | 30+ example `.geolog` files | | `tests/` | 25+ test files | | `docs/` | ARCHITECTURE.md, SYNTAX.md | | `proofs/` | Lean4 formalization | | `fuzz/` | Fuzzing targets | --- ## Main Components ### Parsing & Syntax (~1,200 lines) - `lexer.rs` — tokenization - `parser.rs` — token stream → AST - `ast.rs` — AST types (Theory, Instance, Axiom, etc.) - `error.rs` — error formatting with source spans - `pretty.rs` — Core → Geolog source roundtrip printing ### Elaboration (~2,200 lines) - `elaborate/mod.rs` — coordination - `elaborate/theory.rs` — AST Theory → Core ElaboratedTheory - `elaborate/instance.rs` — AST Instance → Core Structure - `elaborate/env.rs` — environment with theory registry - `elaborate/types.rs` — type expression evaluation - `elaborate/error.rs` — type error reporting ### Core Representation - `core.rs` — DerivedSort, Signature, Structure, Formula, Term, Sequent - `id.rs` — Luid (global unique ID) and Slid (structure-local ID) - `universe.rs` — global element registry with UUID ↔ Luid mapping - `naming.rs` — bidirectional name ↔ Luid mapping ### Storage Layer (~1,500 lines) - `store/mod.rs` — main Store struct - `store/schema.rs` — cached sort/function/relation IDs - `store/append.rs` — low-level element append operations - `store/theory.rs` — theory CRUD - `store/instance.rs` — instance CRUD - `store/commit.rs` — git-like version control - `store/materialize.rs` — indexed views for fast lookups ### Query & Compilation (~3,500 lines) - `query/compile.rs` — Query → RelAlgIR plan compilation - `query/to_relalg.rs` — Query → Relational Algebra IR - `query/from_relalg.rs` — RelAlgIR → Executable QueryOp - `query/chase.rs` — chase algorithm for fixpoint computation - `query/backend.rs` — naive QueryOp executor - `query/optimize.rs` — algebraic law rewriting ### Solver & Model Enumeration (~1,300 lines) - `solver/mod.rs` — unified model enumeration API - `solver/tree.rs` — explicit search tree for partial models - `solver/tactics.rs` — automated search strategies: - CheckTactic: axiom validation - ForwardChainingTactic: Datalog-style inference - PropagateEquationsTactic: congruence closure - AutoTactic: composite fixpoint solver - `solver/types.rs` — SearchNode, Obligation, NodeStatus types ### Tensor Algebra (~2,600 lines) - `tensor/expr.rs` — lazy tensor expression trees - `tensor/sparse.rs` — sparse tensor storage (RoaringBitmap-based) - `tensor/builder.rs` — expression builders - `tensor/compile.rs` — Formula → TensorExpr compilation - `tensor/check.rs` — axiom checking via tensor evaluation --- ## Key Entry Points 1. **CLI**: `src/bin/geolog.rs` ``` Usage: geolog [-d ] [source_files...] ``` 2. **Parse Entry**: `src/lib.rs` exports `parse(input: &str) → Result` 3. **REPL State**: `src/repl.rs` — `ReplState::process_line()` 4. **Theory Elaboration**: `elaborate/theory.rs::elaborate_theory()` 5. **Instance Elaboration**: `elaborate/instance.rs::elaborate_instance_ctx()` 6. **Chase Algorithm**: `query/chase.rs::chase_fixpoint_with_cc()` 7. **Model Enumeration**: `solver/mod.rs::enumerate_models()` --- ## Design Decisions ### Geometric Logic Foundation - **Axioms as Sequents**: `forall vars. premises |- conclusion` - **Positive Conclusions**: Can have existentials, disjunctions, but never negations - **Geometric Morphisms**: Preserved by design, enabling category-theoretic semantics ### Identity System - **Luid** ("Local Universe ID"): Globally unique across all structures - **Slid** ("Structure-Local ID"): Index within a single structure - Bidirectional mapping enables persistent identity despite structure changes ### Append-Only Storage - **GeologMeta**: Single homoiconic theory instance storing all data - **Patch-based Versioning**: Each commit is a delta from parent - **Never Delete**: Elements only tombstoned for perfect audit trails ### Type System - **Postfix Application**: `x f` not `f(x)` — categorical style - **Derived Sorts**: Products of base sorts for record domains - **Product Domains**: Functions can take record arguments: `[x: M, y: M] -> M` - **Relations → Prop**: Relations are functions to `Prop` (boolean predicates) ### Chase Algorithm - **Fixpoint Iteration**: Derives all consequences until closure - **Congruence Closure Integration**: Merges elements when axioms conclude `x = y` - **Termination for Unit Laws**: Categories with unit laws no longer loop forever - Uses tensor algebra for efficient axiom checking ### Solver Architecture - **Explicit Search Tree**: Not implicit in call stack (AI-friendly for agent control) - **Refinement Preorder**: Structures can grow (carriers, functions, relations) - **Obligations vs Unsat**: Axiom obligation = need to witness conclusion (NOT failure) - **True Unsat**: Only when deriving `⊢ False` from instantiated axioms - **Tactics-based**: AutoTactic composes multiple tactics ### Relational Algebra Compilation - **QueryOp Intermediate**: SQL-like operators (Scan, Filter, Join, Project, etc.) - **Optimization Passes**: Filter fusion, projection pushdown - **Store-aware**: Compiled directly to GeologMeta queries with indexing ### Tensor Algebra for Axiom Checking - **Sparse Representation**: Roaring Bitmaps for efficient membership - **Lazy Expression Trees**: Tensor products fused with contractions - **Boolean Semiring**: AND for product, OR for sum --- ## REPL Commands ``` :list, :inspect - Introspection :add, :assert, :retract - Mutations :query, :explain, :compile - Query analysis :chase, :solve, :extend - Inference :commit, :history - Version control :source - Load programs :help - Show help ``` --- ## Parameterized Theories Theories can be parameterized by other instances: ```geolog theory (N : PetriNet instance) Marking { token : Sort; token/of : token -> N/P; } ``` This enables rich type-theoretic modeling (e.g., Petri net reachability as dependent types). --- ## Testing Infrastructure - **Property-based tests** (`proptest`): naming, overlay, patches, queries, structure, tensor, universe, solver - **Unit tests**: parsing, elaboration, meta, pretty-printing, relations, version control, workspace - **Integration tests**: 30+ `.geolog` example files - **Fuzzing**: `fuzz/` directory with parser and REPL fuzzing targets --- ## Project Status **Version**: 0.1.0 (Early production) ### Completed - Core geometric logic implementation - Parser, elaborator, and core IR - Chase algorithm with equality saturation - Solver with SMT-like model enumeration - Persistence and version control - Comprehensive test coverage ### Active Development - Nested instance elaboration - Homoiconic query plan representation - Disjunction variable alignment for tensor builder - Lean4 formalization of monotonic submodel proofs --- ## Key Files Reference | File | Line Count (approx) | Description | |------|---------------------|-------------| | `src/core.rs` | ~800 | Core type definitions | | `src/parser.rs` | ~600 | Parser implementation | | `src/repl.rs` | ~1000 | REPL state machine | | `src/query/chase.rs` | ~500 | Chase algorithm | | `src/solver/mod.rs` | ~400 | Model enumeration API | | `src/tensor/sparse.rs` | ~600 | Sparse tensor storage | | `src/store/mod.rs` | ~400 | Storage coordination |