geolog-zeta-fork/NOTES.md
2026-02-26 11:50:51 +01:00

12 KiB

Geolog Project Notes

Overview

Geolog is a Geometric Logic REPL — a type theory with semantics in topoi, designed for formal specifications using geometric logic.

Core Capabilities

  • Geometric logic programming — encode mathematical structures, relationships, and constraints
  • Database schema definition — define sorts, functions, relations, and axioms
  • Model/instance creation — create concrete finite models satisfying theory axioms
  • Automated inference — chase algorithm for automatic fact derivation
  • Version control — git-like commits and tracking for instances
  • Persistence — append-only storage with optional disk persistence

Use Cases

  • Business process workflow orchestration
  • Formal verification via diagrammatic rewriting
  • Database query design
  • Petri net reachability and process modeling

Tech Stack

Primary Language: Rust (2021 edition, Cargo-based)

Key Dependencies

Crate Version Purpose
chumsky 0.9 Parser combinator library
ariadne 0.4 Error reporting with source spans
rkyv 0.7 Zero-copy serialization
rustyline 15 REPL readline interface
egglog-union-find 1.0 Union-find for congruence closure
roaring 0.10 Bitmap library for sparse relations
indexmap 2.0 Order-preserving hash maps
uuid 1 UUID generation
memmap2 0.9 Memory-mapped file I/O

Testing Frameworks

  • insta — snapshot testing
  • proptest — property-based testing
  • tempfile — temporary directory management

Architecture

┌─────────────────────────────────────────────────────┐
│                    USER INTERFACE                    │
│  REPL (interactive CLI) | Batch file loading         │
├─────────────────────────────────────────────────────┤
│         PARSING LAYER (Lexer → Parser → AST)         │
│  chumsky-based lexer & parser, source error reporting│
├─────────────────────────────────────────────────────┤
│           ELABORATION LAYER (AST → Core IR)          │
│  Type checking, name resolution, theory/instance     │
├─────────────────────────────────────────────────────┤
│            CORE LAYER (Typed Representation)         │
│  Signature, Term, Formula, Structure, ElaboratedTheory│
├─────────────────────────────────────────────────────┤
│              STORAGE LAYER (Persistence)             │
│  Append-only GeologMeta store with version control   │
├─────────────────────────────────────────────────────┤
│            QUERY & SOLVER LAYER (Execution)          │
│  Chase algorithm, congruence closure, relational     │
│  algebra compiler, SMT-style model enumeration       │
├─────────────────────────────────────────────────────┤
│         TENSOR ALGEBRA (Axiom Checking)              │
│  Sparse tensor evaluation for axiom validation       │
└─────────────────────────────────────────────────────┘

Directory Structure

Path Purpose
src/bin/geolog.rs CLI entry point
src/lib.rs Library root, exports parse()
src/repl.rs Interactive REPL state machine
src/lexer.rs Tokenization using chumsky
src/parser.rs Token stream → AST
src/ast.rs Abstract syntax tree types
src/core.rs Core IR: Signature, Term, Formula, Structure
src/elaborate/ AST → Core elaboration
src/store/ Persistence layer (append-only)
src/query/ Chase algorithm, relational algebra
src/solver/ SMT-style model enumeration
src/tensor/ Sparse tensor algebra for axiom checking
src/cc.rs Congruence closure (union-find)
src/id.rs Luid/Slid identity system
src/universe.rs Global element registry
examples/geolog/ 30+ example .geolog files
tests/ 25+ test files
docs/ ARCHITECTURE.md, SYNTAX.md
proofs/ Lean4 formalization
fuzz/ Fuzzing targets

Main Components

Parsing & Syntax (~1,200 lines)

  • lexer.rs — tokenization
  • parser.rs — token stream → AST
  • ast.rs — AST types (Theory, Instance, Axiom, etc.)
  • error.rs — error formatting with source spans
  • pretty.rs — Core → Geolog source roundtrip printing

Elaboration (~2,200 lines)

  • elaborate/mod.rs — coordination
  • elaborate/theory.rs — AST Theory → Core ElaboratedTheory
  • elaborate/instance.rs — AST Instance → Core Structure
  • elaborate/env.rs — environment with theory registry
  • elaborate/types.rs — type expression evaluation
  • elaborate/error.rs — type error reporting

Core Representation

  • core.rs — DerivedSort, Signature, Structure, Formula, Term, Sequent
  • id.rs — Luid (global unique ID) and Slid (structure-local ID)
  • universe.rs — global element registry with UUID ↔ Luid mapping
  • naming.rs — bidirectional name ↔ Luid mapping

Storage Layer (~1,500 lines)

  • store/mod.rs — main Store struct
  • store/schema.rs — cached sort/function/relation IDs
  • store/append.rs — low-level element append operations
  • store/theory.rs — theory CRUD
  • store/instance.rs — instance CRUD
  • store/commit.rs — git-like version control
  • store/materialize.rs — indexed views for fast lookups

Query & Compilation (~3,500 lines)

  • query/compile.rs — Query → RelAlgIR plan compilation
  • query/to_relalg.rs — Query → Relational Algebra IR
  • query/from_relalg.rs — RelAlgIR → Executable QueryOp
  • query/chase.rs — chase algorithm for fixpoint computation
  • query/backend.rs — naive QueryOp executor
  • query/optimize.rs — algebraic law rewriting

Solver & Model Enumeration (~1,300 lines)

  • solver/mod.rs — unified model enumeration API
  • solver/tree.rs — explicit search tree for partial models
  • solver/tactics.rs — automated search strategies:
    • CheckTactic: axiom validation
    • ForwardChainingTactic: Datalog-style inference
    • PropagateEquationsTactic: congruence closure
    • AutoTactic: composite fixpoint solver
  • solver/types.rs — SearchNode, Obligation, NodeStatus types

Tensor Algebra (~2,600 lines)

  • tensor/expr.rs — lazy tensor expression trees
  • tensor/sparse.rs — sparse tensor storage (RoaringBitmap-based)
  • tensor/builder.rs — expression builders
  • tensor/compile.rs — Formula → TensorExpr compilation
  • tensor/check.rs — axiom checking via tensor evaluation

Key Entry Points

  1. CLI: src/bin/geolog.rs

    Usage: geolog [-d <workspace>] [source_files...]
    
  2. Parse Entry: src/lib.rs exports parse(input: &str) → Result<File, String>

  3. REPL State: src/repl.rsReplState::process_line()

  4. Theory Elaboration: elaborate/theory.rs::elaborate_theory()

  5. Instance Elaboration: elaborate/instance.rs::elaborate_instance_ctx()

  6. Chase Algorithm: query/chase.rs::chase_fixpoint_with_cc()

  7. Model Enumeration: solver/mod.rs::enumerate_models()


Design Decisions

Geometric Logic Foundation

  • Axioms as Sequents: forall vars. premises |- conclusion
  • Positive Conclusions: Can have existentials, disjunctions, but never negations
  • Geometric Morphisms: Preserved by design, enabling category-theoretic semantics

Identity System

  • Luid ("Local Universe ID"): Globally unique across all structures
  • Slid ("Structure-Local ID"): Index within a single structure
  • Bidirectional mapping enables persistent identity despite structure changes

Append-Only Storage

  • GeologMeta: Single homoiconic theory instance storing all data
  • Patch-based Versioning: Each commit is a delta from parent
  • Never Delete: Elements only tombstoned for perfect audit trails

Type System

  • Postfix Application: x f not f(x) — categorical style
  • Derived Sorts: Products of base sorts for record domains
  • Product Domains: Functions can take record arguments: [x: M, y: M] -> M
  • Relations → Prop: Relations are functions to Prop (boolean predicates)

Chase Algorithm

  • Fixpoint Iteration: Derives all consequences until closure
  • Congruence Closure Integration: Merges elements when axioms conclude x = y
  • Termination for Unit Laws: Categories with unit laws no longer loop forever
  • Uses tensor algebra for efficient axiom checking

Solver Architecture

  • Explicit Search Tree: Not implicit in call stack (AI-friendly for agent control)
  • Refinement Preorder: Structures can grow (carriers, functions, relations)
  • Obligations vs Unsat: Axiom obligation = need to witness conclusion (NOT failure)
  • True Unsat: Only when deriving ⊢ False from instantiated axioms
  • Tactics-based: AutoTactic composes multiple tactics

Relational Algebra Compilation

  • QueryOp Intermediate: SQL-like operators (Scan, Filter, Join, Project, etc.)
  • Optimization Passes: Filter fusion, projection pushdown
  • Store-aware: Compiled directly to GeologMeta queries with indexing

Tensor Algebra for Axiom Checking

  • Sparse Representation: Roaring Bitmaps for efficient membership
  • Lazy Expression Trees: Tensor products fused with contractions
  • Boolean Semiring: AND for product, OR for sum

REPL Commands

:list, :inspect <name>         - Introspection
:add, :assert, :retract        - Mutations
:query, :explain, :compile     - Query analysis
:chase, :solve, :extend        - Inference
:commit, :history              - Version control
:source <file>                 - Load programs
:help                          - Show help

Parameterized Theories

Theories can be parameterized by other instances:

theory (N : PetriNet instance) Marking {
  token : Sort;
  token/of : token -> N/P;
}

This enables rich type-theoretic modeling (e.g., Petri net reachability as dependent types).


Testing Infrastructure

  • Property-based tests (proptest): naming, overlay, patches, queries, structure, tensor, universe, solver
  • Unit tests: parsing, elaboration, meta, pretty-printing, relations, version control, workspace
  • Integration tests: 30+ .geolog example files
  • Fuzzing: fuzz/ directory with parser and REPL fuzzing targets

Project Status

Version: 0.1.0 (Early production)

Completed

  • Core geometric logic implementation
  • Parser, elaborator, and core IR
  • Chase algorithm with equality saturation
  • Solver with SMT-like model enumeration
  • Persistence and version control
  • Comprehensive test coverage

Active Development

  • Nested instance elaboration
  • Homoiconic query plan representation
  • Disjunction variable alignment for tensor builder
  • Lean4 formalization of monotonic submodel proofs

Key Files Reference

File Line Count (approx) Description
src/core.rs ~800 Core type definitions
src/parser.rs ~600 Parser implementation
src/repl.rs ~1000 REPL state machine
src/query/chase.rs ~500 Chase algorithm
src/solver/mod.rs ~400 Model enumeration API
src/tensor/sparse.rs ~600 Sparse tensor storage
src/store/mod.rs ~400 Storage coordination