2026-02-26 11:50:51 +01:00

322 lines
12 KiB
Markdown

# Geolog Project Notes
## Overview
**Geolog** is a **Geometric Logic REPL** — a type theory with semantics in topoi, designed for formal specifications using geometric logic.
### Core Capabilities
- **Geometric logic programming** — encode mathematical structures, relationships, and constraints
- **Database schema definition** — define sorts, functions, relations, and axioms
- **Model/instance creation** — create concrete finite models satisfying theory axioms
- **Automated inference** — chase algorithm for automatic fact derivation
- **Version control** — git-like commits and tracking for instances
- **Persistence** — append-only storage with optional disk persistence
### Use Cases
- Business process workflow orchestration
- Formal verification via diagrammatic rewriting
- Database query design
- Petri net reachability and process modeling
---
## Tech Stack
**Primary Language**: Rust (2021 edition, Cargo-based)
### Key Dependencies
| Crate | Version | Purpose |
|-------|---------|---------|
| `chumsky` | 0.9 | Parser combinator library |
| `ariadne` | 0.4 | Error reporting with source spans |
| `rkyv` | 0.7 | Zero-copy serialization |
| `rustyline` | 15 | REPL readline interface |
| `egglog-union-find` | 1.0 | Union-find for congruence closure |
| `roaring` | 0.10 | Bitmap library for sparse relations |
| `indexmap` | 2.0 | Order-preserving hash maps |
| `uuid` | 1 | UUID generation |
| `memmap2` | 0.9 | Memory-mapped file I/O |
### Testing Frameworks
- `insta` — snapshot testing
- `proptest` — property-based testing
- `tempfile` — temporary directory management
---
## Architecture
```
┌─────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ REPL (interactive CLI) | Batch file loading │
├─────────────────────────────────────────────────────┤
│ PARSING LAYER (Lexer → Parser → AST) │
│ chumsky-based lexer & parser, source error reporting│
├─────────────────────────────────────────────────────┤
│ ELABORATION LAYER (AST → Core IR) │
│ Type checking, name resolution, theory/instance │
├─────────────────────────────────────────────────────┤
│ CORE LAYER (Typed Representation) │
│ Signature, Term, Formula, Structure, ElaboratedTheory│
├─────────────────────────────────────────────────────┤
│ STORAGE LAYER (Persistence) │
│ Append-only GeologMeta store with version control │
├─────────────────────────────────────────────────────┤
│ QUERY & SOLVER LAYER (Execution) │
│ Chase algorithm, congruence closure, relational │
│ algebra compiler, SMT-style model enumeration │
├─────────────────────────────────────────────────────┤
│ TENSOR ALGEBRA (Axiom Checking) │
│ Sparse tensor evaluation for axiom validation │
└─────────────────────────────────────────────────────┘
```
---
## Directory Structure
| Path | Purpose |
|------|---------|
| `src/bin/geolog.rs` | CLI entry point |
| `src/lib.rs` | Library root, exports `parse()` |
| `src/repl.rs` | Interactive REPL state machine |
| `src/lexer.rs` | Tokenization using chumsky |
| `src/parser.rs` | Token stream → AST |
| `src/ast.rs` | Abstract syntax tree types |
| `src/core.rs` | Core IR: Signature, Term, Formula, Structure |
| `src/elaborate/` | AST → Core elaboration |
| `src/store/` | Persistence layer (append-only) |
| `src/query/` | Chase algorithm, relational algebra |
| `src/solver/` | SMT-style model enumeration |
| `src/tensor/` | Sparse tensor algebra for axiom checking |
| `src/cc.rs` | Congruence closure (union-find) |
| `src/id.rs` | Luid/Slid identity system |
| `src/universe.rs` | Global element registry |
| `examples/geolog/` | 30+ example `.geolog` files |
| `tests/` | 25+ test files |
| `docs/` | ARCHITECTURE.md, SYNTAX.md |
| `proofs/` | Lean4 formalization |
| `fuzz/` | Fuzzing targets |
---
## Main Components
### Parsing & Syntax (~1,200 lines)
- `lexer.rs` — tokenization
- `parser.rs` — token stream → AST
- `ast.rs` — AST types (Theory, Instance, Axiom, etc.)
- `error.rs` — error formatting with source spans
- `pretty.rs` — Core → Geolog source roundtrip printing
### Elaboration (~2,200 lines)
- `elaborate/mod.rs` — coordination
- `elaborate/theory.rs` — AST Theory → Core ElaboratedTheory
- `elaborate/instance.rs` — AST Instance → Core Structure
- `elaborate/env.rs` — environment with theory registry
- `elaborate/types.rs` — type expression evaluation
- `elaborate/error.rs` — type error reporting
### Core Representation
- `core.rs` — DerivedSort, Signature, Structure, Formula, Term, Sequent
- `id.rs` — Luid (global unique ID) and Slid (structure-local ID)
- `universe.rs` — global element registry with UUID ↔ Luid mapping
- `naming.rs` — bidirectional name ↔ Luid mapping
### Storage Layer (~1,500 lines)
- `store/mod.rs` — main Store struct
- `store/schema.rs` — cached sort/function/relation IDs
- `store/append.rs` — low-level element append operations
- `store/theory.rs` — theory CRUD
- `store/instance.rs` — instance CRUD
- `store/commit.rs` — git-like version control
- `store/materialize.rs` — indexed views for fast lookups
### Query & Compilation (~3,500 lines)
- `query/compile.rs` — Query → RelAlgIR plan compilation
- `query/to_relalg.rs` — Query → Relational Algebra IR
- `query/from_relalg.rs` — RelAlgIR → Executable QueryOp
- `query/chase.rs` — chase algorithm for fixpoint computation
- `query/backend.rs` — naive QueryOp executor
- `query/optimize.rs` — algebraic law rewriting
### Solver & Model Enumeration (~1,300 lines)
- `solver/mod.rs` — unified model enumeration API
- `solver/tree.rs` — explicit search tree for partial models
- `solver/tactics.rs` — automated search strategies:
- CheckTactic: axiom validation
- ForwardChainingTactic: Datalog-style inference
- PropagateEquationsTactic: congruence closure
- AutoTactic: composite fixpoint solver
- `solver/types.rs` — SearchNode, Obligation, NodeStatus types
### Tensor Algebra (~2,600 lines)
- `tensor/expr.rs` — lazy tensor expression trees
- `tensor/sparse.rs` — sparse tensor storage (RoaringBitmap-based)
- `tensor/builder.rs` — expression builders
- `tensor/compile.rs` — Formula → TensorExpr compilation
- `tensor/check.rs` — axiom checking via tensor evaluation
---
## Key Entry Points
1. **CLI**: `src/bin/geolog.rs`
```
Usage: geolog [-d <workspace>] [source_files...]
```
2. **Parse Entry**: `src/lib.rs` exports `parse(input: &str) → Result<File, String>`
3. **REPL State**: `src/repl.rs` — `ReplState::process_line()`
4. **Theory Elaboration**: `elaborate/theory.rs::elaborate_theory()`
5. **Instance Elaboration**: `elaborate/instance.rs::elaborate_instance_ctx()`
6. **Chase Algorithm**: `query/chase.rs::chase_fixpoint_with_cc()`
7. **Model Enumeration**: `solver/mod.rs::enumerate_models()`
---
## Design Decisions
### Geometric Logic Foundation
- **Axioms as Sequents**: `forall vars. premises |- conclusion`
- **Positive Conclusions**: Can have existentials, disjunctions, but never negations
- **Geometric Morphisms**: Preserved by design, enabling category-theoretic semantics
### Identity System
- **Luid** ("Local Universe ID"): Globally unique across all structures
- **Slid** ("Structure-Local ID"): Index within a single structure
- Bidirectional mapping enables persistent identity despite structure changes
### Append-Only Storage
- **GeologMeta**: Single homoiconic theory instance storing all data
- **Patch-based Versioning**: Each commit is a delta from parent
- **Never Delete**: Elements only tombstoned for perfect audit trails
### Type System
- **Postfix Application**: `x f` not `f(x)` — categorical style
- **Derived Sorts**: Products of base sorts for record domains
- **Product Domains**: Functions can take record arguments: `[x: M, y: M] -> M`
- **Relations → Prop**: Relations are functions to `Prop` (boolean predicates)
### Chase Algorithm
- **Fixpoint Iteration**: Derives all consequences until closure
- **Congruence Closure Integration**: Merges elements when axioms conclude `x = y`
- **Termination for Unit Laws**: Categories with unit laws no longer loop forever
- Uses tensor algebra for efficient axiom checking
### Solver Architecture
- **Explicit Search Tree**: Not implicit in call stack (AI-friendly for agent control)
- **Refinement Preorder**: Structures can grow (carriers, functions, relations)
- **Obligations vs Unsat**: Axiom obligation = need to witness conclusion (NOT failure)
- **True Unsat**: Only when deriving `⊢ False` from instantiated axioms
- **Tactics-based**: AutoTactic composes multiple tactics
### Relational Algebra Compilation
- **QueryOp Intermediate**: SQL-like operators (Scan, Filter, Join, Project, etc.)
- **Optimization Passes**: Filter fusion, projection pushdown
- **Store-aware**: Compiled directly to GeologMeta queries with indexing
### Tensor Algebra for Axiom Checking
- **Sparse Representation**: Roaring Bitmaps for efficient membership
- **Lazy Expression Trees**: Tensor products fused with contractions
- **Boolean Semiring**: AND for product, OR for sum
---
## REPL Commands
```
:list, :inspect <name> - Introspection
:add, :assert, :retract - Mutations
:query, :explain, :compile - Query analysis
:chase, :solve, :extend - Inference
:commit, :history - Version control
:source <file> - Load programs
:help - Show help
```
---
## Parameterized Theories
Theories can be parameterized by other instances:
```geolog
theory (N : PetriNet instance) Marking {
token : Sort;
token/of : token -> N/P;
}
```
This enables rich type-theoretic modeling (e.g., Petri net reachability as dependent types).
---
## Testing Infrastructure
- **Property-based tests** (`proptest`): naming, overlay, patches, queries, structure, tensor, universe, solver
- **Unit tests**: parsing, elaboration, meta, pretty-printing, relations, version control, workspace
- **Integration tests**: 30+ `.geolog` example files
- **Fuzzing**: `fuzz/` directory with parser and REPL fuzzing targets
---
## Project Status
**Version**: 0.1.0 (Early production)
### Completed
- Core geometric logic implementation
- Parser, elaborator, and core IR
- Chase algorithm with equality saturation
- Solver with SMT-like model enumeration
- Persistence and version control
- Comprehensive test coverage
### Active Development
- Nested instance elaboration
- Homoiconic query plan representation
- Disjunction variable alignment for tensor builder
- Lean4 formalization of monotonic submodel proofs
---
## Key Files Reference
| File | Line Count (approx) | Description |
|------|---------------------|-------------|
| `src/core.rs` | ~800 | Core type definitions |
| `src/parser.rs` | ~600 | Parser implementation |
| `src/repl.rs` | ~1000 | REPL state machine |
| `src/query/chase.rs` | ~500 | Chase algorithm |
| `src/solver/mod.rs` | ~400 | Model enumeration API |
| `src/tensor/sparse.rs` | ~600 | Sparse tensor storage |
| `src/store/mod.rs` | ~400 | Storage coordination |