322 lines
12 KiB
Markdown
322 lines
12 KiB
Markdown
# Geolog Project Notes
|
|
|
|
## Overview
|
|
|
|
**Geolog** is a **Geometric Logic REPL** — a type theory with semantics in topoi, designed for formal specifications using geometric logic.
|
|
|
|
### Core Capabilities
|
|
|
|
- **Geometric logic programming** — encode mathematical structures, relationships, and constraints
|
|
- **Database schema definition** — define sorts, functions, relations, and axioms
|
|
- **Model/instance creation** — create concrete finite models satisfying theory axioms
|
|
- **Automated inference** — chase algorithm for automatic fact derivation
|
|
- **Version control** — git-like commits and tracking for instances
|
|
- **Persistence** — append-only storage with optional disk persistence
|
|
|
|
### Use Cases
|
|
|
|
- Business process workflow orchestration
|
|
- Formal verification via diagrammatic rewriting
|
|
- Database query design
|
|
- Petri net reachability and process modeling
|
|
|
|
---
|
|
|
|
## Tech Stack
|
|
|
|
**Primary Language**: Rust (2021 edition, Cargo-based)
|
|
|
|
### Key Dependencies
|
|
|
|
| Crate | Version | Purpose |
|
|
|-------|---------|---------|
|
|
| `chumsky` | 0.9 | Parser combinator library |
|
|
| `ariadne` | 0.4 | Error reporting with source spans |
|
|
| `rkyv` | 0.7 | Zero-copy serialization |
|
|
| `rustyline` | 15 | REPL readline interface |
|
|
| `egglog-union-find` | 1.0 | Union-find for congruence closure |
|
|
| `roaring` | 0.10 | Bitmap library for sparse relations |
|
|
| `indexmap` | 2.0 | Order-preserving hash maps |
|
|
| `uuid` | 1 | UUID generation |
|
|
| `memmap2` | 0.9 | Memory-mapped file I/O |
|
|
|
|
### Testing Frameworks
|
|
|
|
- `insta` — snapshot testing
|
|
- `proptest` — property-based testing
|
|
- `tempfile` — temporary directory management
|
|
|
|
---
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────┐
|
|
│ USER INTERFACE │
|
|
│ REPL (interactive CLI) | Batch file loading │
|
|
├─────────────────────────────────────────────────────┤
|
|
│ PARSING LAYER (Lexer → Parser → AST) │
|
|
│ chumsky-based lexer & parser, source error reporting│
|
|
├─────────────────────────────────────────────────────┤
|
|
│ ELABORATION LAYER (AST → Core IR) │
|
|
│ Type checking, name resolution, theory/instance │
|
|
├─────────────────────────────────────────────────────┤
|
|
│ CORE LAYER (Typed Representation) │
|
|
│ Signature, Term, Formula, Structure, ElaboratedTheory│
|
|
├─────────────────────────────────────────────────────┤
|
|
│ STORAGE LAYER (Persistence) │
|
|
│ Append-only GeologMeta store with version control │
|
|
├─────────────────────────────────────────────────────┤
|
|
│ QUERY & SOLVER LAYER (Execution) │
|
|
│ Chase algorithm, congruence closure, relational │
|
|
│ algebra compiler, SMT-style model enumeration │
|
|
├─────────────────────────────────────────────────────┤
|
|
│ TENSOR ALGEBRA (Axiom Checking) │
|
|
│ Sparse tensor evaluation for axiom validation │
|
|
└─────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Directory Structure
|
|
|
|
| Path | Purpose |
|
|
|------|---------|
|
|
| `src/bin/geolog.rs` | CLI entry point |
|
|
| `src/lib.rs` | Library root, exports `parse()` |
|
|
| `src/repl.rs` | Interactive REPL state machine |
|
|
| `src/lexer.rs` | Tokenization using chumsky |
|
|
| `src/parser.rs` | Token stream → AST |
|
|
| `src/ast.rs` | Abstract syntax tree types |
|
|
| `src/core.rs` | Core IR: Signature, Term, Formula, Structure |
|
|
| `src/elaborate/` | AST → Core elaboration |
|
|
| `src/store/` | Persistence layer (append-only) |
|
|
| `src/query/` | Chase algorithm, relational algebra |
|
|
| `src/solver/` | SMT-style model enumeration |
|
|
| `src/tensor/` | Sparse tensor algebra for axiom checking |
|
|
| `src/cc.rs` | Congruence closure (union-find) |
|
|
| `src/id.rs` | Luid/Slid identity system |
|
|
| `src/universe.rs` | Global element registry |
|
|
| `examples/geolog/` | 30+ example `.geolog` files |
|
|
| `tests/` | 25+ test files |
|
|
| `docs/` | ARCHITECTURE.md, SYNTAX.md |
|
|
| `proofs/` | Lean4 formalization |
|
|
| `fuzz/` | Fuzzing targets |
|
|
|
|
---
|
|
|
|
## Main Components
|
|
|
|
### Parsing & Syntax (~1,200 lines)
|
|
|
|
- `lexer.rs` — tokenization
|
|
- `parser.rs` — token stream → AST
|
|
- `ast.rs` — AST types (Theory, Instance, Axiom, etc.)
|
|
- `error.rs` — error formatting with source spans
|
|
- `pretty.rs` — Core → Geolog source roundtrip printing
|
|
|
|
### Elaboration (~2,200 lines)
|
|
|
|
- `elaborate/mod.rs` — coordination
|
|
- `elaborate/theory.rs` — AST Theory → Core ElaboratedTheory
|
|
- `elaborate/instance.rs` — AST Instance → Core Structure
|
|
- `elaborate/env.rs` — environment with theory registry
|
|
- `elaborate/types.rs` — type expression evaluation
|
|
- `elaborate/error.rs` — type error reporting
|
|
|
|
### Core Representation
|
|
|
|
- `core.rs` — DerivedSort, Signature, Structure, Formula, Term, Sequent
|
|
- `id.rs` — Luid (global unique ID) and Slid (structure-local ID)
|
|
- `universe.rs` — global element registry with UUID ↔ Luid mapping
|
|
- `naming.rs` — bidirectional name ↔ Luid mapping
|
|
|
|
### Storage Layer (~1,500 lines)
|
|
|
|
- `store/mod.rs` — main Store struct
|
|
- `store/schema.rs` — cached sort/function/relation IDs
|
|
- `store/append.rs` — low-level element append operations
|
|
- `store/theory.rs` — theory CRUD
|
|
- `store/instance.rs` — instance CRUD
|
|
- `store/commit.rs` — git-like version control
|
|
- `store/materialize.rs` — indexed views for fast lookups
|
|
|
|
### Query & Compilation (~3,500 lines)
|
|
|
|
- `query/compile.rs` — Query → RelAlgIR plan compilation
|
|
- `query/to_relalg.rs` — Query → Relational Algebra IR
|
|
- `query/from_relalg.rs` — RelAlgIR → Executable QueryOp
|
|
- `query/chase.rs` — chase algorithm for fixpoint computation
|
|
- `query/backend.rs` — naive QueryOp executor
|
|
- `query/optimize.rs` — algebraic law rewriting
|
|
|
|
### Solver & Model Enumeration (~1,300 lines)
|
|
|
|
- `solver/mod.rs` — unified model enumeration API
|
|
- `solver/tree.rs` — explicit search tree for partial models
|
|
- `solver/tactics.rs` — automated search strategies:
|
|
- CheckTactic: axiom validation
|
|
- ForwardChainingTactic: Datalog-style inference
|
|
- PropagateEquationsTactic: congruence closure
|
|
- AutoTactic: composite fixpoint solver
|
|
- `solver/types.rs` — SearchNode, Obligation, NodeStatus types
|
|
|
|
### Tensor Algebra (~2,600 lines)
|
|
|
|
- `tensor/expr.rs` — lazy tensor expression trees
|
|
- `tensor/sparse.rs` — sparse tensor storage (RoaringBitmap-based)
|
|
- `tensor/builder.rs` — expression builders
|
|
- `tensor/compile.rs` — Formula → TensorExpr compilation
|
|
- `tensor/check.rs` — axiom checking via tensor evaluation
|
|
|
|
---
|
|
|
|
## Key Entry Points
|
|
|
|
1. **CLI**: `src/bin/geolog.rs`
|
|
```
|
|
Usage: geolog [-d <workspace>] [source_files...]
|
|
```
|
|
|
|
2. **Parse Entry**: `src/lib.rs` exports `parse(input: &str) → Result<File, String>`
|
|
|
|
3. **REPL State**: `src/repl.rs` — `ReplState::process_line()`
|
|
|
|
4. **Theory Elaboration**: `elaborate/theory.rs::elaborate_theory()`
|
|
|
|
5. **Instance Elaboration**: `elaborate/instance.rs::elaborate_instance_ctx()`
|
|
|
|
6. **Chase Algorithm**: `query/chase.rs::chase_fixpoint_with_cc()`
|
|
|
|
7. **Model Enumeration**: `solver/mod.rs::enumerate_models()`
|
|
|
|
---
|
|
|
|
## Design Decisions
|
|
|
|
### Geometric Logic Foundation
|
|
|
|
- **Axioms as Sequents**: `forall vars. premises |- conclusion`
|
|
- **Positive Conclusions**: Can have existentials, disjunctions, but never negations
|
|
- **Geometric Morphisms**: Preserved by design, enabling category-theoretic semantics
|
|
|
|
### Identity System
|
|
|
|
- **Luid** ("Local Universe ID"): Globally unique across all structures
|
|
- **Slid** ("Structure-Local ID"): Index within a single structure
|
|
- Bidirectional mapping enables persistent identity despite structure changes
|
|
|
|
### Append-Only Storage
|
|
|
|
- **GeologMeta**: Single homoiconic theory instance storing all data
|
|
- **Patch-based Versioning**: Each commit is a delta from parent
|
|
- **Never Delete**: Elements only tombstoned for perfect audit trails
|
|
|
|
### Type System
|
|
|
|
- **Postfix Application**: `x f` not `f(x)` — categorical style
|
|
- **Derived Sorts**: Products of base sorts for record domains
|
|
- **Product Domains**: Functions can take record arguments: `[x: M, y: M] -> M`
|
|
- **Relations → Prop**: Relations are functions to `Prop` (boolean predicates)
|
|
|
|
### Chase Algorithm
|
|
|
|
- **Fixpoint Iteration**: Derives all consequences until closure
|
|
- **Congruence Closure Integration**: Merges elements when axioms conclude `x = y`
|
|
- **Termination for Unit Laws**: Categories with unit laws no longer loop forever
|
|
- Uses tensor algebra for efficient axiom checking
|
|
|
|
### Solver Architecture
|
|
|
|
- **Explicit Search Tree**: Not implicit in call stack (AI-friendly for agent control)
|
|
- **Refinement Preorder**: Structures can grow (carriers, functions, relations)
|
|
- **Obligations vs Unsat**: Axiom obligation = need to witness conclusion (NOT failure)
|
|
- **True Unsat**: Only when deriving `⊢ False` from instantiated axioms
|
|
- **Tactics-based**: AutoTactic composes multiple tactics
|
|
|
|
### Relational Algebra Compilation
|
|
|
|
- **QueryOp Intermediate**: SQL-like operators (Scan, Filter, Join, Project, etc.)
|
|
- **Optimization Passes**: Filter fusion, projection pushdown
|
|
- **Store-aware**: Compiled directly to GeologMeta queries with indexing
|
|
|
|
### Tensor Algebra for Axiom Checking
|
|
|
|
- **Sparse Representation**: Roaring Bitmaps for efficient membership
|
|
- **Lazy Expression Trees**: Tensor products fused with contractions
|
|
- **Boolean Semiring**: AND for product, OR for sum
|
|
|
|
---
|
|
|
|
## REPL Commands
|
|
|
|
```
|
|
:list, :inspect <name> - Introspection
|
|
:add, :assert, :retract - Mutations
|
|
:query, :explain, :compile - Query analysis
|
|
:chase, :solve, :extend - Inference
|
|
:commit, :history - Version control
|
|
:source <file> - Load programs
|
|
:help - Show help
|
|
```
|
|
|
|
---
|
|
|
|
## Parameterized Theories
|
|
|
|
Theories can be parameterized by other instances:
|
|
|
|
```geolog
|
|
theory (N : PetriNet instance) Marking {
|
|
token : Sort;
|
|
token/of : token -> N/P;
|
|
}
|
|
```
|
|
|
|
This enables rich type-theoretic modeling (e.g., Petri net reachability as dependent types).
|
|
|
|
---
|
|
|
|
## Testing Infrastructure
|
|
|
|
- **Property-based tests** (`proptest`): naming, overlay, patches, queries, structure, tensor, universe, solver
|
|
- **Unit tests**: parsing, elaboration, meta, pretty-printing, relations, version control, workspace
|
|
- **Integration tests**: 30+ `.geolog` example files
|
|
- **Fuzzing**: `fuzz/` directory with parser and REPL fuzzing targets
|
|
|
|
---
|
|
|
|
## Project Status
|
|
|
|
**Version**: 0.1.0 (Early production)
|
|
|
|
### Completed
|
|
|
|
- Core geometric logic implementation
|
|
- Parser, elaborator, and core IR
|
|
- Chase algorithm with equality saturation
|
|
- Solver with SMT-like model enumeration
|
|
- Persistence and version control
|
|
- Comprehensive test coverage
|
|
|
|
### Active Development
|
|
|
|
- Nested instance elaboration
|
|
- Homoiconic query plan representation
|
|
- Disjunction variable alignment for tensor builder
|
|
- Lean4 formalization of monotonic submodel proofs
|
|
|
|
---
|
|
|
|
## Key Files Reference
|
|
|
|
| File | Line Count (approx) | Description |
|
|
|------|---------------------|-------------|
|
|
| `src/core.rs` | ~800 | Core type definitions |
|
|
| `src/parser.rs` | ~600 | Parser implementation |
|
|
| `src/repl.rs` | ~1000 | REPL state machine |
|
|
| `src/query/chase.rs` | ~500 | Chase algorithm |
|
|
| `src/solver/mod.rs` | ~400 | Model enumeration API |
|
|
| `src/tensor/sparse.rs` | ~600 | Sparse tensor storage |
|
|
| `src/store/mod.rs` | ~400 | Storage coordination |
|