Compare commits

...

10 Commits

Author SHA1 Message Date
Hassan Abedi
809e102f95 Add a note file for Datomic 2026-03-26 16:07:16 +01:00
Hassan Abedi
bd64f88496 Add a note file for Datomic 2026-03-26 16:05:41 +01:00
5a33e673c9 Update hassan/008-chase-algorithm.md 2026-03-06 12:25:31 +00:00
9f4e471223 Add document about relevant logical concepts 2026-03-06 12:24:51 +00:00
34895c057b Add information about chase algorithm variants 2026-03-06 10:29:24 +00:00
Hassan Abedi
012babb6fa WIP 2026-03-05 13:15:11 +01:00
Hassan Abedi
0c49a46eaa Make chase algotithm notes general 2026-03-05 13:14:41 +01:00
Hassan Abedi
70684aae49 Add chase query engine list 2026-03-05 13:05:31 +01:00
Hassan Abedi
4651b09e28 Add comparison of toy-datalog vs felix-db 2026-03-04 13:25:42 +01:00
Hassan Abedi
929c71488a WIP 2026-03-04 12:37:56 +01:00
6 changed files with 900 additions and 2 deletions

View File

@ -43,7 +43,6 @@ Compared to the above projects, Geolog combines several features that no single
* **Interactive REPL** — explore and mutate instances dynamically, unlike batch-mode model finders
* **Constructive proof witnesses** — the Petri net showcase demonstrates solutions as typed diagrammatic objects, not just boolean answers
## Changelog
* **Feb 27, 2026** -- The first version was created.

View File

@ -39,7 +39,6 @@
proof. If the types check and all axioms are satisfied, reachability is proven. This is the "propositions as types, proofs as programs" idea from
type theory, applied to diagrammatic reasoning.
## Changelog
* **Feb 27, 2026** -- The first version was created.

View File

@ -0,0 +1,452 @@
## Comparison: toy-datalog vs felix-db
This document compares two Haskell implementations of Datalog interpreters.
---
### Overview
| Aspect | toy-datalog | felix-db |
|-------------------|-----------------------------|---------------------------|
| **Author** | Cale Gibbard | (felix-db team) |
| **Version** | 0.1.0.0 | 0.1.0.0 |
| **Language** | Haskell (Haskell2010) | Haskell (GHC2024) |
| **License** | BSD-2-Clause | (not specified) |
| **Lines of Code** | ~300 | ~700 |
| **Primary Focus** | Minimal core implementation | More complete feature set |
---
### Architecture Comparison
#### toy-datalog
```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```
Simple linear pipeline with 4 modules:
- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities
#### felix-db
```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
DigestedQuery
```
More modular architecture with 8+ modules:
- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions
**Winner: felix-db** - Better separation of concerns and extensibility via type classes.
---
### Syntax & AST
#### Term Representation
| Feature | toy-datalog | felix-db |
|-----------|---------------------------------|-------------------------------|
| Variables | `Var VarId` (uppercase) | `Var Text` (uppercase) |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
| Numbers | Parsed as constants | `Num Integer` (separate type) |
| Newtypes | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`) |
```haskell
-- toy-datalog
data Term = Con ConId | Var VarId
-- felix-db
data Term = Var Text | Sym Text | Num Integer
```
**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.
**felix-db advantage:** Explicit `Num` type for numeric constants.
#### Rule Representation
```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]
-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```
**felix-db advantage:** Supports negation syntax (`positive :: Bool`).
---
### Parser Comparison
| Feature | toy-datalog | felix-db |
|----------------|---------------|-----------------------------|
| Library | Megaparsec | Megaparsec |
| Line comments | `--` | `--` |
| Block comments | `{- -}` | `/* */` |
| Rule arrow | `:-` only | `:-`, `→`, `->` |
| Negation | Not supported | `not` or `!` prefix |
| Queries | Not supported | `?-` syntax with projection |
| Quoted strings | No | Yes (`"alice"`) |
| Numbers | As constants | As separate `Num` type |
#### Example Syntax
```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).
-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```
**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.
---
### Data Structures
#### Relation Storage
```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
{ _relation_arity :: Int
, _relation_members :: Set [ConId]
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
}
-- felix-db: Simple storage + rules
data Relation = Relation
{ name :: RelationId
, arity :: Int
, tuples :: Set [Constant] -- No index
, rules :: [RelationRule] -- Rules stored with relation
}
```
**toy-datalog advantage:** Position-based indexing for efficient lookups.
**felix-db advantage:** Rules stored directly with relations (cleaner organization).
#### Database Structure
```haskell
-- toy-datalog
data Database = Database
{ _database_universe :: Set ConId
, _database_relations :: Map RelId Relation
, _database_rules :: Map RelId [Rule] -- Rules separate from relations
}
-- felix-db
data InMemoryDB = InMemoryDB
{ relations :: Map RelationId Relation -- Rules inside Relation
, constants :: Set Constant
}
```
Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
---
### Evaluation Strategy
Both use **naive bottom-up evaluation** with fixed-point iteration.
#### Algorithm Comparison
| Aspect | toy-datalog | felix-db |
|------------------|--------------------------|----------------------------|
| Strategy | Naive bottom-up | Naive bottom-up |
| Termination | Fixed-point check | Fixed-point check |
| Variable binding | `Map VarId ConId` | List indexed by position |
| Join operation | `><` operator (set join) | Cartesian product + filter |
| Index usage | Yes (position-based) | No |
#### toy-datalog Evaluation
```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```
Uses indices to find candidate tuples, then intersects results for bound positions.
#### felix-db Evaluation
```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```
Generates all possible variable assignments from Herbrand universe, then filters.
#### Variable Binding Approach
```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value
-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```
**toy-datalog:** More intuitive, variable names preserved throughout.
**felix-db:** More efficient (integer indexing), but loses variable names.
---
### Feature Comparison
| Feature | toy-datalog | felix-db |
|------------------------------|-------------|------------------|
| Facts | ✅ | ✅ |
| Rules | ✅ | ✅ |
| Recursive rules | ✅ | ✅ |
| Multiple rules per predicate | ✅ | ✅ |
| Arity checking | ✅ | ✅ |
| Fixed-point iteration | ✅ | ✅ |
| Position-based indexing | ✅ | ❌ |
| Interactive queries | ❌ | ✅ (`?-` syntax) |
| Query projection | ❌ | ✅ (`→ X, Y`) |
| Negation (parsing) | ❌ | ✅ |
| Negation (evaluation) | ❌ | ❌ |
| Numbers as distinct type | ❌ | ✅ |
| Quoted strings | ❌ | ✅ |
| Pretty printing | ✅ | ❌ |
| Database type class | ❌ | ✅ |
| Query engine type class | ❌ | ✅ |
| PostgreSQL integration | ❌ | 🔄 (in progress) |
---
### Error Handling
#### toy-datalog
```haskell
data EvalError =
EvalError_RuleWrongArity Rule WrongArity
| EvalError_AtomWrongArity Atom WrongArity
data WrongArity = WrongArity
{ _wrongArity_relation :: RelId
, _wrongArity_expected :: Int
, _wrongArity_got :: Int
}
```
Detailed error types with context.
#### felix-db
```haskell
data DatalogException
= DuplicateRelationException Text
| ArityMismatchException Text Int Int
| RelationNotFoundException Text
| VariableNotFoundException Text
```
More error types but less context (just error message components).
**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.
---
### Testing
| Aspect | toy-datalog | felix-db |
|------------------|-------------------|-----------------|
| Framework | Tasty + HUnit | Hspec |
| Parser tests | ✅ Golden tests | ✅ Unit tests |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests | ❌ | ❌ |
| Test files | 2 | 7 |
#### felix-db Test Examples
```haskell
-- Reflexive relation
"equiv(X, X) :- ."
-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."
-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```
**Winner: felix-db** - Much more comprehensive test coverage.
---
### Performance Characteristics
#### Indexing
| Operation | toy-datalog | felix-db |
|--------------------------------|---------------------------------|----------------|
| Lookup atom with constants | O(log n) via index intersection | O(m) full scan |
| Lookup atom with all variables | O(m) full scan | O(m) full scan |
| Insert tuple | O(k log n) (update k indices) | O(log m) |
**toy-datalog advantage:** Indexed lookups for atoms with bound positions.
#### Evaluation Complexity
Both have the same theoretical complexity for naive evaluation:
- **Time per iteration:** O(n^k × |rules| × r)
- n = Herbrand universe size
- k = max variables per rule
- r = max body literals
However, toy-datalog's indexing provides practical speedup when atoms have constants.
#### Memory Usage
| Aspect | toy-datalog | felix-db |
|----------------|----------------|----------------|
| Tuple storage | Set [ConId] | Set [Constant] |
| Index overhead | O(k × m) extra | None |
| Rule storage | Separate map | In relations |
toy-datalog uses more memory due to indices, but gains query performance.
---
### Code Quality
#### toy-datalog
**Strengths:**
- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass
**Weaknesses:**
- No executable (commented out)
- Evaluation tests disabled
- No query interface
#### felix-db
**Strengths:**
- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress
**Weaknesses:**
- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase
---
### Use Case Recommendations
#### Use toy-datalog when:
- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch
#### Use felix-db when:
- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)
---
### Summary Table
| Category | Winner | Notes |
|------------------|-------------|-----------------------------------|
| Type Safety | toy-datalog | Newtypes prevent errors |
| Features | felix-db | Queries, negation syntax, numbers |
| Indexing | toy-datalog | Position-based index |
| Testing | felix-db | 7 test files vs 1 active |
| Extensibility | felix-db | Type classes for DB/QE |
| Code Clarity | tie | Both well-organized |
| Error Messages | toy-datalog | Full context in errors |
| Documentation | toy-datalog | NOTES.md, CHANGELOG |
| Production Ready | neither | Both need work |
---
### Potential Cross-Pollination
#### toy-datalog could adopt from felix-db:
1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` type for integers
#### felix-db could adopt from toy-datalog:
1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern
---
### Conclusion
Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:
- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation
Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
## Changelog
* **Mar 4, 2026** -- The first version was created.

View File

@ -0,0 +1,44 @@
## Chase Algorithm Resources
### Papers
| Paper | Authors | Link |
|:--------------------------------------------|:-------------------------|:-------------------------------------------------------------------------------------|
| The Theory of Joins in Relational Databases | Aho, Beeri & Ullman | [dl.acm.org/doi/10.1145/320083.320091](https://dl.acm.org/doi/10.1145/320083.320091) |
| Testing Implications of Data Dependencies | Maier, Mendelzon & Sagiv | [dl.acm.org/doi/10.1145/320107.320115](https://dl.acm.org/doi/10.1145/320107.320115) |
### Algorithm Variants
| Variant | Output Size | Speed per Step | Satisfaction Check | Used By | Best For |
|:---------------|:------------|:---------------|:-------------------|:------------------|:----------------------------------------------|
| Oblivious | Largest | Fastest | None | — | Prototyping; when output size doesn't matter |
| Skolem | Medium | Fast | Implicit (Skolem terms) | VLog, Graal | Parallelism; data exchange; deterministic runs |
| Restricted | Smallest | Slower | Explicit (homomorphism check) | RDFox, VLog | Production use; query performance over large KGs |
| Core | Most compact | Slowest | Full core computation | — (theoretical) | Theoretical analysis; minimum representations |
#### Variant Notes
**Oblivious Chase** fires every applicable rule without checking whether the conclusion already holds. Simple to implement but produces redundant facts and is the hardest to guarantee termination for.
**Skolem Chase** uses deterministic Skolem functions to generate nulls — the same rule trigger always produces the same null. This makes it idempotent and easy to parallelize, with no need to track which rules have already fired.
**Restricted Chase** checks before firing whether the rule head is already satisfied via a homomorphism into the current database. Never adds redundant facts, producing the smallest universal solution. The homomorphism check is NP-hard in general but fast in practice with good indexing.
**Core Chase** goes further than the restricted chase, computing the logical core — the most compressed possible universal solution. Theoretically important but computationally expensive; not used in production engines.
### Chase Query Engines
| Engine | Organization | Chase Variant | Home Page |
|:---------|:-------------------------------|:-------------------|:-------------------------------------------------------------------------------------|
| RDFox | Oxford Semantic Technologies | Restricted | [oxfordsemantic.tech/rdfox](https://oxfordsemantic.tech/rdfox) |
| VLog | TU Dresden / VU Amsterdam | Restricted + Skolem | [github.com/karmaresearch/vlog](https://github.com/karmaresearch/vlog) |
| Graal | LIRMM, Montpellier | Skolem + Oblivious | [graphik-team.github.io/graal](https://graphik-team.github.io/graal) |
| Vadalog | University of Oxford / TU Wien | Warded (Restricted variant) | [vadalog-system.com](https://vadalog-system.com) |
| PDQ | University of Oxford | Skolem | [github.com/michaelbenedict/pdq](https://github.com/michaelbenedict/pdq) |
| Llunatic | University of Genova | Skolem | [github.com/donatellosantoro/Llunatic](https://github.com/donatellosantoro/Llunatic) |
## Changelog
* **Mar 6, 2026** -- Added algorithm variants section; added chase variant column to engines table.
* **Mar 5, 2026** -- Re-formatted the table for the papers section.
* **Mar 5, 2026** -- The first version was created.

285
hassan/009-logic-primer.md Normal file
View File

@ -0,0 +1,285 @@
# Logic Primer
A reference for the core concepts of first-order logic and database theory.
---
## Building Blocks
### Constants
Specific known values — real things in your database.
```
Alice, Bob, Engineering, 42
```
### Variables
Placeholders for unknown values, usually written in uppercase or with `?`.
```
X, Y, ?person, ?dept
```
### Terms
A term is either a constant or a variable. Anything that can fill an argument slot in a predicate.
---
## Predicates and Atoms
A **predicate** is a named relation applied to a list of terms:
```
Employee(X, Y) -- predicate "Employee", arity 2
Likes(X, Y, Z) -- predicate "Likes", arity 3
Person(Alice) -- predicate "Person", arity 1
```
The number of arguments a predicate takes is its **arity**.
An **atom** is a predicate applied to specific terms. When all terms are constants it is called a **ground atom** — a concrete fact in the database:
```
Employee(Alice, Engineering) -- ground atom
```
---
## Rules
A **rule** is an if-then statement built from atoms, split into two parts:
```
body → head
```
| Part | Also Called | Meaning |
|:-----|:------------|:--------|
| Body | Antecedent, LHS | Conditions that must hold |
| Head | Consequent, RHS | What must be true if body holds |
Example:
```
Employee(X, Y), Department(Y) → ∃Z. ManagedBy(Y, Z)
```
Variables in the head that do not appear in the body are **existential variables** — they represent unknown values that must exist, but whose identity is not known.
---
## Quantifiers
| Symbol | Name | Meaning | Example |
|:-------|:-----|:--------|:--------|
| `∀` | Universal | "For all..." | `∀X. Person(X) → Mortal(X)` |
| `∃` | Existential | "There exists some..." | `∀X. Person(X) → ∃Y. MotherOf(X, Y)` |
---
## Connectives
| Symbol | Name | Meaning |
|:-------|:-----|:--------|
| `∧` | Conjunction | AND |
| `` | Disjunction | OR |
| `¬` | Negation | NOT |
| `→` | Implication | IF...THEN |
| `↔` | Biconditional | IF AND ONLY IF |
---
## Key Concepts
### Substitution
A mapping from variables to terms. Written as `σ`:
```
σ = { X → Alice, Y → Engineering }
Employee(X, Y) under σ = Employee(Alice, Engineering)
```
### Homomorphism
A substitution that maps one set of atoms into another, preserving structure. Used in the chase to check whether a rule head is already satisfied in the database.
### Ground Instance
An atom or rule with all variables replaced by constants — no variables remaining.
### Herbrand Universe
The set of all ground terms constructable from the constants and function symbols in a formula. Represents the "world" of all possible values.
### Skolem Term / Skolem Function
A named placeholder for an existentially quantified variable, constructed deterministically from the values that triggered it:
```
∀X. Person(X) → ∃Y. MotherOf(X, Y)
Skolemized: MotherOf(X, mother_of(X))
```
The Skolem term `mother_of(X)` means "the mother of X, whoever that is." Same input always produces the same term.
---
## Types of Dependencies (Rules)
| Type | Abbreviation | Form | Meaning |
|:-----|:-------------|:-----|:--------|
| Tuple-generating dependency | TGD | `body → ∃z. head` | If body holds, some new tuple must exist |
| Equality-generating dependency | EGD | `body → x = y` | If body holds, two values must be equal |
| Functional dependency | FD | Special EGD | A set of attributes uniquely determines another |
| Full dependency | — | TGD with no existentials | Head variables all appear in body |
---
## Rule Classes and Termination
| Rule Class | Existentials | Termination Guarantee | Notes |
|:-----------|:-------------|:----------------------|:------|
| Datalog | No | Always | Core of logic programming |
| Weakly acyclic | Yes | Yes | No cyclic value propagation through existentials |
| Guarded | Yes | No (but decidable) | Existential vars "guarded" by a body atom |
| Frontier-one | Yes | No (but decidable) | At most one frontier variable per rule |
| General TGDs | Yes | Undecidable | No restrictions |
---
## Datalog
Datalog is a logic programming language and the most important rule class for the chase. It is a restriction of first-order logic with no function symbols and no existential variables in rule heads.
### Syntax
A Datalog program consists of:
- **Facts** — ground atoms representing known data
- **Rules** — if-then statements deriving new facts from existing ones
```
% Facts
Employee(alice, engineering).
Employee(bob, marketing).
Department(engineering).
% Rules
WorksIn(X, D) :- Employee(X, D), Department(D).
Colleague(X, Y) :- WorksIn(X, D), WorksIn(Y, D), X ≠ Y.
```
### Key Properties
| Property | Value |
|:---------|:------|
| Existentials in head | No |
| Function symbols | No |
| Negation | Stratified only |
| Termination | Always |
| Data complexity | PTIME |
| Evaluation strategy | Bottom-up (forward chaining) or top-down (backward chaining) |
### Evaluation
Datalog is evaluated by computing the **least fixed point** — repeatedly applying all rules until no new facts are derived. This is exactly the chase with no existentials:
```
Iteration 0: { Employee(alice, engineering), Employee(bob, marketing), Department(engineering) }
Iteration 1: + WorksIn(alice, engineering)
Iteration 2: + Colleague(alice, bob), Colleague(bob, alice)
Iteration 3: nothing new → stop
```
### Datalog vs SQL
Datalog can express **recursive queries** naturally, which SQL cannot without special extensions (like recursive CTEs). For example, computing transitive closure (all ancestors of a person) is trivial in Datalog:
```
Ancestor(X, Y) :- Parent(X, Y).
Ancestor(X, Y) :- Parent(X, Z), Ancestor(Z, Y).
```
### Datalog and the Chase
Pure Datalog evaluation is a restricted chase with no existential variables — all chase variants behave identically. The fixed point is always reached in at most O(n^k) steps.
---
## Geometric Logic
Geometric logic is a fragment of first-order logic that is particularly well-suited to the chase. It is the theoretical foundation for most modern chase-based reasoning engines.
### Definition
A formula is **geometric** if it uses only:
| Allowed | Not Allowed |
|:--------|:-----------|
| Conjunction `∧` | Universal quantifier `∀` in head |
| Disjunction `` | Negation `¬` |
| Existential quantifier `∃` | Implication `→` in head |
| Equality `=` | Infinite conjunctions in head |
A **geometric sequent** (rule) has the form:
```
φ ⊢ ψ
where φ and ψ are geometric formulas
```
Which is shorthand for: `∀x. φ(x) → ψ(x)`
### Examples
```
-- Every employee works in some department
Employee(X) ⊢ ∃Y. WorksIn(X, Y)
-- Every department has a manager or is a sub-department
Department(X) ⊢ (∃Y. ManagedBy(X, Y)) (∃Z. SubDeptOf(X, Z))
-- If two things are equal and one is an employee, so is the other
Employee(X) ∧ X = Y ⊢ Employee(Y)
```
### Why Geometric Logic Matters
Geometric logic has a special property: if a geometric formula is true in a model, it remains true in any **extension** of that model (adding more facts never makes it false). This is called **monotonicity** and is exactly what makes the chase work — the chase only adds facts, never removes them.
This property means:
- The chase directly computes models of geometric theories
- Every geometric theory has a **canonical model** — the one the chase builds
- Query answers over the chase result are guaranteed correct for any model of the theory
### Geometric Logic vs Other Fragments
```
First-Order Logic
└── Geometric Logic (∧, , ∃ only — monotone)
└── Coherent Logic (same but finitary — finite disjunctions only)
└── Datalog (no existentials in head, no disjunction)
└── Horn Clauses (at most one atom in head)
```
### Connection to the Chase
Geometric sequents map directly onto TGDs (tuple-generating dependencies):
```
Geometric sequent: Employee(X) ⊢ ∃Y. WorksIn(X, Y)
TGD: Employee(X) → ∃Y. WorksIn(X, Y)
```
The chase is the standard procedure for building models of geometric theories. Given a set of geometric axioms and an initial database, the chase constructs the minimal model satisfying all axioms — which is exactly the universal solution used for query answering.
---
## Logic Hierarchy
```
Propositional Logic -- no variables, just true/false statements
↓ (add variables and quantifiers)
First-Order Logic (FOL) -- variables, predicates, ∀, ∃
↓ (restrict to certain rule forms)
Geometric Logic -- only ∧, , ∃ — no ∀ in head, no ¬
↓ (remove existentials from head)
Datalog -- function-free, no existentials in head
↓ (remove disjunction)
Horn Clauses -- at most one atom in head
```
---
## Changelog
* **Mar 6, 2026** -- First version created.

View File

@ -0,0 +1,119 @@
## Datomic as a Backend for Geolog
### Short Answer
Datomic could plausibly serve as the persistence layer for Geolog, but not as the full logic/runtime backend.
It matches Geolog's desire for:
* Persistent, append-only storage with history
* Interactive mutation over evolving instances
* Datalog-style querying over stored facts
It does **not** by itself provide the core Geolog semantics described elsewhere in these notes:
* Existential witness generation during the chase
* Disjunctive branching into multiple candidate models
* Equality saturation / term merging
* Typed theories, dependent theories, and nested instances as first-class language concepts
So the right mental model is:
* Datomic stores Geolog state
* Geolog implements its own chase/type/equality engine above Datomic
---
### Why It Fits Part of the Problem
The notes describe Geolog as infrastructure with persistent, append-only storage and an interactive REPL, not just as a one-shot solver. Datomic's immutable fact model lines up well with that storage story.
Good candidates to store in Datomic:
* theory definitions
* schemas for sorts, functions, and relations
* user-authored instance data
* derived facts and provenance
* transaction history
* saved snapshots / named worlds
This is especially attractive if Geolog wants:
* time travel / "as of" inspection
* auditability of derivations
* incremental recomputation from prior states
* a durable log of REPL activity
---
### Where Datomic Stops Being Enough
The Geolog notes are clear that the system goes beyond plain Datalog.
Geolog needs chase behavior that can:
* invent fresh witnesses for existential conclusions
* branch when a rule concludes a disjunction
* merge terms when equations force equality
Those are central to the chase-based model described in the notes, and they are not features Datomic gives you as a database. Datomic gives you a temporal fact store plus Datalog query tools; it does not give you a geometric-logic engine.
That means a Datomic-based Geolog would still need its own:
* rule compiler / evaluator
* model-branch management
* equality class machinery
* fixpoint / chase scheduler
* termination controls and bounds
Inference: Datomic is a reasonable substrate for storing models and derivation history, but it is not a substitute for the core Geolog runtime.
---
### Licensing and Stack Implications
As of March 26, 2026, Datomic's official docs say that all editions are free and that the binaries are licensed under Apache 2.0, while Datomic's own FAQ says it is not open source because the source is not published.
Official sources:
* [Datomic Overview](https://docs.datomic.com/datomic-overview.html)
* [Datomic Pro Releases](https://docs.datomic.com/releases-pro.html)
* [Datomic is Free](https://blog.datomic.com/2023/04/datomic-is-free.html)
* [Datomic Local is Released](https://blog.datomic.com/2023/08/datomic-local-is-released.html)
Practically, that means:
* You can use Datomic inside a stack whose other components are open source.
* You can redistribute Datomic binaries where the Apache-licensed binary terms allow it.
* Datomic does not force the rest of the stack to become closed-source.
* But you are still depending on a source-unavailable vendor component.
That source-unavailable dependency has real engineering consequences:
* you cannot inspect or modify Datomic internals
* you cannot self-maintain a fork if Nubank stops investing in it
* debugging deep engine behavior depends on vendor docs and black-box observation
* fully reproducible "build everything from source" workflows become weaker
* some contributors will reject the stack on principle because it is not fully open
So the licensing issue is not "can it legally coexist with open parts of the stack?" The answer there is mostly **yes**. The more important question is strategic:
* does Geolog want a fully open, self-hostable, inspectable foundation
* or is it acceptable to place the storage layer on a free but closed-source binary
If Geolog wants the strongest possible open-source story, Datomic is a poor fit. If Geolog mainly wants a strong immutable fact store and accepts vendor dependence, Datomic remains viable.
---
### Bottom Line
For Geolog as described in these notes:
* Yes for storage, history, and query support
* No for the core reasoning engine
* Legally workable in a mixed stack
* Strategically questionable if full openness is a project goal
## Changelog
* **Mar 26, 2026** -- Added note on whether Datomic could serve as a backend for Geolog, including licensing implications.