Add notes about Cale's and Felix's Datalog implementations
This commit is contained in:
parent
9ae3bc9b14
commit
654012f2ef
447
hassan/005-felixs-datalog.md
Normal file
447
hassan/005-felixs-datalog.md
Normal file
@ -0,0 +1,447 @@
|
|||||||
|
## felix-db: Project Internals
|
||||||
|
|
||||||
|
This document provides detailed technical documentation of the felix-db Datalog implementation.
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (a subset of Prolog) commonly used
|
||||||
|
for database queries, program analysis, and reasoning systems.
|
||||||
|
|
||||||
|
**Project Status:** The implementation is correct and fully functional for positive Datalog (without negation in rules).
|
||||||
|
|
||||||
|
### Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ User Input │
|
||||||
|
│ (Datalog text/files) │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ DatalogParser.hs │
|
||||||
|
│ (Megaparsec-based parser) │
|
||||||
|
│ Text → Statement (Fact | Rule | Query) │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ DatalogDB.hs │
|
||||||
|
│ (Type class + core data structures) │
|
||||||
|
│ Relation, RelationRule, RuleElement │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
┌───────────────┴───────────────┐
|
||||||
|
▼ ▼
|
||||||
|
┌─────────────────────────┐ ┌─────────────────────────────────┐
|
||||||
|
│ InMemoryDB.hs │ │ Rules.hs │
|
||||||
|
│ (Concrete DB impl) │ │ (Rule digestion logic) │
|
||||||
|
│ relations + constants │ │ digestHead, digestBody │
|
||||||
|
└─────────────────────────┘ └─────────────────────────────────┘
|
||||||
|
│ │
|
||||||
|
└───────────────┬───────────────┘
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ NaiveQE.hs │
|
||||||
|
│ (Naive Query Engine - bottom-up) │
|
||||||
|
│ Fixed-point computation of Herbrand model │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ DigestedQuery.hs │
|
||||||
|
│ (Query preprocessing) │
|
||||||
|
│ Query → DigestedQuery with indexed variables │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### Module Details
|
||||||
|
|
||||||
|
#### 1. DatalogParser.hs
|
||||||
|
|
||||||
|
**Purpose:** Parses Datalog syntax into an Abstract Syntax Tree (AST).
|
||||||
|
|
||||||
|
**Parser Type:** Uses Megaparsec (`Parsec Void Text`).
|
||||||
|
|
||||||
|
##### Core Types
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data Term
|
||||||
|
= Var Text -- Variables: X, Person, Y1 (uppercase start)
|
||||||
|
| Sym Text -- Symbols: alice, "london", uk (lowercase/quoted)
|
||||||
|
| Num Integer
|
||||||
|
|
||||||
|
data Literal = Literal
|
||||||
|
{ positive :: Bool -- True = positive, False = negated
|
||||||
|
, predName :: Text -- Predicate name
|
||||||
|
, arguments :: [Term] -- Argument list
|
||||||
|
}
|
||||||
|
|
||||||
|
data Statement
|
||||||
|
= Fact Literal -- p(a,b).
|
||||||
|
| Rule Head [Literal] -- p(X,Y) :- q(X,Z), r(Z,Y).
|
||||||
|
| Query [Text] [Literal] -- ?- p(X,Y). or ?- p(X,Y) → X,Y.
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Syntax Support
|
||||||
|
|
||||||
|
| Feature | Example | Notes |
|
||||||
|
|-------------------|-------------------------------------|--------------------------|
|
||||||
|
| Facts | `parent("alice", "bob").` | Ground atoms |
|
||||||
|
| Rules | `ancestor(X,Y) :- parent(X,Y).` | Head :- body |
|
||||||
|
| Queries | `?- parent(X,Y).` | Returns all variables |
|
||||||
|
| Projected queries | `?- edge(A,B), edge(B,C) → A,C.` | Explicit output vars |
|
||||||
|
| Negation | `not member(X,L)` or `!member(X,L)` | Parsed but not evaluated |
|
||||||
|
| Comments | `-- line` or `/* block */` | Haskell-style |
|
||||||
|
| Arrow variants | `:-`, `→`, `->` | All accepted |
|
||||||
|
|
||||||
|
##### Entry Points
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
parseDatalog :: Text -> Either (ParseErrorBundle Text Void) Statement
|
||||||
|
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 2. DatalogDB.hs
|
||||||
|
|
||||||
|
**Purpose:** Defines the core data structures and the `DatalogDB` type class.
|
||||||
|
|
||||||
|
##### Core Types
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data Relation = Relation
|
||||||
|
{ name :: RelationId -- Relation name (Text)
|
||||||
|
, arity :: Int -- Number of arguments
|
||||||
|
, tuples :: Set [Constant] -- Known facts (extensional)
|
||||||
|
, rules :: [RelationRule] -- Derivation rules (intensional)
|
||||||
|
}
|
||||||
|
|
||||||
|
data RelationRule = RelationRule
|
||||||
|
{ headVariables :: [Text] -- Variables in scope
|
||||||
|
, bodyElements :: [RuleBodyElement] -- Body constraints
|
||||||
|
, outputElements :: [RuleElement] -- How to construct head tuple
|
||||||
|
}
|
||||||
|
|
||||||
|
data RuleBodyElement = RuleBodyElement
|
||||||
|
{ subRelationId :: RelationId -- Referenced relation
|
||||||
|
, ruleElements :: [RuleElement] -- How to map variables
|
||||||
|
}
|
||||||
|
|
||||||
|
data RuleElement
|
||||||
|
= RuleElementConstant Constant -- Fixed value
|
||||||
|
| RuleElementVariable Int -- Index into headVariables
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Type Class
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
class DatalogDB db where
|
||||||
|
emptyDB :: db
|
||||||
|
relationNames :: db -> [Text]
|
||||||
|
lookupRelation :: db -> Text -> Maybe Relation
|
||||||
|
insertRelation :: db -> Relation -> db
|
||||||
|
addConstants :: db -> Set Constant -> db
|
||||||
|
allConstants :: db -> Set Constant -- Herbrand universe
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 3. InMemoryDB.hs
|
||||||
|
|
||||||
|
**Purpose:** Concrete implementation of `DatalogDB` using in-memory data structures.
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data InMemoryDB = InMemoryDB
|
||||||
|
{ relations :: Map RelationId Relation
|
||||||
|
, constants :: Set Constant -- The Herbrand universe
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Builder Functions
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
-- Build DB from facts only
|
||||||
|
withFacts :: [Text] -> InMemoryDB
|
||||||
|
|
||||||
|
-- Build DB from facts and rules
|
||||||
|
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB
|
||||||
|
|
||||||
|
-- Build DB with explicit extra constants
|
||||||
|
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
let db = withFactsAndRules
|
||||||
|
[ "parent(\"alice\", \"bob\")."
|
||||||
|
, "parent(\"bob\", \"carol\")."
|
||||||
|
]
|
||||||
|
[ "ancestor(X,Y) :- parent(X,Y)."
|
||||||
|
, "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 4. Rules.hs
|
||||||
|
|
||||||
|
**Purpose:** Processes rules and builds the internal representation.
|
||||||
|
|
||||||
|
##### Key Functions
|
||||||
|
|
||||||
|
**`digestHead`** - Processes the rule head:
|
||||||
|
|
||||||
|
1. Extracts variables using `nub` (preserves first-occurrence order)
|
||||||
|
2. Creates `RuleElement` list mapping terms to indices/constants
|
||||||
|
3. Collects constants for the Herbrand universe
|
||||||
|
|
||||||
|
**`digestBody`** - Processes each body literal:
|
||||||
|
|
||||||
|
1. Extends variable list with new variables from body
|
||||||
|
2. Creates `BodyConstraint` linking to sub-relations
|
||||||
|
3. Maps terms to `RuleElement` (index or constant)
|
||||||
|
|
||||||
|
**Variable Indexing Example:**
|
||||||
|
|
||||||
|
```
|
||||||
|
Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
|
||||||
|
|
||||||
|
After digestHead: variableNames = ["X", "Y"]
|
||||||
|
After digestBody[0]: variableNames = ["X", "Y", "Z"] (Z added)
|
||||||
|
After digestBody[1]: variableNames = ["X", "Y", "Z"] (no change)
|
||||||
|
|
||||||
|
Indices: X=0, Y=1, Z=2
|
||||||
|
```
|
||||||
|
|
||||||
|
**`addRule`** - Main entry point:
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 5. NaiveQE.hs (Naive Query Engine)
|
||||||
|
|
||||||
|
**Purpose:** Implements naive bottom-up evaluation with fixed-point iteration.
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data NaiveQE db = NaiveQE
|
||||||
|
{ db :: db
|
||||||
|
, herbrand :: Map Text Relation -- Computed model
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Algorithm: `computeHerbrand`
|
||||||
|
|
||||||
|
The naive evaluation algorithm:
|
||||||
|
|
||||||
|
```
|
||||||
|
1. Initialize Facts := all known facts from DB
|
||||||
|
2. Repeat:
|
||||||
|
a. For each rule R in each relation:
|
||||||
|
- Enumerate all variable assignments from Herbrand universe
|
||||||
|
- For each assignment that satisfies all body constraints:
|
||||||
|
- Compute the derived tuple
|
||||||
|
- If tuple is new, add to Facts and set changed=True
|
||||||
|
b. If no changes, terminate
|
||||||
|
3. Return final Facts (the minimal Herbrand model)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Complexity:** O(n^k) per iteration where n = |constants|, k = |variables in rule|
|
||||||
|
|
||||||
|
##### Query Execution
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults
|
||||||
|
```
|
||||||
|
|
||||||
|
1. Enumerate all assignments for sought variables
|
||||||
|
2. For each, check if any assignment to unsought variables satisfies all conditions
|
||||||
|
3. Return matching sought-variable tuples
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 6. DigestedQuery.hs
|
||||||
|
|
||||||
|
**Purpose:** Preprocesses queries for efficient execution.
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data DigestedQuery = DigestedQuery
|
||||||
|
{ allBoundVariables :: [Text] -- All variables in query
|
||||||
|
, numSoughtVariables :: Int -- Variables in output
|
||||||
|
, conditions :: [DigestedQueryCondition]
|
||||||
|
}
|
||||||
|
|
||||||
|
data DigestedQueryCondition = DigestedQueryCondition
|
||||||
|
{ relationName :: Text
|
||||||
|
, entries :: [DigestedQueryEntry]
|
||||||
|
}
|
||||||
|
|
||||||
|
data DigestedQueryEntry
|
||||||
|
= DigestedQueryEntryConstant Constant
|
||||||
|
| DigestedQueryEntryVariable Int -- Index into allBoundVariables
|
||||||
|
```
|
||||||
|
|
||||||
|
**Query Forms:**
|
||||||
|
|
||||||
|
- `?- p(X,Y).` - Output all variables (X, Y)
|
||||||
|
- `?- edge(A,B), edge(B,C) → A,C.` - Output only A, C (B is existential)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 7. QueryEngine.hs
|
||||||
|
|
||||||
|
**Purpose:** Type class for query engines.
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
class QueryEngine qe where
|
||||||
|
queryEngine :: (DatalogDB db) => db -> qe db
|
||||||
|
query :: (DatalogDB db) => qe db -> Text -> Text
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 8. Utility.hs
|
||||||
|
|
||||||
|
**Purpose:** Helper functions.
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
-- Generate all possible maps from domain to codomain
|
||||||
|
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
|
||||||
|
```
|
||||||
|
|
||||||
|
Used by `NaiveQE` to enumerate variable assignments.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Data Flow Example
|
||||||
|
|
||||||
|
**Input:**
|
||||||
|
|
||||||
|
```datalog
|
||||||
|
parent("alice", "bob").
|
||||||
|
parent("bob", "carol").
|
||||||
|
ancestor(X,Y) :- parent(X,Y).
|
||||||
|
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
|
||||||
|
?- ancestor(alice, X).
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 1: Parse** → `[Fact, Fact, Rule, Rule, Query]`
|
||||||
|
|
||||||
|
**Step 2: Build DB**
|
||||||
|
|
||||||
|
```
|
||||||
|
relations = {
|
||||||
|
"parent" -> Relation {
|
||||||
|
tuples = {["alice","bob"], ["bob","carol"]},
|
||||||
|
rules = []
|
||||||
|
},
|
||||||
|
"ancestor" -> Relation {
|
||||||
|
tuples = {},
|
||||||
|
rules = [Rule1, Rule2]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
constants = {"alice", "bob", "carol"}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 3: Compute Herbrand Model**
|
||||||
|
|
||||||
|
```
|
||||||
|
Iteration 1: Apply rules
|
||||||
|
ancestor(alice,bob) ← parent(alice,bob)
|
||||||
|
ancestor(bob,carol) ← parent(bob,carol)
|
||||||
|
|
||||||
|
Iteration 2: Apply rules again
|
||||||
|
ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)
|
||||||
|
|
||||||
|
Iteration 3: No new facts → terminate
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 4: Execute Query**
|
||||||
|
|
||||||
|
```
|
||||||
|
?- ancestor(alice, X)
|
||||||
|
Result: bob, carol
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Correctness Assessment
|
||||||
|
|
||||||
|
| Component | Status | Notes |
|
||||||
|
|-----------------|---------|--------------------------------------|
|
||||||
|
| Parser | Correct | Full Datalog syntax support |
|
||||||
|
| Data Model | Correct | Proper Relation/Rule representation |
|
||||||
|
| Rule Processing | Correct | Correct variable indexing with `nub` |
|
||||||
|
| Query Engine | Correct | Fixed-point terminates correctly |
|
||||||
|
| Tests | Pass | Comprehensive coverage |
|
||||||
|
|
||||||
|
#### Limitations
|
||||||
|
|
||||||
|
1. **No stratified negation** - Negation is parsed but ignored in evaluation
|
||||||
|
2. **Naive complexity** - Exponential in number of variables per rule
|
||||||
|
3. **No aggregation** - Standard Datalog without extensions
|
||||||
|
4. **No built-in predicates** - No arithmetic comparison, etc.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Test Coverage
|
||||||
|
|
||||||
|
Tests are in `test/Test/Datalog/`:
|
||||||
|
|
||||||
|
- **DatalogParserSpec.hs** - Parser correctness
|
||||||
|
- **InMemoryDBSpec.hs** - Database operations
|
||||||
|
- **RulesSpec.hs** - Rule digestion
|
||||||
|
- **NaiveQESpec.hs** - Query evaluation
|
||||||
|
- **DigestedQuerySpec.hs** - Query preprocessing
|
||||||
|
|
||||||
|
**Notable Test Cases:**
|
||||||
|
|
||||||
|
- Reflexive relations: `equiv(X, X) :- .`
|
||||||
|
- Symmetric relations: `equiv(Y, X) :- equiv(X, Y).`
|
||||||
|
- Transitive closure: `equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).`
|
||||||
|
- Full equivalence relations (combining all three)
|
||||||
|
- Genealogical queries (niece, sibling relationships)
|
||||||
|
- Poset ordering with constants
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Build & Run
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cabal build ## Build
|
||||||
|
cabal test ## Run all tests
|
||||||
|
cabal repl ## Interactive REPL
|
||||||
|
|
||||||
|
## Run specific tests
|
||||||
|
cabal test --test-option=--match="/NaiveQE/"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Dependencies
|
||||||
|
|
||||||
|
- **megaparsec** - Parser combinators
|
||||||
|
- **containers** - Map, Set
|
||||||
|
- **text** - Text processing
|
||||||
|
- **hspec** - Testing framework
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Future Work
|
||||||
|
|
||||||
|
Potential enhancements:
|
||||||
|
|
||||||
|
1. Implement stratified negation
|
||||||
|
2. Add semi-naive evaluation for better performance
|
||||||
|
3. Built-in predicates (comparison, arithmetic)
|
||||||
|
4. Magic sets optimization
|
||||||
|
5. Persistent storage backend
|
||||||
|
|
||||||
|
## Changelog
|
||||||
|
|
||||||
|
* **Mar 4, 2026** -- The first version was created.
|
||||||
357
hassan/006-cales-datalog.md
Normal file
357
hassan/006-cales-datalog.md
Normal file
@ -0,0 +1,357 @@
|
|||||||
|
## toy-datalog: Implementation Notes
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
**toy-datalog** is a Haskell implementation of a Datalog interpreter. Datalog is a declarative logic programming language that is a syntactic subset
|
||||||
|
of Prolog, commonly used for:
|
||||||
|
|
||||||
|
- Database query languages (it's the theoretical foundation for SQL's recursive queries)
|
||||||
|
- Static program analysis
|
||||||
|
- Knowledge representation and reasoning
|
||||||
|
- Access control policies
|
||||||
|
|
||||||
|
This implementation provides parsing, evaluation, and fixed-point computation for Datalog programs.
|
||||||
|
|
||||||
|
**Author:** Cale Gibbard
|
||||||
|
**License:** BSD-2-Clause
|
||||||
|
**Version:** 0.1.0.0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Datalog Program (Text) │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Parser (Megaparsec) │
|
||||||
|
│ • parseTerm, parseAtom, parseRule, parseProgram │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Abstract Syntax Tree │
|
||||||
|
│ • Program = [Rule] │
|
||||||
|
│ • Rule = Atom :- [Atom] │
|
||||||
|
│ • Atom = RelId [Term] │
|
||||||
|
│ • Term = Con ConId | Var VarId │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Database │
|
||||||
|
│ • Herbrand Universe (all constants) │
|
||||||
|
│ • Relations (indexed tuples) │
|
||||||
|
│ • Rules (indexed by head relation) │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Evaluation Engine │
|
||||||
|
│ • Forward chaining (bottom-up) │
|
||||||
|
│ • Fixed-point iteration │
|
||||||
|
│ • Variable binding and unification │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ Derived Facts (Complete Database) │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Module Analysis
|
||||||
|
|
||||||
|
#### 1. Datalog.Syntax (`src/Datalog/Syntax.hs`)
|
||||||
|
|
||||||
|
Defines the core AST types:
|
||||||
|
|
||||||
|
| Type | Description | Example |
|
||||||
|
|-----------|-----------------------------------------------|---------------------------------|
|
||||||
|
| `ConId` | Constant identifier (lowercase/numeric start) | `xerces`, `0`, `brooke` |
|
||||||
|
| `VarId` | Variable identifier (uppercase start) | `X`, `Y`, `Parent` |
|
||||||
|
| `RelId` | Relation/predicate name | `parent`, `ancestor` |
|
||||||
|
| `Term` | Either a constant or variable | `Con "xerces"`, `Var "X"` |
|
||||||
|
| `Atom` | Relation applied to terms | `parent(X, Y)` |
|
||||||
|
| `Rule` | Head atom with body atoms | `ancestor(X,Y) :- parent(X,Y).` |
|
||||||
|
| `Program` | Collection of rules | Full Datalog program |
|
||||||
|
|
||||||
|
**Key Features:**
|
||||||
|
|
||||||
|
- `HasConstants` typeclass for extracting constants from any Datalog structure
|
||||||
|
- `Pretty` typeclass for converting AST back to readable syntax
|
||||||
|
- Smart constructor `term` that auto-detects constants vs variables by case
|
||||||
|
|
||||||
|
#### 2. Datalog.Parser (`src/Datalog/Parser.hs`)
|
||||||
|
|
||||||
|
Megaparsec-based parser with:
|
||||||
|
|
||||||
|
- **Lexical conventions:**
|
||||||
|
- Whitespace handling (spaces, line comments `--`, block comments `{- -}`)
|
||||||
|
- Constants: start with lowercase or digit
|
||||||
|
- Variables: start with uppercase
|
||||||
|
- Relations: start with lowercase
|
||||||
|
|
||||||
|
- **Grammar:**
|
||||||
|
```
|
||||||
|
program ::= rule*
|
||||||
|
rule ::= atom '.' (fact)
|
||||||
|
| atom ':-' query '.' (rule with body)
|
||||||
|
query ::= atom (',' atom)*
|
||||||
|
atom ::= relId '(' term (',' term)* ')'
|
||||||
|
term ::= variable | constant
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 3. Datalog.Eval (`src/Datalog/Eval.hs`)
|
||||||
|
|
||||||
|
The evaluation engine implements **naive bottom-up evaluation** (forward chaining).
|
||||||
|
|
||||||
|
##### Data Structures
|
||||||
|
|
||||||
|
**Relation:**
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data Relation = Relation
|
||||||
|
{ _relation_arity :: Int
|
||||||
|
, _relation_members :: Set [ConId] -- all tuples
|
||||||
|
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- index
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The index maps `(position, value)` pairs to tuples containing that value at that position, enabling efficient lookups.
|
||||||
|
|
||||||
|
**Database:**
|
||||||
|
|
||||||
|
```haskell
|
||||||
|
data Database = Database
|
||||||
|
{ _database_universe :: Set ConId -- Herbrand universe
|
||||||
|
, _database_relations :: Map RelId Relation
|
||||||
|
, _database_rules :: Map RelId [Rule]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Evaluation Algorithm
|
||||||
|
|
||||||
|
1. **`evalAtomDb`**: Query an atom against the database
|
||||||
|
- Uses indices to find candidate tuples
|
||||||
|
- Computes intersection of candidates for each bound position
|
||||||
|
- Returns set of variable bindings
|
||||||
|
|
||||||
|
2. **`evalConjunction`**: Evaluate a conjunction (rule body)
|
||||||
|
- Evaluates each atom
|
||||||
|
- Joins bindings using `><` operator (relational join)
|
||||||
|
|
||||||
|
3. **`immediateConsequences`**: Derive new facts from a rule
|
||||||
|
- Evaluate the body to get bindings
|
||||||
|
- Project bindings onto head atom terms
|
||||||
|
|
||||||
|
4. **`extendDb`**: Single iteration of forward chaining
|
||||||
|
- Apply all rules to derive new facts
|
||||||
|
- Add new tuples to relations
|
||||||
|
|
||||||
|
5. **`extendFixedpointDb`**: Iterate until no new facts
|
||||||
|
- Repeatedly call `extendDb` until database stops changing
|
||||||
|
|
||||||
|
#### 4. Datalog.IO (`src/Datalog/IO.hs`)
|
||||||
|
|
||||||
|
File I/O utilities:
|
||||||
|
|
||||||
|
- `readProgram`: Parse a file into a `Program`
|
||||||
|
- `readDatabase`: Parse and load into a `Database`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Correctness Analysis
|
||||||
|
|
||||||
|
#### Verified Correct Behaviors
|
||||||
|
|
||||||
|
1. **Variable binding consistency**: The `mkBinding` function in `unsafeEvalAtomRel` correctly handles the case where the same variable appears
|
||||||
|
multiple times in an atom. It checks that all occurrences bind to the same value (lines 125-130).
|
||||||
|
|
||||||
|
2. **Join semantics**: The `joinBindings` function correctly implements a consistent merge of two bindings, failing when variables are bound to
|
||||||
|
different values.
|
||||||
|
|
||||||
|
3. **Arity checking**: The implementation properly tracks and validates relation arities, preventing arity mismatches.
|
||||||
|
|
||||||
|
4. **Fixed-point termination**: Since Datalog is guaranteed to terminate (finite Herbrand universe, monotonic derivation), the fixed-point iteration
|
||||||
|
is correct.
|
||||||
|
|
||||||
|
5. **Index maintenance**: Indices are properly maintained during insertion via `insertInRelation`.
|
||||||
|
|
||||||
|
#### Potential Issues
|
||||||
|
|
||||||
|
1. **`projectTuple` with unbound variables** (lines 147-157):
|
||||||
|
|
||||||
|
When the head of a rule contains a variable not appearing in the body, `projectTuple` generates all possible values from the Herbrand universe.
|
||||||
|
This is semantically correct for Datalog (the variable ranges over all constants), but:
|
||||||
|
- It can cause a combinatorial explosion
|
||||||
|
- This is an unusual Datalog pattern (typically all head variables appear in the body)
|
||||||
|
|
||||||
|
Example: `foo(X) :- bar(Y).` would generate `foo(c)` for every constant `c` in the universe.
|
||||||
|
|
||||||
|
2. **Naive evaluation strategy**:
|
||||||
|
|
||||||
|
The implementation uses naive bottom-up evaluation, which re-derives all facts at each iteration. This is correct but inefficient compared to
|
||||||
|
semi-naive evaluation.
|
||||||
|
|
||||||
|
3. **Unsafe partial function**:
|
||||||
|
|
||||||
|
`insertManyInRelationDb` uses `error` on line 51 if the relation doesn't exist. However, this should never occur in practice due to prior arity
|
||||||
|
checks.
|
||||||
|
|
||||||
|
4. **Empty atom arguments**:
|
||||||
|
|
||||||
|
The parser allows zero-argument atoms `foo()`, and evaluation handles them correctly (they act as propositional facts).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Strengths
|
||||||
|
|
||||||
|
#### 1. Clean Haskell Design
|
||||||
|
|
||||||
|
- Proper use of newtypes for type safety (`ConId`, `VarId`, `RelId`)
|
||||||
|
- Effective use of typeclasses (`HasConstants`, `Pretty`)
|
||||||
|
- Clear separation of concerns between modules
|
||||||
|
- Appropriate use of `Either` for error handling
|
||||||
|
|
||||||
|
#### 2. Efficient Indexing
|
||||||
|
|
||||||
|
- Position-based indexing enables efficient lookups
|
||||||
|
- Intersection of index results for atoms with multiple constants
|
||||||
|
- Avoids full table scans when constants are present in queries
|
||||||
|
|
||||||
|
#### 3. Correct Semantics
|
||||||
|
|
||||||
|
- Proper handling of variable binding consistency
|
||||||
|
- Correct relational join semantics
|
||||||
|
- Sound fixed-point computation
|
||||||
|
|
||||||
|
#### 4. Good Parser Design
|
||||||
|
|
||||||
|
- Uses Megaparsec for good error messages
|
||||||
|
- Supports both Haskell-style comments (`--` and `{- -}`)
|
||||||
|
- Clean lexeme handling with whitespace consumption
|
||||||
|
|
||||||
|
#### 5. Extensible Architecture
|
||||||
|
|
||||||
|
- Easy to add new syntax constructs
|
||||||
|
- Database structure supports multiple relations
|
||||||
|
- Error types are well-defined and informative
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Limitations
|
||||||
|
|
||||||
|
#### 1. Performance Limitations
|
||||||
|
|
||||||
|
| Issue | Impact | Potential Fix |
|
||||||
|
|-----------------------|----------------------------------------------------|----------------------------------|
|
||||||
|
| Naive evaluation | Re-derives all facts each iteration | Implement semi-naive evaluation |
|
||||||
|
| No query optimization | Joins evaluated left-to-right | Add query planner |
|
||||||
|
| Full materialization | All derived facts stored | Support demand-driven evaluation |
|
||||||
|
| No stratification | Potential issues with negation (not yet supported) | Add stratification analysis |
|
||||||
|
|
||||||
|
#### 2. Missing Datalog Features
|
||||||
|
|
||||||
|
| Feature | Status | Description |
|
||||||
|
|---------------------|-----------------|------------------------------|
|
||||||
|
| Negation | Not supported | `not p(X)` in rule bodies |
|
||||||
|
| Aggregation | Not supported | `count`, `sum`, `min`, `max` |
|
||||||
|
| Built-in predicates | Not supported | `X < Y`, `X = Y + 1` |
|
||||||
|
| Stratification | Not applicable | Would be needed for negation |
|
||||||
|
| Magic sets | Not implemented | Query optimization technique |
|
||||||
|
| Tabling/Memoization | Basic only | Via relation storage |
|
||||||
|
|
||||||
|
#### 3. Usability Limitations
|
||||||
|
|
||||||
|
- **No REPL**: No interactive query interface
|
||||||
|
- **No CLI executable**: Executable is commented out in cabal file
|
||||||
|
- **Limited error messages**: Parsing errors are good (Megaparsec), but evaluation errors are basic
|
||||||
|
- **No source locations**: Error messages don't include line numbers for semantic errors
|
||||||
|
- **No pretty-printed output**: Query results shown as Haskell data structures
|
||||||
|
|
||||||
|
#### 4. Safety/Edge Cases
|
||||||
|
|
||||||
|
- **Unsafe head variables**: Variables in rule heads not appearing in bodies cause universe enumeration
|
||||||
|
- **No recursion limit**: Deeply recursive programs could stack overflow
|
||||||
|
- **No fact limit**: Large derived fact sets could exhaust memory
|
||||||
|
|
||||||
|
#### 5. Testing Gaps
|
||||||
|
|
||||||
|
- Only parser golden tests are active
|
||||||
|
- Database and fixed-point tests are commented out
|
||||||
|
- No property-based testing
|
||||||
|
- No negative test cases (expected failures)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Example Programs
|
||||||
|
|
||||||
|
#### Transitive Closure (ancestor.dl)
|
||||||
|
|
||||||
|
```datalog
|
||||||
|
parent(xerces, brooke).
|
||||||
|
parent(brooke, damocles).
|
||||||
|
|
||||||
|
ancestor(X, Y) :- parent(X, Y).
|
||||||
|
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
|
||||||
|
```
|
||||||
|
|
||||||
|
Derives: `ancestor(xerces, brooke)`, `ancestor(brooke, damocles)`, `ancestor(xerces, damocles)`
|
||||||
|
|
||||||
|
#### Symmetric Closure with Parity (graph.dl)
|
||||||
|
|
||||||
|
```datalog
|
||||||
|
odd(X,Y) :- r(X,Y).
|
||||||
|
odd(X,Y) :- even(X,Z), r(Z,Y).
|
||||||
|
even(X,Y) :- odd(X,Z), r(Z,Y).
|
||||||
|
|
||||||
|
r(0,1). r(1,2). r(2,3). r(3,4). r(4,5).
|
||||||
|
r(X,Y) :- r(Y,X).
|
||||||
|
```
|
||||||
|
|
||||||
|
Computes paths of odd/even length in a bidirectional graph.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Recommendations for Future Development
|
||||||
|
|
||||||
|
#### High Priority
|
||||||
|
|
||||||
|
1. **Enable the executable** with a simple REPL or query interface
|
||||||
|
2. **Implement semi-naive evaluation** for significant performance improvement
|
||||||
|
3. **Add more tests** including evaluation tests and property tests
|
||||||
|
4. **Add source locations** to AST for better error messages
|
||||||
|
|
||||||
|
#### Medium Priority
|
||||||
|
|
||||||
|
5. **Support negation** with stratification checking
|
||||||
|
6. **Add built-in predicates** for arithmetic comparisons
|
||||||
|
7. **Implement query optimization** (join reordering)
|
||||||
|
8. **Add aggregation** support
|
||||||
|
|
||||||
|
#### Nice to Have
|
||||||
|
|
||||||
|
9. **Magic sets transformation** for goal-directed evaluation
|
||||||
|
10. **Incremental evaluation** for database updates
|
||||||
|
11. **Parallel evaluation** of independent rules
|
||||||
|
12. **External data sources** (CSV, databases)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Conclusion
|
||||||
|
|
||||||
|
toy-datalog is a well-structured, pedagogically clear implementation of core Datalog semantics. It correctly implements the fundamental operations of
|
||||||
|
bottom-up evaluation with proper variable binding and fixed-point computation. While it lacks advanced features like negation, aggregation, and query
|
||||||
|
optimization, it provides a solid foundation for learning about Datalog and could be extended into a more full-featured implementation.
|
||||||
|
|
||||||
|
The code quality is high, with good use of Haskell idioms and proper error handling. The main limitations are performance (naive evaluation) and
|
||||||
|
missing features rather than correctness issues.
|
||||||
|
|
||||||
|
## Changelog
|
||||||
|
|
||||||
|
* **Mar 4, 2026** -- The first version was created.
|
||||||
Loading…
x
Reference in New Issue
Block a user