Add notes about Cale's and Felix's Datalog implementations
This commit is contained in:
parent
9ae3bc9b14
commit
654012f2ef
447
hassan/005-felixs-datalog.md
Normal file
447
hassan/005-felixs-datalog.md
Normal file
@ -0,0 +1,447 @@
|
||||
## felix-db: Project Internals
|
||||
|
||||
This document provides detailed technical documentation of the felix-db Datalog implementation.
|
||||
|
||||
### Overview
|
||||
|
||||
felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (a subset of Prolog) commonly used
|
||||
for database queries, program analysis, and reasoning systems.
|
||||
|
||||
**Project Status:** The implementation is correct and fully functional for positive Datalog (without negation in rules).
|
||||
|
||||
### Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ User Input │
|
||||
│ (Datalog text/files) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ DatalogParser.hs │
|
||||
│ (Megaparsec-based parser) │
|
||||
│ Text → Statement (Fact | Rule | Query) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ DatalogDB.hs │
|
||||
│ (Type class + core data structures) │
|
||||
│ Relation, RelationRule, RuleElement │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────────┴───────────────┐
|
||||
▼ ▼
|
||||
┌─────────────────────────┐ ┌─────────────────────────────────┐
|
||||
│ InMemoryDB.hs │ │ Rules.hs │
|
||||
│ (Concrete DB impl) │ │ (Rule digestion logic) │
|
||||
│ relations + constants │ │ digestHead, digestBody │
|
||||
└─────────────────────────┘ └─────────────────────────────────┘
|
||||
│ │
|
||||
└───────────────┬───────────────┘
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ NaiveQE.hs │
|
||||
│ (Naive Query Engine - bottom-up) │
|
||||
│ Fixed-point computation of Herbrand model │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ DigestedQuery.hs │
|
||||
│ (Query preprocessing) │
|
||||
│ Query → DigestedQuery with indexed variables │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Module Details
|
||||
|
||||
#### 1. DatalogParser.hs
|
||||
|
||||
**Purpose:** Parses Datalog syntax into an Abstract Syntax Tree (AST).
|
||||
|
||||
**Parser Type:** Uses Megaparsec (`Parsec Void Text`).
|
||||
|
||||
##### Core Types
|
||||
|
||||
```haskell
|
||||
data Term
|
||||
= Var Text -- Variables: X, Person, Y1 (uppercase start)
|
||||
| Sym Text -- Symbols: alice, "london", uk (lowercase/quoted)
|
||||
| Num Integer
|
||||
|
||||
data Literal = Literal
|
||||
{ positive :: Bool -- True = positive, False = negated
|
||||
, predName :: Text -- Predicate name
|
||||
, arguments :: [Term] -- Argument list
|
||||
}
|
||||
|
||||
data Statement
|
||||
= Fact Literal -- p(a,b).
|
||||
| Rule Head [Literal] -- p(X,Y) :- q(X,Z), r(Z,Y).
|
||||
| Query [Text] [Literal] -- ?- p(X,Y). or ?- p(X,Y) → X,Y.
|
||||
```
|
||||
|
||||
##### Syntax Support
|
||||
|
||||
| Feature | Example | Notes |
|
||||
|-------------------|-------------------------------------|--------------------------|
|
||||
| Facts | `parent("alice", "bob").` | Ground atoms |
|
||||
| Rules | `ancestor(X,Y) :- parent(X,Y).` | Head :- body |
|
||||
| Queries | `?- parent(X,Y).` | Returns all variables |
|
||||
| Projected queries | `?- edge(A,B), edge(B,C) → A,C.` | Explicit output vars |
|
||||
| Negation | `not member(X,L)` or `!member(X,L)` | Parsed but not evaluated |
|
||||
| Comments | `-- line` or `/* block */` | Haskell-style |
|
||||
| Arrow variants | `:-`, `→`, `->` | All accepted |
|
||||
|
||||
##### Entry Points
|
||||
|
||||
```haskell
|
||||
parseDatalog :: Text -> Either (ParseErrorBundle Text Void) Statement
|
||||
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### 2. DatalogDB.hs
|
||||
|
||||
**Purpose:** Defines the core data structures and the `DatalogDB` type class.
|
||||
|
||||
##### Core Types
|
||||
|
||||
```haskell
|
||||
data Relation = Relation
|
||||
{ name :: RelationId -- Relation name (Text)
|
||||
, arity :: Int -- Number of arguments
|
||||
, tuples :: Set [Constant] -- Known facts (extensional)
|
||||
, rules :: [RelationRule] -- Derivation rules (intensional)
|
||||
}
|
||||
|
||||
data RelationRule = RelationRule
|
||||
{ headVariables :: [Text] -- Variables in scope
|
||||
, bodyElements :: [RuleBodyElement] -- Body constraints
|
||||
, outputElements :: [RuleElement] -- How to construct head tuple
|
||||
}
|
||||
|
||||
data RuleBodyElement = RuleBodyElement
|
||||
{ subRelationId :: RelationId -- Referenced relation
|
||||
, ruleElements :: [RuleElement] -- How to map variables
|
||||
}
|
||||
|
||||
data RuleElement
|
||||
= RuleElementConstant Constant -- Fixed value
|
||||
| RuleElementVariable Int -- Index into headVariables
|
||||
```
|
||||
|
||||
##### Type Class
|
||||
|
||||
```haskell
|
||||
class DatalogDB db where
|
||||
emptyDB :: db
|
||||
relationNames :: db -> [Text]
|
||||
lookupRelation :: db -> Text -> Maybe Relation
|
||||
insertRelation :: db -> Relation -> db
|
||||
addConstants :: db -> Set Constant -> db
|
||||
allConstants :: db -> Set Constant -- Herbrand universe
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### 3. InMemoryDB.hs
|
||||
|
||||
**Purpose:** Concrete implementation of `DatalogDB` using in-memory data structures.
|
||||
|
||||
```haskell
|
||||
data InMemoryDB = InMemoryDB
|
||||
{ relations :: Map RelationId Relation
|
||||
, constants :: Set Constant -- The Herbrand universe
|
||||
}
|
||||
```
|
||||
|
||||
##### Builder Functions
|
||||
|
||||
```haskell
|
||||
-- Build DB from facts only
|
||||
withFacts :: [Text] -> InMemoryDB
|
||||
|
||||
-- Build DB from facts and rules
|
||||
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB
|
||||
|
||||
-- Build DB with explicit extra constants
|
||||
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```haskell
|
||||
let db = withFactsAndRules
|
||||
[ "parent(\"alice\", \"bob\")."
|
||||
, "parent(\"bob\", \"carol\")."
|
||||
]
|
||||
[ "ancestor(X,Y) :- parent(X,Y)."
|
||||
, "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
|
||||
]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### 4. Rules.hs
|
||||
|
||||
**Purpose:** Processes rules and builds the internal representation.
|
||||
|
||||
##### Key Functions
|
||||
|
||||
**`digestHead`** - Processes the rule head:
|
||||
|
||||
1. Extracts variables using `nub` (preserves first-occurrence order)
|
||||
2. Creates `RuleElement` list mapping terms to indices/constants
|
||||
3. Collects constants for the Herbrand universe
|
||||
|
||||
**`digestBody`** - Processes each body literal:
|
||||
|
||||
1. Extends variable list with new variables from body
|
||||
2. Creates `BodyConstraint` linking to sub-relations
|
||||
3. Maps terms to `RuleElement` (index or constant)
|
||||
|
||||
**Variable Indexing Example:**
|
||||
|
||||
```
|
||||
Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
|
||||
|
||||
After digestHead: variableNames = ["X", "Y"]
|
||||
After digestBody[0]: variableNames = ["X", "Y", "Z"] (Z added)
|
||||
After digestBody[1]: variableNames = ["X", "Y", "Z"] (no change)
|
||||
|
||||
Indices: X=0, Y=1, Z=2
|
||||
```
|
||||
|
||||
**`addRule`** - Main entry point:
|
||||
|
||||
```haskell
|
||||
addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### 5. NaiveQE.hs (Naive Query Engine)
|
||||
|
||||
**Purpose:** Implements naive bottom-up evaluation with fixed-point iteration.
|
||||
|
||||
```haskell
|
||||
data NaiveQE db = NaiveQE
|
||||
{ db :: db
|
||||
, herbrand :: Map Text Relation -- Computed model
|
||||
}
|
||||
```
|
||||
|
||||
##### Algorithm: `computeHerbrand`
|
||||
|
||||
The naive evaluation algorithm:
|
||||
|
||||
```
|
||||
1. Initialize Facts := all known facts from DB
|
||||
2. Repeat:
|
||||
a. For each rule R in each relation:
|
||||
- Enumerate all variable assignments from Herbrand universe
|
||||
- For each assignment that satisfies all body constraints:
|
||||
- Compute the derived tuple
|
||||
- If tuple is new, add to Facts and set changed=True
|
||||
b. If no changes, terminate
|
||||
3. Return final Facts (the minimal Herbrand model)
|
||||
```
|
||||
|
||||
**Complexity:** O(n^k) per iteration where n = |constants|, k = |variables in rule|
|
||||
|
||||
##### Query Execution
|
||||
|
||||
```haskell
|
||||
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults
|
||||
```
|
||||
|
||||
1. Enumerate all assignments for sought variables
|
||||
2. For each, check if any assignment to unsought variables satisfies all conditions
|
||||
3. Return matching sought-variable tuples
|
||||
|
||||
---
|
||||
|
||||
#### 6. DigestedQuery.hs
|
||||
|
||||
**Purpose:** Preprocesses queries for efficient execution.
|
||||
|
||||
```haskell
|
||||
data DigestedQuery = DigestedQuery
|
||||
{ allBoundVariables :: [Text] -- All variables in query
|
||||
, numSoughtVariables :: Int -- Variables in output
|
||||
, conditions :: [DigestedQueryCondition]
|
||||
}
|
||||
|
||||
data DigestedQueryCondition = DigestedQueryCondition
|
||||
{ relationName :: Text
|
||||
, entries :: [DigestedQueryEntry]
|
||||
}
|
||||
|
||||
data DigestedQueryEntry
|
||||
= DigestedQueryEntryConstant Constant
|
||||
| DigestedQueryEntryVariable Int -- Index into allBoundVariables
|
||||
```
|
||||
|
||||
**Query Forms:**
|
||||
|
||||
- `?- p(X,Y).` - Output all variables (X, Y)
|
||||
- `?- edge(A,B), edge(B,C) → A,C.` - Output only A, C (B is existential)
|
||||
|
||||
---
|
||||
|
||||
#### 7. QueryEngine.hs
|
||||
|
||||
**Purpose:** Type class for query engines.
|
||||
|
||||
```haskell
|
||||
class QueryEngine qe where
|
||||
queryEngine :: (DatalogDB db) => db -> qe db
|
||||
query :: (DatalogDB db) => qe db -> Text -> Text
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### 8. Utility.hs
|
||||
|
||||
**Purpose:** Helper functions.
|
||||
|
||||
```haskell
|
||||
-- Generate all possible maps from domain to codomain
|
||||
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
|
||||
```
|
||||
|
||||
Used by `NaiveQE` to enumerate variable assignments.
|
||||
|
||||
---
|
||||
|
||||
### Data Flow Example
|
||||
|
||||
**Input:**
|
||||
|
||||
```datalog
|
||||
parent("alice", "bob").
|
||||
parent("bob", "carol").
|
||||
ancestor(X,Y) :- parent(X,Y).
|
||||
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
|
||||
?- ancestor(alice, X).
|
||||
```
|
||||
|
||||
**Step 1: Parse** → `[Fact, Fact, Rule, Rule, Query]`
|
||||
|
||||
**Step 2: Build DB**
|
||||
|
||||
```
|
||||
relations = {
|
||||
"parent" -> Relation {
|
||||
tuples = {["alice","bob"], ["bob","carol"]},
|
||||
rules = []
|
||||
},
|
||||
"ancestor" -> Relation {
|
||||
tuples = {},
|
||||
rules = [Rule1, Rule2]
|
||||
}
|
||||
}
|
||||
constants = {"alice", "bob", "carol"}
|
||||
```
|
||||
|
||||
**Step 3: Compute Herbrand Model**
|
||||
|
||||
```
|
||||
Iteration 1: Apply rules
|
||||
ancestor(alice,bob) ← parent(alice,bob)
|
||||
ancestor(bob,carol) ← parent(bob,carol)
|
||||
|
||||
Iteration 2: Apply rules again
|
||||
ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)
|
||||
|
||||
Iteration 3: No new facts → terminate
|
||||
```
|
||||
|
||||
**Step 4: Execute Query**
|
||||
|
||||
```
|
||||
?- ancestor(alice, X)
|
||||
Result: bob, carol
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Correctness Assessment
|
||||
|
||||
| Component | Status | Notes |
|
||||
|-----------------|---------|--------------------------------------|
|
||||
| Parser | Correct | Full Datalog syntax support |
|
||||
| Data Model | Correct | Proper Relation/Rule representation |
|
||||
| Rule Processing | Correct | Correct variable indexing with `nub` |
|
||||
| Query Engine | Correct | Fixed-point terminates correctly |
|
||||
| Tests | Pass | Comprehensive coverage |
|
||||
|
||||
#### Limitations
|
||||
|
||||
1. **No stratified negation** - Negation is parsed but ignored in evaluation
|
||||
2. **Naive complexity** - Exponential in number of variables per rule
|
||||
3. **No aggregation** - Standard Datalog without extensions
|
||||
4. **No built-in predicates** - No arithmetic comparison, etc.
|
||||
|
||||
---
|
||||
|
||||
### Test Coverage
|
||||
|
||||
Tests are in `test/Test/Datalog/`:
|
||||
|
||||
- **DatalogParserSpec.hs** - Parser correctness
|
||||
- **InMemoryDBSpec.hs** - Database operations
|
||||
- **RulesSpec.hs** - Rule digestion
|
||||
- **NaiveQESpec.hs** - Query evaluation
|
||||
- **DigestedQuerySpec.hs** - Query preprocessing
|
||||
|
||||
**Notable Test Cases:**
|
||||
|
||||
- Reflexive relations: `equiv(X, X) :- .`
|
||||
- Symmetric relations: `equiv(Y, X) :- equiv(X, Y).`
|
||||
- Transitive closure: `equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).`
|
||||
- Full equivalence relations (combining all three)
|
||||
- Genealogical queries (niece, sibling relationships)
|
||||
- Poset ordering with constants
|
||||
|
||||
---
|
||||
|
||||
### Build & Run
|
||||
|
||||
```bash
|
||||
cabal build ## Build
|
||||
cabal test ## Run all tests
|
||||
cabal repl ## Interactive REPL
|
||||
|
||||
## Run specific tests
|
||||
cabal test --test-option=--match="/NaiveQE/"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Dependencies
|
||||
|
||||
- **megaparsec** - Parser combinators
|
||||
- **containers** - Map, Set
|
||||
- **text** - Text processing
|
||||
- **hspec** - Testing framework
|
||||
|
||||
---
|
||||
|
||||
### Future Work
|
||||
|
||||
Potential enhancements:
|
||||
|
||||
1. Implement stratified negation
|
||||
2. Add semi-naive evaluation for better performance
|
||||
3. Built-in predicates (comparison, arithmetic)
|
||||
4. Magic sets optimization
|
||||
5. Persistent storage backend
|
||||
|
||||
## Changelog
|
||||
|
||||
* **Mar 4, 2026** -- The first version was created.
|
||||
357
hassan/006-cales-datalog.md
Normal file
357
hassan/006-cales-datalog.md
Normal file
@ -0,0 +1,357 @@
|
||||
## toy-datalog: Implementation Notes
|
||||
|
||||
### Overview
|
||||
|
||||
**toy-datalog** is a Haskell implementation of a Datalog interpreter. Datalog is a declarative logic programming language that is a syntactic subset
|
||||
of Prolog, commonly used for:
|
||||
|
||||
- Database query languages (it's the theoretical foundation for SQL's recursive queries)
|
||||
- Static program analysis
|
||||
- Knowledge representation and reasoning
|
||||
- Access control policies
|
||||
|
||||
This implementation provides parsing, evaluation, and fixed-point computation for Datalog programs.
|
||||
|
||||
**Author:** Cale Gibbard
|
||||
**License:** BSD-2-Clause
|
||||
**Version:** 0.1.0.0
|
||||
|
||||
---
|
||||
|
||||
### Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Datalog Program (Text) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Parser (Megaparsec) │
|
||||
│ • parseTerm, parseAtom, parseRule, parseProgram │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Abstract Syntax Tree │
|
||||
│ • Program = [Rule] │
|
||||
│ • Rule = Atom :- [Atom] │
|
||||
│ • Atom = RelId [Term] │
|
||||
│ • Term = Con ConId | Var VarId │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Database │
|
||||
│ • Herbrand Universe (all constants) │
|
||||
│ • Relations (indexed tuples) │
|
||||
│ • Rules (indexed by head relation) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Evaluation Engine │
|
||||
│ • Forward chaining (bottom-up) │
|
||||
│ • Fixed-point iteration │
|
||||
│ • Variable binding and unification │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Derived Facts (Complete Database) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Module Analysis
|
||||
|
||||
#### 1. Datalog.Syntax (`src/Datalog/Syntax.hs`)
|
||||
|
||||
Defines the core AST types:
|
||||
|
||||
| Type | Description | Example |
|
||||
|-----------|-----------------------------------------------|---------------------------------|
|
||||
| `ConId` | Constant identifier (lowercase/numeric start) | `xerces`, `0`, `brooke` |
|
||||
| `VarId` | Variable identifier (uppercase start) | `X`, `Y`, `Parent` |
|
||||
| `RelId` | Relation/predicate name | `parent`, `ancestor` |
|
||||
| `Term` | Either a constant or variable | `Con "xerces"`, `Var "X"` |
|
||||
| `Atom` | Relation applied to terms | `parent(X, Y)` |
|
||||
| `Rule` | Head atom with body atoms | `ancestor(X,Y) :- parent(X,Y).` |
|
||||
| `Program` | Collection of rules | Full Datalog program |
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- `HasConstants` typeclass for extracting constants from any Datalog structure
|
||||
- `Pretty` typeclass for converting AST back to readable syntax
|
||||
- Smart constructor `term` that auto-detects constants vs variables by case
|
||||
|
||||
#### 2. Datalog.Parser (`src/Datalog/Parser.hs`)
|
||||
|
||||
Megaparsec-based parser with:
|
||||
|
||||
- **Lexical conventions:**
|
||||
- Whitespace handling (spaces, line comments `--`, block comments `{- -}`)
|
||||
- Constants: start with lowercase or digit
|
||||
- Variables: start with uppercase
|
||||
- Relations: start with lowercase
|
||||
|
||||
- **Grammar:**
|
||||
```
|
||||
program ::= rule*
|
||||
rule ::= atom '.' (fact)
|
||||
| atom ':-' query '.' (rule with body)
|
||||
query ::= atom (',' atom)*
|
||||
atom ::= relId '(' term (',' term)* ')'
|
||||
term ::= variable | constant
|
||||
```
|
||||
|
||||
#### 3. Datalog.Eval (`src/Datalog/Eval.hs`)
|
||||
|
||||
The evaluation engine implements **naive bottom-up evaluation** (forward chaining).
|
||||
|
||||
##### Data Structures
|
||||
|
||||
**Relation:**
|
||||
|
||||
```haskell
|
||||
data Relation = Relation
|
||||
{ _relation_arity :: Int
|
||||
, _relation_members :: Set [ConId] -- all tuples
|
||||
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- index
|
||||
}
|
||||
```
|
||||
|
||||
The index maps `(position, value)` pairs to tuples containing that value at that position, enabling efficient lookups.
|
||||
|
||||
**Database:**
|
||||
|
||||
```haskell
|
||||
data Database = Database
|
||||
{ _database_universe :: Set ConId -- Herbrand universe
|
||||
, _database_relations :: Map RelId Relation
|
||||
, _database_rules :: Map RelId [Rule]
|
||||
}
|
||||
```
|
||||
|
||||
##### Evaluation Algorithm
|
||||
|
||||
1. **`evalAtomDb`**: Query an atom against the database
|
||||
- Uses indices to find candidate tuples
|
||||
- Computes intersection of candidates for each bound position
|
||||
- Returns set of variable bindings
|
||||
|
||||
2. **`evalConjunction`**: Evaluate a conjunction (rule body)
|
||||
- Evaluates each atom
|
||||
- Joins bindings using `><` operator (relational join)
|
||||
|
||||
3. **`immediateConsequences`**: Derive new facts from a rule
|
||||
- Evaluate the body to get bindings
|
||||
- Project bindings onto head atom terms
|
||||
|
||||
4. **`extendDb`**: Single iteration of forward chaining
|
||||
- Apply all rules to derive new facts
|
||||
- Add new tuples to relations
|
||||
|
||||
5. **`extendFixedpointDb`**: Iterate until no new facts
|
||||
- Repeatedly call `extendDb` until database stops changing
|
||||
|
||||
#### 4. Datalog.IO (`src/Datalog/IO.hs`)
|
||||
|
||||
File I/O utilities:
|
||||
|
||||
- `readProgram`: Parse a file into a `Program`
|
||||
- `readDatabase`: Parse and load into a `Database`
|
||||
|
||||
---
|
||||
|
||||
### Correctness Analysis
|
||||
|
||||
#### Verified Correct Behaviors
|
||||
|
||||
1. **Variable binding consistency**: The `mkBinding` function in `unsafeEvalAtomRel` correctly handles the case where the same variable appears
|
||||
multiple times in an atom. It checks that all occurrences bind to the same value (lines 125-130).
|
||||
|
||||
2. **Join semantics**: The `joinBindings` function correctly implements a consistent merge of two bindings, failing when variables are bound to
|
||||
different values.
|
||||
|
||||
3. **Arity checking**: The implementation properly tracks and validates relation arities, preventing arity mismatches.
|
||||
|
||||
4. **Fixed-point termination**: Since Datalog is guaranteed to terminate (finite Herbrand universe, monotonic derivation), the fixed-point iteration
|
||||
is correct.
|
||||
|
||||
5. **Index maintenance**: Indices are properly maintained during insertion via `insertInRelation`.
|
||||
|
||||
#### Potential Issues
|
||||
|
||||
1. **`projectTuple` with unbound variables** (lines 147-157):
|
||||
|
||||
When the head of a rule contains a variable not appearing in the body, `projectTuple` generates all possible values from the Herbrand universe.
|
||||
This is semantically correct for Datalog (the variable ranges over all constants), but:
|
||||
- It can cause a combinatorial explosion
|
||||
- This is an unusual Datalog pattern (typically all head variables appear in the body)
|
||||
|
||||
Example: `foo(X) :- bar(Y).` would generate `foo(c)` for every constant `c` in the universe.
|
||||
|
||||
2. **Naive evaluation strategy**:
|
||||
|
||||
The implementation uses naive bottom-up evaluation, which re-derives all facts at each iteration. This is correct but inefficient compared to
|
||||
semi-naive evaluation.
|
||||
|
||||
3. **Unsafe partial function**:
|
||||
|
||||
`insertManyInRelationDb` uses `error` on line 51 if the relation doesn't exist. However, this should never occur in practice due to prior arity
|
||||
checks.
|
||||
|
||||
4. **Empty atom arguments**:
|
||||
|
||||
The parser allows zero-argument atoms `foo()`, and evaluation handles them correctly (they act as propositional facts).
|
||||
|
||||
---
|
||||
|
||||
### Strengths
|
||||
|
||||
#### 1. Clean Haskell Design
|
||||
|
||||
- Proper use of newtypes for type safety (`ConId`, `VarId`, `RelId`)
|
||||
- Effective use of typeclasses (`HasConstants`, `Pretty`)
|
||||
- Clear separation of concerns between modules
|
||||
- Appropriate use of `Either` for error handling
|
||||
|
||||
#### 2. Efficient Indexing
|
||||
|
||||
- Position-based indexing enables efficient lookups
|
||||
- Intersection of index results for atoms with multiple constants
|
||||
- Avoids full table scans when constants are present in queries
|
||||
|
||||
#### 3. Correct Semantics
|
||||
|
||||
- Proper handling of variable binding consistency
|
||||
- Correct relational join semantics
|
||||
- Sound fixed-point computation
|
||||
|
||||
#### 4. Good Parser Design
|
||||
|
||||
- Uses Megaparsec for good error messages
|
||||
- Supports both Haskell-style comments (`--` and `{- -}`)
|
||||
- Clean lexeme handling with whitespace consumption
|
||||
|
||||
#### 5. Extensible Architecture
|
||||
|
||||
- Easy to add new syntax constructs
|
||||
- Database structure supports multiple relations
|
||||
- Error types are well-defined and informative
|
||||
|
||||
---
|
||||
|
||||
### Limitations
|
||||
|
||||
#### 1. Performance Limitations
|
||||
|
||||
| Issue | Impact | Potential Fix |
|
||||
|-----------------------|----------------------------------------------------|----------------------------------|
|
||||
| Naive evaluation | Re-derives all facts each iteration | Implement semi-naive evaluation |
|
||||
| No query optimization | Joins evaluated left-to-right | Add query planner |
|
||||
| Full materialization | All derived facts stored | Support demand-driven evaluation |
|
||||
| No stratification | Potential issues with negation (not yet supported) | Add stratification analysis |
|
||||
|
||||
#### 2. Missing Datalog Features
|
||||
|
||||
| Feature | Status | Description |
|
||||
|---------------------|-----------------|------------------------------|
|
||||
| Negation | Not supported | `not p(X)` in rule bodies |
|
||||
| Aggregation | Not supported | `count`, `sum`, `min`, `max` |
|
||||
| Built-in predicates | Not supported | `X < Y`, `X = Y + 1` |
|
||||
| Stratification | Not applicable | Would be needed for negation |
|
||||
| Magic sets | Not implemented | Query optimization technique |
|
||||
| Tabling/Memoization | Basic only | Via relation storage |
|
||||
|
||||
#### 3. Usability Limitations
|
||||
|
||||
- **No REPL**: No interactive query interface
|
||||
- **No CLI executable**: Executable is commented out in cabal file
|
||||
- **Limited error messages**: Parsing errors are good (Megaparsec), but evaluation errors are basic
|
||||
- **No source locations**: Error messages don't include line numbers for semantic errors
|
||||
- **No pretty-printed output**: Query results shown as Haskell data structures
|
||||
|
||||
#### 4. Safety/Edge Cases
|
||||
|
||||
- **Unsafe head variables**: Variables in rule heads not appearing in bodies cause universe enumeration
|
||||
- **No recursion limit**: Deeply recursive programs could stack overflow
|
||||
- **No fact limit**: Large derived fact sets could exhaust memory
|
||||
|
||||
#### 5. Testing Gaps
|
||||
|
||||
- Only parser golden tests are active
|
||||
- Database and fixed-point tests are commented out
|
||||
- No property-based testing
|
||||
- No negative test cases (expected failures)
|
||||
|
||||
---
|
||||
|
||||
### Example Programs
|
||||
|
||||
#### Transitive Closure (ancestor.dl)
|
||||
|
||||
```datalog
|
||||
parent(xerces, brooke).
|
||||
parent(brooke, damocles).
|
||||
|
||||
ancestor(X, Y) :- parent(X, Y).
|
||||
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
|
||||
```
|
||||
|
||||
Derives: `ancestor(xerces, brooke)`, `ancestor(brooke, damocles)`, `ancestor(xerces, damocles)`
|
||||
|
||||
#### Symmetric Closure with Parity (graph.dl)
|
||||
|
||||
```datalog
|
||||
odd(X,Y) :- r(X,Y).
|
||||
odd(X,Y) :- even(X,Z), r(Z,Y).
|
||||
even(X,Y) :- odd(X,Z), r(Z,Y).
|
||||
|
||||
r(0,1). r(1,2). r(2,3). r(3,4). r(4,5).
|
||||
r(X,Y) :- r(Y,X).
|
||||
```
|
||||
|
||||
Computes paths of odd/even length in a bidirectional graph.
|
||||
|
||||
---
|
||||
|
||||
### Recommendations for Future Development
|
||||
|
||||
#### High Priority
|
||||
|
||||
1. **Enable the executable** with a simple REPL or query interface
|
||||
2. **Implement semi-naive evaluation** for significant performance improvement
|
||||
3. **Add more tests** including evaluation tests and property tests
|
||||
4. **Add source locations** to AST for better error messages
|
||||
|
||||
#### Medium Priority
|
||||
|
||||
5. **Support negation** with stratification checking
|
||||
6. **Add built-in predicates** for arithmetic comparisons
|
||||
7. **Implement query optimization** (join reordering)
|
||||
8. **Add aggregation** support
|
||||
|
||||
#### Nice to Have
|
||||
|
||||
9. **Magic sets transformation** for goal-directed evaluation
|
||||
10. **Incremental evaluation** for database updates
|
||||
11. **Parallel evaluation** of independent rules
|
||||
12. **External data sources** (CSV, databases)
|
||||
|
||||
---
|
||||
|
||||
### Conclusion
|
||||
|
||||
toy-datalog is a well-structured, pedagogically clear implementation of core Datalog semantics. It correctly implements the fundamental operations of
|
||||
bottom-up evaluation with proper variable binding and fixed-point computation. While it lacks advanced features like negation, aggregation, and query
|
||||
optimization, it provides a solid foundation for learning about Datalog and could be extended into a more full-featured implementation.
|
||||
|
||||
The code quality is high, with good use of Haskell idioms and proper error handling. The main limitations are performance (naive evaluation) and
|
||||
missing features rather than correctness issues.
|
||||
|
||||
## Changelog
|
||||
|
||||
* **Mar 4, 2026** -- The first version was created.
|
||||
Loading…
x
Reference in New Issue
Block a user