448 lines
14 KiB
Markdown
448 lines
14 KiB
Markdown
## felix-db: Project Internals
|
|
|
|
This document provides detailed technical documentation of the felix-db Datalog implementation.
|
|
|
|
### Overview
|
|
|
|
felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (a subset of Prolog) commonly used
|
|
for database queries, program analysis, and reasoning systems.
|
|
|
|
**Project Status:** The implementation is correct and fully functional for positive Datalog (without negation in rules).
|
|
|
|
### Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ User Input │
|
|
│ (Datalog text/files) │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ DatalogParser.hs │
|
|
│ (Megaparsec-based parser) │
|
|
│ Text → Statement (Fact | Rule | Query) │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ DatalogDB.hs │
|
|
│ (Type class + core data structures) │
|
|
│ Relation, RelationRule, RuleElement │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
│
|
|
┌───────────────┴───────────────┐
|
|
▼ ▼
|
|
┌─────────────────────────┐ ┌─────────────────────────────────┐
|
|
│ InMemoryDB.hs │ │ Rules.hs │
|
|
│ (Concrete DB impl) │ │ (Rule digestion logic) │
|
|
│ relations + constants │ │ digestHead, digestBody │
|
|
└─────────────────────────┘ └─────────────────────────────────┘
|
|
│ │
|
|
└───────────────┬───────────────┘
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ NaiveQE.hs │
|
|
│ (Naive Query Engine - bottom-up) │
|
|
│ Fixed-point computation of Herbrand model │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ DigestedQuery.hs │
|
|
│ (Query preprocessing) │
|
|
│ Query → DigestedQuery with indexed variables │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Module Details
|
|
|
|
#### 1. DatalogParser.hs
|
|
|
|
**Purpose:** Parses Datalog syntax into an Abstract Syntax Tree (AST).
|
|
|
|
**Parser Type:** Uses Megaparsec (`Parsec Void Text`).
|
|
|
|
##### Core Types
|
|
|
|
```haskell
|
|
data Term
|
|
= Var Text -- Variables: X, Person, Y1 (uppercase start)
|
|
| Sym Text -- Symbols: alice, "london", uk (lowercase/quoted)
|
|
| Num Integer
|
|
|
|
data Literal = Literal
|
|
{ positive :: Bool -- True = positive, False = negated
|
|
, predName :: Text -- Predicate name
|
|
, arguments :: [Term] -- Argument list
|
|
}
|
|
|
|
data Statement
|
|
= Fact Literal -- p(a,b).
|
|
| Rule Head [Literal] -- p(X,Y) :- q(X,Z), r(Z,Y).
|
|
| Query [Text] [Literal] -- ?- p(X,Y). or ?- p(X,Y) → X,Y.
|
|
```
|
|
|
|
##### Syntax Support
|
|
|
|
| Feature | Example | Notes |
|
|
|-------------------|-------------------------------------|--------------------------|
|
|
| Facts | `parent("alice", "bob").` | Ground atoms |
|
|
| Rules | `ancestor(X,Y) :- parent(X,Y).` | Head :- body |
|
|
| Queries | `?- parent(X,Y).` | Returns all variables |
|
|
| Projected queries | `?- edge(A,B), edge(B,C) → A,C.` | Explicit output vars |
|
|
| Negation | `not member(X,L)` or `!member(X,L)` | Parsed but not evaluated |
|
|
| Comments | `-- line` or `/* block */` | Haskell-style |
|
|
| Arrow variants | `:-`, `→`, `->` | All accepted |
|
|
|
|
##### Entry Points
|
|
|
|
```haskell
|
|
parseDatalog :: Text -> Either (ParseErrorBundle Text Void) Statement
|
|
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]
|
|
```
|
|
|
|
---
|
|
|
|
#### 2. DatalogDB.hs
|
|
|
|
**Purpose:** Defines the core data structures and the `DatalogDB` type class.
|
|
|
|
##### Core Types
|
|
|
|
```haskell
|
|
data Relation = Relation
|
|
{ name :: RelationId -- Relation name (Text)
|
|
, arity :: Int -- Number of arguments
|
|
, tuples :: Set [Constant] -- Known facts (extensional)
|
|
, rules :: [RelationRule] -- Derivation rules (intensional)
|
|
}
|
|
|
|
data RelationRule = RelationRule
|
|
{ headVariables :: [Text] -- Variables in scope
|
|
, bodyElements :: [RuleBodyElement] -- Body constraints
|
|
, outputElements :: [RuleElement] -- How to construct head tuple
|
|
}
|
|
|
|
data RuleBodyElement = RuleBodyElement
|
|
{ subRelationId :: RelationId -- Referenced relation
|
|
, ruleElements :: [RuleElement] -- How to map variables
|
|
}
|
|
|
|
data RuleElement
|
|
= RuleElementConstant Constant -- Fixed value
|
|
| RuleElementVariable Int -- Index into headVariables
|
|
```
|
|
|
|
##### Type Class
|
|
|
|
```haskell
|
|
class DatalogDB db where
|
|
emptyDB :: db
|
|
relationNames :: db -> [Text]
|
|
lookupRelation :: db -> Text -> Maybe Relation
|
|
insertRelation :: db -> Relation -> db
|
|
addConstants :: db -> Set Constant -> db
|
|
allConstants :: db -> Set Constant -- Herbrand universe
|
|
```
|
|
|
|
---
|
|
|
|
#### 3. InMemoryDB.hs
|
|
|
|
**Purpose:** Concrete implementation of `DatalogDB` using in-memory data structures.
|
|
|
|
```haskell
|
|
data InMemoryDB = InMemoryDB
|
|
{ relations :: Map RelationId Relation
|
|
, constants :: Set Constant -- The Herbrand universe
|
|
}
|
|
```
|
|
|
|
##### Builder Functions
|
|
|
|
```haskell
|
|
-- Build DB from facts only
|
|
withFacts :: [Text] -> InMemoryDB
|
|
|
|
-- Build DB from facts and rules
|
|
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB
|
|
|
|
-- Build DB with explicit extra constants
|
|
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB
|
|
```
|
|
|
|
**Example:**
|
|
|
|
```haskell
|
|
let db = withFactsAndRules
|
|
[ "parent(\"alice\", \"bob\")."
|
|
, "parent(\"bob\", \"carol\")."
|
|
]
|
|
[ "ancestor(X,Y) :- parent(X,Y)."
|
|
, "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
|
|
]
|
|
```
|
|
|
|
---
|
|
|
|
#### 4. Rules.hs
|
|
|
|
**Purpose:** Processes rules and builds the internal representation.
|
|
|
|
##### Key Functions
|
|
|
|
**`digestHead`** - Processes the rule head:
|
|
|
|
1. Extracts variables using `nub` (preserves first-occurrence order)
|
|
2. Creates `RuleElement` list mapping terms to indices/constants
|
|
3. Collects constants for the Herbrand universe
|
|
|
|
**`digestBody`** - Processes each body literal:
|
|
|
|
1. Extends variable list with new variables from body
|
|
2. Creates `BodyConstraint` linking to sub-relations
|
|
3. Maps terms to `RuleElement` (index or constant)
|
|
|
|
**Variable Indexing Example:**
|
|
|
|
```
|
|
Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
|
|
|
|
After digestHead: variableNames = ["X", "Y"]
|
|
After digestBody[0]: variableNames = ["X", "Y", "Z"] (Z added)
|
|
After digestBody[1]: variableNames = ["X", "Y", "Z"] (no change)
|
|
|
|
Indices: X=0, Y=1, Z=2
|
|
```
|
|
|
|
**`addRule`** - Main entry point:
|
|
|
|
```haskell
|
|
addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db
|
|
```
|
|
|
|
---
|
|
|
|
#### 5. NaiveQE.hs (Naive Query Engine)
|
|
|
|
**Purpose:** Implements naive bottom-up evaluation with fixed-point iteration.
|
|
|
|
```haskell
|
|
data NaiveQE db = NaiveQE
|
|
{ db :: db
|
|
, herbrand :: Map Text Relation -- Computed model
|
|
}
|
|
```
|
|
|
|
##### Algorithm: `computeHerbrand`
|
|
|
|
The naive evaluation algorithm:
|
|
|
|
```
|
|
1. Initialize Facts := all known facts from DB
|
|
2. Repeat:
|
|
a. For each rule R in each relation:
|
|
- Enumerate all variable assignments from Herbrand universe
|
|
- For each assignment that satisfies all body constraints:
|
|
- Compute the derived tuple
|
|
- If tuple is new, add to Facts and set changed=True
|
|
b. If no changes, terminate
|
|
3. Return final Facts (the minimal Herbrand model)
|
|
```
|
|
|
|
**Complexity:** O(n^k) per iteration where n = |constants|, k = |variables in rule|
|
|
|
|
##### Query Execution
|
|
|
|
```haskell
|
|
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults
|
|
```
|
|
|
|
1. Enumerate all assignments for sought variables
|
|
2. For each, check if any assignment to unsought variables satisfies all conditions
|
|
3. Return matching sought-variable tuples
|
|
|
|
---
|
|
|
|
#### 6. DigestedQuery.hs
|
|
|
|
**Purpose:** Preprocesses queries for efficient execution.
|
|
|
|
```haskell
|
|
data DigestedQuery = DigestedQuery
|
|
{ allBoundVariables :: [Text] -- All variables in query
|
|
, numSoughtVariables :: Int -- Variables in output
|
|
, conditions :: [DigestedQueryCondition]
|
|
}
|
|
|
|
data DigestedQueryCondition = DigestedQueryCondition
|
|
{ relationName :: Text
|
|
, entries :: [DigestedQueryEntry]
|
|
}
|
|
|
|
data DigestedQueryEntry
|
|
= DigestedQueryEntryConstant Constant
|
|
| DigestedQueryEntryVariable Int -- Index into allBoundVariables
|
|
```
|
|
|
|
**Query Forms:**
|
|
|
|
- `?- p(X,Y).` - Output all variables (X, Y)
|
|
- `?- edge(A,B), edge(B,C) → A,C.` - Output only A, C (B is existential)
|
|
|
|
---
|
|
|
|
#### 7. QueryEngine.hs
|
|
|
|
**Purpose:** Type class for query engines.
|
|
|
|
```haskell
|
|
class QueryEngine qe where
|
|
queryEngine :: (DatalogDB db) => db -> qe db
|
|
query :: (DatalogDB db) => qe db -> Text -> Text
|
|
```
|
|
|
|
---
|
|
|
|
#### 8. Utility.hs
|
|
|
|
**Purpose:** Helper functions.
|
|
|
|
```haskell
|
|
-- Generate all possible maps from domain to codomain
|
|
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
|
|
```
|
|
|
|
Used by `NaiveQE` to enumerate variable assignments.
|
|
|
|
---
|
|
|
|
### Data Flow Example
|
|
|
|
**Input:**
|
|
|
|
```datalog
|
|
parent("alice", "bob").
|
|
parent("bob", "carol").
|
|
ancestor(X,Y) :- parent(X,Y).
|
|
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
|
|
?- ancestor(alice, X).
|
|
```
|
|
|
|
**Step 1: Parse** → `[Fact, Fact, Rule, Rule, Query]`
|
|
|
|
**Step 2: Build DB**
|
|
|
|
```
|
|
relations = {
|
|
"parent" -> Relation {
|
|
tuples = {["alice","bob"], ["bob","carol"]},
|
|
rules = []
|
|
},
|
|
"ancestor" -> Relation {
|
|
tuples = {},
|
|
rules = [Rule1, Rule2]
|
|
}
|
|
}
|
|
constants = {"alice", "bob", "carol"}
|
|
```
|
|
|
|
**Step 3: Compute Herbrand Model**
|
|
|
|
```
|
|
Iteration 1: Apply rules
|
|
ancestor(alice,bob) ← parent(alice,bob)
|
|
ancestor(bob,carol) ← parent(bob,carol)
|
|
|
|
Iteration 2: Apply rules again
|
|
ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)
|
|
|
|
Iteration 3: No new facts → terminate
|
|
```
|
|
|
|
**Step 4: Execute Query**
|
|
|
|
```
|
|
?- ancestor(alice, X)
|
|
Result: bob, carol
|
|
```
|
|
|
|
---
|
|
|
|
### Correctness Assessment
|
|
|
|
| Component | Status | Notes |
|
|
|-----------------|---------|--------------------------------------|
|
|
| Parser | Correct | Full Datalog syntax support |
|
|
| Data Model | Correct | Proper Relation/Rule representation |
|
|
| Rule Processing | Correct | Correct variable indexing with `nub` |
|
|
| Query Engine | Correct | Fixed-point terminates correctly |
|
|
| Tests | Pass | Comprehensive coverage |
|
|
|
|
#### Limitations
|
|
|
|
1. **No stratified negation** - Negation is parsed but ignored in evaluation
|
|
2. **Naive complexity** - Exponential in number of variables per rule
|
|
3. **No aggregation** - Standard Datalog without extensions
|
|
4. **No built-in predicates** - No arithmetic comparison, etc.
|
|
|
|
---
|
|
|
|
### Test Coverage
|
|
|
|
Tests are in `test/Test/Datalog/`:
|
|
|
|
- **DatalogParserSpec.hs** - Parser correctness
|
|
- **InMemoryDBSpec.hs** - Database operations
|
|
- **RulesSpec.hs** - Rule digestion
|
|
- **NaiveQESpec.hs** - Query evaluation
|
|
- **DigestedQuerySpec.hs** - Query preprocessing
|
|
|
|
**Notable Test Cases:**
|
|
|
|
- Reflexive relations: `equiv(X, X) :- .`
|
|
- Symmetric relations: `equiv(Y, X) :- equiv(X, Y).`
|
|
- Transitive closure: `equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).`
|
|
- Full equivalence relations (combining all three)
|
|
- Genealogical queries (niece, sibling relationships)
|
|
- Poset ordering with constants
|
|
|
|
---
|
|
|
|
### Build & Run
|
|
|
|
```bash
|
|
cabal build ## Build
|
|
cabal test ## Run all tests
|
|
cabal repl ## Interactive REPL
|
|
|
|
## Run specific tests
|
|
cabal test --test-option=--match="/NaiveQE/"
|
|
```
|
|
|
|
---
|
|
|
|
### Dependencies
|
|
|
|
- **megaparsec** - Parser combinators
|
|
- **containers** - Map, Set
|
|
- **text** - Text processing
|
|
- **hspec** - Testing framework
|
|
|
|
---
|
|
|
|
### Future Work
|
|
|
|
Potential enhancements:
|
|
|
|
1. Implement stratified negation
|
|
2. Add semi-naive evaluation for better performance
|
|
3. Built-in predicates (comparison, arithmetic)
|
|
4. Magic sets optimization
|
|
5. Persistent storage backend
|
|
|
|
## Changelog
|
|
|
|
* **Mar 4, 2026** -- The first version was created.
|