felix-db: Project Internals
This document provides detailed technical documentation of the felix-db Datalog implementation.
Overview
felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (syntactically a subset of Prolog) commonly used for database queries, program analysis, and reasoning systems.
Project Status: Positive Datalog (rules without negation) is implemented and passes the test suite. Negation is parsed but not evaluated.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ User Input │
│ (Datalog text/files) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DatalogParser.hs │
│ (Megaparsec-based parser) │
│ Text → Statement (Fact | Rule | Query) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DatalogDB.hs │
│ (Type class + core data structures) │
│ Relation, RelationRule, RuleElement │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────────────┐
│ InMemoryDB.hs │ │ Rules.hs │
│ (Concrete DB impl) │ │ (Rule digestion logic) │
│ relations + constants │ │ digestHead, digestBody │
└─────────────────────────┘ └─────────────────────────────────┘
│ │
└───────────────┬───────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ NaiveQE.hs │
│ (Naive Query Engine - bottom-up) │
│ Fixed-point computation of Herbrand model │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DigestedQuery.hs │
│ (Query preprocessing) │
│ Query → DigestedQuery with indexed variables │
└─────────────────────────────────────────────────────────────────┘
Module Details
1. DatalogParser.hs
Purpose: Parses Datalog syntax into an Abstract Syntax Tree (AST).
Parser Type: Uses Megaparsec (Parsec Void Text).
Core Types
data Term
= Var Text -- Variables: X, Person, Y1 (uppercase start)
| Sym Text -- Symbols: alice, "london", uk (lowercase/quoted)
| Num Integer
data Literal = Literal
{ positive :: Bool -- True = positive, False = negated
, predName :: Text -- Predicate name
, arguments :: [Term] -- Argument list
}
data Statement
= Fact Literal -- p(a,b).
| Rule Head [Literal] -- p(X,Y) :- q(X,Z), r(Z,Y).
| Query [Text] [Literal] -- ?- p(X,Y). or ?- p(X,Y) → X,Y.
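To make the AST concrete, the sketch below writes out by hand the values that the example statements might parse to. It is illustrative only: String stands in for Text to avoid extra dependencies, Head is assumed to be a Literal (the source does not show its definition), quoted symbols are assumed to be stored without their quotes, the first field of Query is assumed to hold the projected output variables, and atom is a hypothetical helper.

```haskell
-- Sketch only: String replaces Text; Head = Literal is an assumption.
data Term
  = Var String
  | Sym String
  | Num Integer
  deriving (Eq, Show)

data Literal = Literal
  { positive  :: Bool
  , predName  :: String
  , arguments :: [Term]
  } deriving (Eq, Show)

type Head = Literal  -- assumption: not defined in this document

data Statement
  = Fact Literal
  | Rule Head [Literal]
  | Query [String] [Literal]
  deriving (Eq, Show)

-- Hypothetical helper: build a positive (non-negated) literal.
atom :: String -> [Term] -> Literal
atom = Literal True

-- parent("alice", "bob").
exampleFact :: Statement
exampleFact = Fact (atom "parent" [Sym "alice", Sym "bob"])

-- ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
exampleRule :: Statement
exampleRule =
  Rule (atom "ancestor" [Var "X", Var "Z"])
       [ atom "parent"   [Var "X", Var "Y"]
       , atom "ancestor" [Var "Y", Var "Z"]
       ]

-- ?- edge(A,B), edge(B,C) → A,C.
exampleQuery :: Statement
exampleQuery =
  Query ["A", "C"]
        [ atom "edge" [Var "A", Var "B"]
        , atom "edge" [Var "B", Var "C"]
        ]
```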
Syntax Support
| Feature | Example | Notes |
|---|---|---|
| Facts | parent("alice", "bob"). | Ground atoms |
| Rules | ancestor(X,Y) :- parent(X,Y). | Head :- body |
| Queries | ?- parent(X,Y). | Returns all variables |
| Projected queries | ?- edge(A,B), edge(B,C) → A,C. | Explicit output vars |
| Negation | not member(X,L) or !member(X,L) | Parsed but not evaluated |
| Comments | -- line or /* block */ | Haskell-style |
| Arrow variants | :-, →, -> | All accepted |
Entry Points
parseDatalog :: Text -> Either (ParseErrorBundle Text Void) Statement
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]
2. DatalogDB.hs
Purpose: Defines the core data structures and the DatalogDB type class.
Core Types
data Relation = Relation
{ name :: RelationId -- Relation name (Text)
, arity :: Int -- Number of arguments
, tuples :: Set [Constant] -- Known facts (extensional)
, rules :: [RelationRule] -- Derivation rules (intensional)
}
data RelationRule = RelationRule
{ headVariables :: [Text] -- Variables in scope
, bodyElements :: [RuleBodyElement] -- Body constraints
, outputElements :: [RuleElement] -- How to construct head tuple
}
data RuleBodyElement = RuleBodyElement
{ subRelationId :: RelationId -- Referenced relation
, ruleElements :: [RuleElement] -- How to map variables
}
data RuleElement
= RuleElementConstant Constant -- Fixed value
| RuleElementVariable Int -- Index into headVariables
Type Class
class DatalogDB db where
emptyDB :: db
relationNames :: db -> [Text]
lookupRelation :: db -> Text -> Maybe Relation
insertRelation :: db -> Relation -> db
addConstants :: db -> Set Constant -> db
allConstants :: db -> Set Constant -- Herbrand universe
3. InMemoryDB.hs
Purpose: Concrete implementation of DatalogDB using in-memory data structures.
data InMemoryDB = InMemoryDB
{ relations :: Map RelationId Relation
, constants :: Set Constant -- The Herbrand universe
}
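As a rough illustration of how the record above could satisfy the DatalogDB class, here is a minimal sketch. It is not the project's actual instance: RelationId and Constant are assumed to be plain strings, and the rules field of Relation is left out to keep the example short.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type RelationId = String  -- assumption: Text in the real code
type Constant   = String  -- assumption: the real Constant type is not shown

-- Simplified Relation: the rules field is omitted in this sketch.
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set.Set [Constant]
  } deriving (Eq, Show)

class DatalogDB db where
  emptyDB        :: db
  relationNames  :: db -> [RelationId]
  lookupRelation :: db -> RelationId -> Maybe Relation
  insertRelation :: db -> Relation -> db
  addConstants   :: db -> Set.Set Constant -> db
  allConstants   :: db -> Set.Set Constant

data InMemoryDB = InMemoryDB
  { relations :: Map.Map RelationId Relation
  , constants :: Set.Set Constant
  } deriving (Eq, Show)

instance DatalogDB InMemoryDB where
  emptyDB               = InMemoryDB Map.empty Set.empty
  relationNames         = Map.keys . relations
  lookupRelation db rid = Map.lookup rid (relations db)
  -- Inserting a relation replaces any existing relation of the same name.
  insertRelation db rel = db { relations = Map.insert (name rel) rel (relations db) }
  addConstants db cs    = db { constants = Set.union cs (constants db) }
  allConstants          = constants
```

Keeping the constants in a separate set, rather than recomputing them from the tuples, means constants that appear only in rules or are added explicitly still count toward the Herbrand universe.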
Builder Functions
-- Build DB from facts only
withFacts :: [Text] -> InMemoryDB
-- Build DB from facts and rules
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB
-- Build DB with explicit extra constants
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB
Example:
let db = withFactsAndRules
[ "parent(\"alice\", \"bob\")."
, "parent(\"bob\", \"carol\")."
]
[ "ancestor(X,Y) :- parent(X,Y)."
, "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
]
4. Rules.hs
Purpose: Processes rules and builds the internal representation.
Key Functions
digestHead - Processes the rule head:
- Extracts variables using nub (preserves first-occurrence order)
- Creates a RuleElement list mapping terms to indices/constants
- Collects constants for the Herbrand universe
digestBody - Processes each body literal:
- Extends variable list with new variables from body
- Creates BodyConstraint linking to sub-relations
- Maps terms to RuleElement (index or constant)
Variable Indexing Example:
Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
After digestHead: variableNames = ["X", "Y"]
After digestBody[0]: variableNames = ["X", "Y", "Z"] (Z added)
After digestBody[1]: variableNames = ["X", "Y", "Z"] (no change)
Indices: X=0, Y=1, Z=2
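The indexing scheme in this example can be reproduced with nub and elemIndex from Data.List. The sketch below is not the code in Rules.hs; variableOrder and indexOf are hypothetical names, and terms use String instead of Text.

```haskell
import Data.List (elemIndex, nub)

data Term = Var String | Sym String | Num Integer
  deriving (Eq, Show)

-- Variable names of an argument list, in order of appearance (duplicates kept).
varsOf :: [Term] -> [String]
varsOf ts = [v | Var v <- ts]

-- Hypothetical helper: head variables first, then any new body variables,
-- each kept at its first occurrence (this is what nub guarantees).
variableOrder :: [Term] -> [[Term]] -> [String]
variableOrder headArgs bodyArgs =
  nub (varsOf headArgs ++ concatMap varsOf bodyArgs)

-- Position of a variable in the ordered list, if present.
indexOf :: [String] -> String -> Maybe Int
indexOf names v = elemIndex v names
```

For the ancestor rule above, variableOrder yields X, Y, Z in that order, so Z receives index 2.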
addRule - Main entry point:
addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db
5. NaiveQE.hs (Naive Query Engine)
Purpose: Implements naive bottom-up evaluation with fixed-point iteration.
data NaiveQE db = NaiveQE
{ db :: db
, herbrand :: Map Text Relation -- Computed model
}
Algorithm: computeHerbrand
The naive evaluation algorithm:
1. Initialize Facts := all known facts from DB
2. Repeat:
a. For each rule R in each relation:
- Enumerate all variable assignments from Herbrand universe
- For each assignment that satisfies all body constraints:
- Compute the derived tuple
- If tuple is new, add to Facts and set changed=True
b. If no changes, terminate
3. Return final Facts (the minimal Herbrand model)
Complexity: O(n^k) candidate assignments per rule per iteration, where n = |constants| and k = number of variables in the rule
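The loop above can be sketched in a few lines. This is a simplified stand-in for computeHerbrand, not the NaiveQE source: facts from all relations live in one set, and the Elem, Atom, and Rule types are hypothetical simplifications of RuleElement and RelationRule.

```haskell
import qualified Data.Set as Set

type Constant = String
type Fact     = (String, [Constant])  -- (relation name, tuple)

data Elem = C Constant | V Int         -- constant or variable index
data Atom = Atom String [Elem]
data Rule = Rule { numVars :: Int, ruleHead :: Atom, ruleBody :: [Atom] }

-- All assignments of k variables over the constants: n^k lists.
assignments :: Int -> [Constant] -> [[Constant]]
assignments k cs = sequence (replicate k cs)

-- Instantiate an atom under a variable assignment.
ground :: [Constant] -> Atom -> Fact
ground env (Atom r es) = (r, map val es)
  where
    val (C c) = c
    val (V i) = env !! i

-- One pass: add every head whose body is satisfied by the current facts.
step :: [Constant] -> [Rule] -> Set.Set Fact -> Set.Set Fact
step cs rules facts = Set.union facts (Set.fromList derived)
  where
    derived =
      [ ground env (ruleHead r)
      | r   <- rules
      , env <- assignments (numVars r) cs
      , all (\a -> ground env a `Set.member` facts) (ruleBody r)
      ]

-- Repeat until a pass adds nothing. The fact set only grows and is bounded
-- by the finite Herbrand base, so this terminates.
fixpoint :: [Constant] -> [Rule] -> Set.Set Fact -> Set.Set Fact
fixpoint cs rules facts
  | next == facts = facts
  | otherwise     = fixpoint cs rules next
  where
    next = step cs rules facts

-- ancestor(X,Y) :- parent(X,Y).
-- ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
ancestorRules :: [Rule]
ancestorRules =
  [ Rule 2 (Atom "ancestor" [V 0, V 1]) [Atom "parent" [V 0, V 1]]
  , Rule 3 (Atom "ancestor" [V 0, V 1])
           [Atom "parent" [V 0, V 2], Atom "ancestor" [V 2, V 1]]
  ]
```

The assignments function is where the n^k cost comes from: every rule re-enumerates the full assignment space on every pass, including assignments already tried in earlier passes.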
Query Execution
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults
- Enumerate all assignments for sought variables
- For each, check if any assignment to unsought variables satisfies all conditions
- Return matching sought-variable tuples
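The three steps above amount to a projection over an existential check. A minimal sketch, assuming the sought variables occupy the lowest indices and the computed model is a flat set of facts; the Q, Condition, and Entry types are hypothetical simplifications of DigestedQuery, and runQuery is not the real executeQuery:

```haskell
import qualified Data.Set as Set

type Constant = String

data Entry     = EConst Constant | EVar Int
data Condition = Condition String [Entry]

-- Variables 0 .. numSought-1 are returned; the rest are existential.
data Q = Q { numVars :: Int, numSought :: Int, conditions :: [Condition] }

runQuery :: [Constant] -> Set.Set (String, [Constant]) -> Q -> Set.Set [Constant]
runQuery cs model q = Set.fromList
  [ sought
  | sought <- sequence (replicate (numSought q) cs)
    -- keep this tuple if some assignment to the unsought variables works
  , any (\rest -> all (holds (sought ++ rest)) (conditions q))
        (sequence (replicate (numVars q - numSought q) cs))
  ]
  where
    holds env (Condition r es) = (r, map (val env) es) `Set.member` model
    val _   (EConst c) = c
    val env (EVar i)   = env !! i
```

Returning a Set removes duplicate answers that would otherwise arise when several existential witnesses support the same output tuple.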
6. DigestedQuery.hs
Purpose: Preprocesses queries for efficient execution.
data DigestedQuery = DigestedQuery
{ allBoundVariables :: [Text] -- All variables in query
, numSoughtVariables :: Int -- Variables in output
, conditions :: [DigestedQueryCondition]
}
data DigestedQueryCondition = DigestedQueryCondition
{ relationName :: Text
, entries :: [DigestedQueryEntry]
}
data DigestedQueryEntry
= DigestedQueryEntryConstant Constant
| DigestedQueryEntryVariable Int -- Index into allBoundVariables
Query Forms:
- ?- p(X,Y). - Output all variables (X, Y)
- ?- edge(A,B), edge(B,C) → A,C. - Output only A, C (B is existential)
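One way the two query forms could be turned into indexed conditions is sketched below. It rests on assumptions the source does not confirm: an empty projection list means "return every variable", and sought variables are placed first in allBoundVariables (suggested by numSoughtVariables being a count). The digest function and the simplified types are hypothetical.

```haskell
import Data.List (nub)

data Term  = Var String | Sym String deriving (Eq, Show)
data Entry = EntryConstant String | EntryVariable Int deriving (Eq, Show)

data Digested = Digested
  { allBoundVariables  :: [String]
  , numSoughtVariables :: Int
  , conditions         :: [(String, [Entry])]
  } deriving (Eq, Show)

-- First argument: projected variables (empty = all variables).
-- Second argument: (relation name, arguments) for each query literal.
digest :: [String] -> [(String, [Term])] -> Digested
digest projected lits = Digested vars (length sought) (map cond lits)
  where
    bodyVars = nub [v | (_, ts) <- lits, Var v <- ts]
    sought   = if null projected then bodyVars else nub projected
    vars     = nub (sought ++ bodyVars)       -- sought variables first
    cond (r, ts)  = (r, map entry ts)
    entry (Sym c) = EntryConstant c
    -- Every body variable is in vars, so this is its position in that list.
    entry (Var v) = EntryVariable (length (takeWhile (/= v) vars))
```

For the projected query above this gives the variable list A, C, B with two sought variables, so B ends up at index 2 and is treated as existential.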
7. QueryEngine.hs
Purpose: Type class for query engines.
class QueryEngine qe where
queryEngine :: (DatalogDB db) => db -> qe db
query :: (DatalogDB db) => qe db -> Text -> Text
8. Utility.hs
Purpose: Helper functions.
-- Generate all possible maps from domain to codomain
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
Used by NaiveQE to enumerate variable assignments.
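A function with this signature can be written with mapM in the list monad. The following is one possible implementation, not necessarily the one in Utility.hs:

```haskell
import qualified Data.Map as Map

-- Every total map from the domain to the codomain.
-- For a duplicate-free domain of size k and a codomain of size n this
-- yields n^k maps; an empty domain yields exactly one (empty) map.
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
allMaps domain codomain =
  map Map.fromList (mapM (\d -> [(d, c) | c <- codomain]) domain)
```

With variables as the domain and the Herbrand universe as the codomain, each resulting map is one candidate variable assignment, which is where the O(n^k) enumeration cost originates.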
Data Flow Example
Input:
parent("alice", "bob").
parent("bob", "carol").
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
?- ancestor(alice, X).
Step 1: Parse → [Fact, Fact, Rule, Rule, Query]
Step 2: Build DB
relations = {
"parent" -> Relation {
tuples = {["alice","bob"], ["bob","carol"]},
rules = []
},
"ancestor" -> Relation {
tuples = {},
rules = [Rule1, Rule2]
}
}
constants = {"alice", "bob", "carol"}
Step 3: Compute Herbrand Model
Iteration 1: Apply rules
ancestor(alice,bob) ← parent(alice,bob)
ancestor(bob,carol) ← parent(bob,carol)
Iteration 2: Apply rules again
ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)
Iteration 3: No new facts → terminate
Step 4: Execute Query
?- ancestor(alice, X)
Result: bob, carol
Correctness Assessment
| Component | Status | Notes |
|---|---|---|
| Parser | Correct | Full Datalog syntax support |
| Data Model | Correct | Proper Relation/Rule representation |
| Rule Processing | Correct | Correct variable indexing with nub |
| Query Engine | Correct | Fixed-point terminates correctly |
| Tests | Pass | Comprehensive coverage |
Limitations
- No stratified negation - Negation is parsed but ignored in evaluation
- Naive complexity - Exponential in number of variables per rule
- No aggregation - Standard Datalog without extensions
- No built-in predicates - No arithmetic comparison, etc.
Test Coverage
Tests are in test/Test/Datalog/:
- DatalogParserSpec.hs - Parser correctness
- InMemoryDBSpec.hs - Database operations
- RulesSpec.hs - Rule digestion
- NaiveQESpec.hs - Query evaluation
- DigestedQuerySpec.hs - Query preprocessing
Notable Test Cases:
- Reflexive relations: equiv(X, X) :- .
- Symmetric relations: equiv(Y, X) :- equiv(X, Y).
- Transitive closure: equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).
- Full equivalence relations (combining all three)
- Genealogical queries (niece, sibling relationships)
- Poset ordering with constants
Build & Run
cabal build    # Build
cabal test     # Run all tests
cabal repl     # Interactive REPL
# Run specific tests
cabal test --test-option=--match="/NaiveQE/"
Dependencies
- megaparsec - Parser combinators
- containers - Map, Set
- text - Text processing
- hspec - Testing framework
Future Work
Potential enhancements:
- Implement stratified negation
- Add semi-naive evaluation for better performance
- Built-in predicates (comparison, arithmetic)
- Magic sets optimization
- Persistent storage backend
Changelog
- Mar 4, 2026 -- The first version was created.