## felix-db: Project Internals This document provides detailed technical documentation of the felix-db Datalog implementation. ### Overview felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (a subset of Prolog) commonly used for database queries, program analysis, and reasoning systems. **Project Status:** The implementation is correct and fully functional for positive Datalog (without negation in rules). ### Architecture ``` ┌─────────────────────────────────────────────────────────────────┐ │ User Input │ │ (Datalog text/files) │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ DatalogParser.hs │ │ (Megaparsec-based parser) │ │ Text → Statement (Fact | Rule | Query) │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ DatalogDB.hs │ │ (Type class + core data structures) │ │ Relation, RelationRule, RuleElement │ └─────────────────────────────────────────────────────────────────┘ │ ┌───────────────┴───────────────┐ ▼ ▼ ┌─────────────────────────┐ ┌─────────────────────────────────┐ │ InMemoryDB.hs │ │ Rules.hs │ │ (Concrete DB impl) │ │ (Rule digestion logic) │ │ relations + constants │ │ digestHead, digestBody │ └─────────────────────────┘ └─────────────────────────────────┘ │ │ └───────────────┬───────────────┘ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ NaiveQE.hs │ │ (Naive Query Engine - bottom-up) │ │ Fixed-point computation of Herbrand model │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ DigestedQuery.hs │ │ (Query preprocessing) │ │ Query → DigestedQuery with indexed variables │ └─────────────────────────────────────────────────────────────────┘ ``` ### Module Details #### 1. DatalogParser.hs **Purpose:** Parses Datalog syntax into an Abstract Syntax Tree (AST). **Parser Type:** Uses Megaparsec (`Parsec Void Text`). ##### Core Types ```haskell data Term = Var Text -- Variables: X, Person, Y1 (uppercase start) | Sym Text -- Symbols: alice, "london", uk (lowercase/quoted) | Num Integer data Literal = Literal { positive :: Bool -- True = positive, False = negated , predName :: Text -- Predicate name , arguments :: [Term] -- Argument list } data Statement = Fact Literal -- p(a,b). | Rule Head [Literal] -- p(X,Y) :- q(X,Z), r(Z,Y). | Query [Text] [Literal] -- ?- p(X,Y). or ?- p(X,Y) → X,Y. ``` ##### Syntax Support | Feature | Example | Notes | |-------------------|-------------------------------------|--------------------------| | Facts | `parent("alice", "bob").` | Ground atoms | | Rules | `ancestor(X,Y) :- parent(X,Y).` | Head :- body | | Queries | `?- parent(X,Y).` | Returns all variables | | Projected queries | `?- edge(A,B), edge(B,C) → A,C.` | Explicit output vars | | Negation | `not member(X,L)` or `!member(X,L)` | Parsed but not evaluated | | Comments | `-- line` or `/* block */` | Haskell-style | | Arrow variants | `:-`, `→`, `->` | All accepted | ##### Entry Points ```haskell parseDatalog :: Text -> Either (ParseErrorBundle Text Void) Statement parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement] ``` --- #### 2. DatalogDB.hs **Purpose:** Defines the core data structures and the `DatalogDB` type class. ##### Core Types ```haskell data Relation = Relation { name :: RelationId -- Relation name (Text) , arity :: Int -- Number of arguments , tuples :: Set [Constant] -- Known facts (extensional) , rules :: [RelationRule] -- Derivation rules (intensional) } data RelationRule = RelationRule { headVariables :: [Text] -- Variables in scope , bodyElements :: [RuleBodyElement] -- Body constraints , outputElements :: [RuleElement] -- How to construct head tuple } data RuleBodyElement = RuleBodyElement { subRelationId :: RelationId -- Referenced relation , ruleElements :: [RuleElement] -- How to map variables } data RuleElement = RuleElementConstant Constant -- Fixed value | RuleElementVariable Int -- Index into headVariables ``` ##### Type Class ```haskell class DatalogDB db where emptyDB :: db relationNames :: db -> [Text] lookupRelation :: db -> Text -> Maybe Relation insertRelation :: db -> Relation -> db addConstants :: db -> Set Constant -> db allConstants :: db -> Set Constant -- Herbrand universe ``` --- #### 3. InMemoryDB.hs **Purpose:** Concrete implementation of `DatalogDB` using in-memory data structures. ```haskell data InMemoryDB = InMemoryDB { relations :: Map RelationId Relation , constants :: Set Constant -- The Herbrand universe } ``` ##### Builder Functions ```haskell -- Build DB from facts only withFacts :: [Text] -> InMemoryDB -- Build DB from facts and rules withFactsAndRules :: [Text] -> [Text] -> InMemoryDB -- Build DB with explicit extra constants withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB ``` **Example:** ```haskell let db = withFactsAndRules [ "parent(\"alice\", \"bob\")." , "parent(\"bob\", \"carol\")." ] [ "ancestor(X,Y) :- parent(X,Y)." , "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)." ] ``` --- #### 4. Rules.hs **Purpose:** Processes rules and builds the internal representation. ##### Key Functions **`digestHead`** - Processes the rule head: 1. Extracts variables using `nub` (preserves first-occurrence order) 2. Creates `RuleElement` list mapping terms to indices/constants 3. Collects constants for the Herbrand universe **`digestBody`** - Processes each body literal: 1. Extends variable list with new variables from body 2. Creates `BodyConstraint` linking to sub-relations 3. Maps terms to `RuleElement` (index or constant) **Variable Indexing Example:** ``` Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y). After digestHead: variableNames = ["X", "Y"] After digestBody[0]: variableNames = ["X", "Y", "Z"] (Z added) After digestBody[1]: variableNames = ["X", "Y", "Z"] (no change) Indices: X=0, Y=1, Z=2 ``` **`addRule`** - Main entry point: ```haskell addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db ``` --- #### 5. NaiveQE.hs (Naive Query Engine) **Purpose:** Implements naive bottom-up evaluation with fixed-point iteration. ```haskell data NaiveQE db = NaiveQE { db :: db , herbrand :: Map Text Relation -- Computed model } ``` ##### Algorithm: `computeHerbrand` The naive evaluation algorithm: ``` 1. Initialize Facts := all known facts from DB 2. Repeat: a. For each rule R in each relation: - Enumerate all variable assignments from Herbrand universe - For each assignment that satisfies all body constraints: - Compute the derived tuple - If tuple is new, add to Facts and set changed=True b. If no changes, terminate 3. Return final Facts (the minimal Herbrand model) ``` **Complexity:** O(n^k) per iteration where n = |constants|, k = |variables in rule| ##### Query Execution ```haskell executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults ``` 1. Enumerate all assignments for sought variables 2. For each, check if any assignment to unsought variables satisfies all conditions 3. Return matching sought-variable tuples --- #### 6. DigestedQuery.hs **Purpose:** Preprocesses queries for efficient execution. ```haskell data DigestedQuery = DigestedQuery { allBoundVariables :: [Text] -- All variables in query , numSoughtVariables :: Int -- Variables in output , conditions :: [DigestedQueryCondition] } data DigestedQueryCondition = DigestedQueryCondition { relationName :: Text , entries :: [DigestedQueryEntry] } data DigestedQueryEntry = DigestedQueryEntryConstant Constant | DigestedQueryEntryVariable Int -- Index into allBoundVariables ``` **Query Forms:** - `?- p(X,Y).` - Output all variables (X, Y) - `?- edge(A,B), edge(B,C) → A,C.` - Output only A, C (B is existential) --- #### 7. QueryEngine.hs **Purpose:** Type class for query engines. ```haskell class QueryEngine qe where queryEngine :: (DatalogDB db) => db -> qe db query :: (DatalogDB db) => qe db -> Text -> Text ``` --- #### 8. Utility.hs **Purpose:** Helper functions. ```haskell -- Generate all possible maps from domain to codomain allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b] ``` Used by `NaiveQE` to enumerate variable assignments. --- ### Data Flow Example **Input:** ```datalog parent("alice", "bob"). parent("bob", "carol"). ancestor(X,Y) :- parent(X,Y). ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z). ?- ancestor(alice, X). ``` **Step 1: Parse** → `[Fact, Fact, Rule, Rule, Query]` **Step 2: Build DB** ``` relations = { "parent" -> Relation { tuples = {["alice","bob"], ["bob","carol"]}, rules = [] }, "ancestor" -> Relation { tuples = {}, rules = [Rule1, Rule2] } } constants = {"alice", "bob", "carol"} ``` **Step 3: Compute Herbrand Model** ``` Iteration 1: Apply rules ancestor(alice,bob) ← parent(alice,bob) ancestor(bob,carol) ← parent(bob,carol) Iteration 2: Apply rules again ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol) Iteration 3: No new facts → terminate ``` **Step 4: Execute Query** ``` ?- ancestor(alice, X) Result: bob, carol ``` --- ### Correctness Assessment | Component | Status | Notes | |-----------------|---------|--------------------------------------| | Parser | Correct | Full Datalog syntax support | | Data Model | Correct | Proper Relation/Rule representation | | Rule Processing | Correct | Correct variable indexing with `nub` | | Query Engine | Correct | Fixed-point terminates correctly | | Tests | Pass | Comprehensive coverage | #### Limitations 1. **No stratified negation** - Negation is parsed but ignored in evaluation 2. **Naive complexity** - Exponential in number of variables per rule 3. **No aggregation** - Standard Datalog without extensions 4. **No built-in predicates** - No arithmetic comparison, etc. --- ### Test Coverage Tests are in `test/Test/Datalog/`: - **DatalogParserSpec.hs** - Parser correctness - **InMemoryDBSpec.hs** - Database operations - **RulesSpec.hs** - Rule digestion - **NaiveQESpec.hs** - Query evaluation - **DigestedQuerySpec.hs** - Query preprocessing **Notable Test Cases:** - Reflexive relations: `equiv(X, X) :- .` - Symmetric relations: `equiv(Y, X) :- equiv(X, Y).` - Transitive closure: `equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).` - Full equivalence relations (combining all three) - Genealogical queries (niece, sibling relationships) - Poset ordering with constants --- ### Build & Run ```bash cabal build ## Build cabal test ## Run all tests cabal repl ## Interactive REPL ## Run specific tests cabal test --test-option=--match="/NaiveQE/" ``` --- ### Dependencies - **megaparsec** - Parser combinators - **containers** - Map, Set - **text** - Text processing - **hspec** - Testing framework --- ### Future Work Potential enhancements: 1. Implement stratified negation 2. Add semi-naive evaluation for better performance 3. Built-in predicates (comparison, arithmetic) 4. Magic sets optimization 5. Persistent storage backend ## Changelog * **Mar 4, 2026** -- The first version was created.