felix-db: Project Internals

This document describes the internals of the felix-db Datalog implementation.

Overview

felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language, syntactically a subset of Prolog, commonly used for database queries, program analysis, and reasoning systems.

Project Status: The implementation is correct and fully functional for positive Datalog, that is, programs whose rules contain no negation. Negated literals are parsed but not evaluated.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         User Input                              │
│                    (Datalog text/files)                         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DatalogParser.hs                             │
│              (Megaparsec-based parser)                          │
│         Text → Statement (Fact | Rule | Query)                  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DatalogDB.hs                               │
│            (Type class + core data structures)                  │
│         Relation, RelationRule, RuleElement                     │
└─────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────────────┐
│    InMemoryDB.hs        │     │         Rules.hs                │
│  (Concrete DB impl)     │     │    (Rule digestion logic)       │
│  relations + constants  │     │   digestHead, digestBody        │
└─────────────────────────┘     └─────────────────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       NaiveQE.hs                                │
│            (Naive Query Engine - bottom-up)                     │
│         Fixed-point computation of Herbrand model               │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DigestedQuery.hs                             │
│              (Query preprocessing)                              │
│         Query → DigestedQuery with indexed variables            │
└─────────────────────────────────────────────────────────────────┘

Module Details

1. DatalogParser.hs

Purpose: Parses Datalog syntax into an Abstract Syntax Tree (AST).

Parser Type: Uses Megaparsec (Parsec Void Text).

Core Types
data Term
  = Var Text   -- Variables: X, Person, Y1 (uppercase start)
  | Sym Text   -- Symbols: alice, "london", uk (lowercase/quoted)
  | Num Integer

data Literal = Literal
  { positive  :: Bool    -- True = positive, False = negated
  , predName  :: Text    -- Predicate name
  , arguments :: [Term]  -- Argument list
  }

data Statement
  = Fact Literal                -- p(a,b).
  | Rule Head [Literal]         -- p(X,Y) :- q(X,Z), r(Z,Y).
  | Query [Text] [Literal]      -- ?- p(X,Y).  or  ?- p(X,Y) → X,Y.

Syntax Support

Feature            Example                            Notes
-----------------  ---------------------------------  ---------------------------------
Facts              parent("alice", "bob").            Ground atoms
Rules              ancestor(X,Y) :- parent(X,Y).      Head :- body
Queries            ?- parent(X,Y).                    Returns all variables
Projected queries  ?- edge(A,B), edge(B,C) → A,C.     Explicit output vars
Negation           not member(X,L) or !member(X,L)    Parsed but not evaluated
Comments           -- line or /* block */             Haskell-style line, C-style block
Arrow variants     :-, , ->                           All accepted

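The lexical rule for terms (uppercase start for variables, lowercase or quoted text for symbols, digits for numbers) can be illustrated with a small classifier. This is only a sketch and not the project's Megaparsec parser: classifyToken is a hypothetical helper, it works on a single already-tokenized argument, and it uses String where the project uses Text so that it needs nothing beyond base.

```haskell
import Data.Char (isDigit, isUpper)

-- Mirrors the Term type above, with String standing in for Text (assumption).
data Term
  = Var String
  | Sym String
  | Num Integer
  deriving (Eq, Show)

-- Hypothetical helper: classify one already-tokenized argument.
classifyToken :: String -> Term
classifyToken tok = case tok of
  ('"' : rest)
    | not (null rest), last rest == '"' -> Sym (init rest) -- quoted symbol
  (c : _)
    | isUpper c -> Var tok                                 -- uppercase start: variable
  _ | not (null tok), all isDigit tok -> Num (read tok)    -- digits only: number
    | otherwise -> Sym tok                                 -- anything else: bare symbol
```

Under these assumptions, classifyToken "Y1" gives Var "Y1", while classifyToken "uk" and classifyToken "\"london\"" both give a Sym with the quotes stripped. Negative numbers are not handled by this sketch.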
Entry Points
parseDatalog     :: Text -> Either (ParseErrorBundle Text Void) Statement
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]

2. DatalogDB.hs

Purpose: Defines the core data structures and the DatalogDB type class.

Core Types
data Relation = Relation
  { name   :: RelationId       -- Relation name (Text)
  , arity  :: Int              -- Number of arguments
  , tuples :: Set [Constant]   -- Known facts (extensional)
  , rules  :: [RelationRule]   -- Derivation rules (intensional)
  }

data RelationRule = RelationRule
  { headVariables  :: [Text]           -- Variables in scope
  , bodyElements   :: [RuleBodyElement] -- Body constraints
  , outputElements :: [RuleElement]     -- How to construct head tuple
  }

data RuleBodyElement = RuleBodyElement
  { subRelationId :: RelationId    -- Referenced relation
  , ruleElements  :: [RuleElement] -- How to map variables
  }

data RuleElement
  = RuleElementConstant Constant   -- Fixed value
  | RuleElementVariable Int        -- Index into headVariables
Type Class
class DatalogDB db where
  emptyDB        :: db
  relationNames  :: db -> [Text]
  lookupRelation :: db -> Text -> Maybe Relation
  insertRelation :: db -> Relation -> db
  addConstants   :: db -> Set Constant -> db
  allConstants   :: db -> Set Constant  -- Herbrand universe

3. InMemoryDB.hs

Purpose: Concrete implementation of DatalogDB using in-memory data structures.

data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation
  , constants :: Set Constant  -- The Herbrand universe
  }
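To make the relationship between the type class and this record concrete, here is a minimal sketch of how such an instance could look. It is not the project's code: Relation is trimmed (no rules field), RelationId and Constant are assumed to be String, and addFact is a hypothetical helper introduced only for the example.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type RelationId = String -- assumption: String instead of Text
type Constant   = String -- assumption: the real Constant type is not shown here

-- Trimmed-down Relation: the rules field is omitted for brevity.
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set.Set [Constant]
  } deriving (Eq, Show)

class DatalogDB db where
  emptyDB        :: db
  relationNames  :: db -> [RelationId]
  lookupRelation :: db -> RelationId -> Maybe Relation
  insertRelation :: db -> Relation -> db
  addConstants   :: db -> Set.Set Constant -> db
  allConstants   :: db -> Set.Set Constant

data InMemoryDB = InMemoryDB
  { relations :: Map.Map RelationId Relation
  , constants :: Set.Set Constant
  } deriving (Eq, Show)

instance DatalogDB InMemoryDB where
  emptyDB = InMemoryDB Map.empty Set.empty
  relationNames = Map.keys . relations
  lookupRelation db rid = Map.lookup rid (relations db)
  insertRelation db rel = db { relations = Map.insert (name rel) rel (relations db) }
  addConstants db cs = db { constants = Set.union cs (constants db) }
  allConstants = constants

-- Hypothetical helper: add one ground fact and register its constants
-- in the Herbrand universe.
addFact :: DatalogDB db => RelationId -> [Constant] -> db -> db
addFact rid args db =
  let rel = case lookupRelation db rid of
        Just r  -> r { tuples = Set.insert args (tuples r) }
        Nothing -> Relation rid (length args) (Set.singleton args)
  in addConstants (insertRelation db rel) (Set.fromList args)
```

Writing addFact against the class rather than the record is the point of the abstraction: the same helper would work for any other backend that implements DatalogDB.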
Builder Functions
-- Build DB from facts only
withFacts :: [Text] -> InMemoryDB

-- Build DB from facts and rules
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB

-- Build DB with explicit extra constants
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB

Example:

let db = withFactsAndRules
  [ "parent(\"alice\", \"bob\")."
  , "parent(\"bob\", \"carol\")."
  ]
  [ "ancestor(X,Y) :- parent(X,Y)."
  , "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
  ]

4. Rules.hs

Purpose: Processes rules and builds the internal representation.

Key Functions

digestHead - Processes the rule head:

  1. Extracts variables using nub (preserves first-occurrence order)
  2. Creates RuleElement list mapping terms to indices/constants
  3. Collects constants for the Herbrand universe

digestBody - Processes each body literal:

  1. Extends variable list with new variables from body
  2. Creates a RuleBodyElement linking to the referenced sub-relation
  3. Maps terms to RuleElement (index or constant)

Variable Indexing Example:

Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

After digestHead:      variableNames = ["X", "Y"]
After digestBody[0]:   variableNames = ["X", "Y", "Z"]  (Z added)
After digestBody[1]:   variableNames = ["X", "Y", "Z"]  (no change)

Indices: X=0, Y=1, Z=2
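The indexing scheme above can be reproduced in a few lines. The following is a sketch of the idea only, not the actual digestHead/digestBody code: ruleVariables and digestAtom are hypothetical names, atoms are modelled as plain pairs, and String replaces Text.

```haskell
import Data.List (elemIndex, nub)

data Term = Var String | Sym String
  deriving (Eq, Show)

data RuleElement
  = RuleElementConstant String
  | RuleElementVariable Int
  deriving (Eq, Show)

-- An atom is a predicate name with its argument terms (simplification).
type Atom = (String, [Term])

-- Variables of a rule in first-occurrence order: head first, then body.
ruleVariables :: Atom -> [Atom] -> [String]
ruleVariables hd body = nub [v | (_, args) <- hd : body, Var v <- args]

-- Replace every term by a constant or by the index of its variable.
digestAtom :: [String] -> Atom -> (String, [RuleElement])
digestAtom vars (rel, args) = (rel, map toElement args)
  where
    toElement (Sym c) = RuleElementConstant c
    toElement (Var v) = case elemIndex v vars of
      Just i  -> RuleElementVariable i
      Nothing -> error ("unbound variable: " ++ v)
```

For the rule ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y), ruleVariables returns ["X", "Y", "Z"], and digesting the first body atom yields ("parent", [RuleElementVariable 0, RuleElementVariable 2]).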

addRule - Main entry point:

addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db

5. NaiveQE.hs (Naive Query Engine)

Purpose: Implements naive bottom-up evaluation with fixed-point iteration.

data NaiveQE db = NaiveQE
  { db       :: db
  , herbrand :: Map Text Relation  -- Computed model
  }
Algorithm: computeHerbrand

The naive evaluation algorithm:

1. Initialize Facts := all known facts from DB
2. Repeat:
   a. For each rule R in each relation:
      - Enumerate all variable assignments from Herbrand universe
      - For each assignment that satisfies all body constraints:
        - Compute the derived tuple
        - If tuple is new, add to Facts and set changed=True
   b. If no changes, terminate
3. Return final Facts (the minimal Herbrand model)

Complexity: each iteration enumerates O(n^k) candidate assignments per rule, where n = |constants| and k = number of distinct variables in the rule.
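The loop above can be sketched as a small stand-alone program. This is an illustration under simplifying assumptions, not the computeHerbrand source: rules are given directly in indexed form, the names Elem, Rule, step and naiveFixpoint are made up for the example, and each round derives facts only from the previous round's fact set.

```haskell
import Control.Monad (replicateM)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Constant = String
type Facts    = Map.Map String (Set.Set [Constant])

data Elem = C Constant | V Int -- constant, or index into the rule's variables

-- A rule: number of variables, head atom, body atoms.
data Rule = Rule
  { numVars  :: Int
  , ruleHead :: (String, [Elem])
  , ruleBody :: [(String, [Elem])]
  }

-- Instantiate an atom under an assignment (a list indexed by variable number).
ground :: [Constant] -> (String, [Elem]) -> (String, [Constant])
ground env (rel, es) = (rel, map val es)
  where
    val (C c) = c
    val (V i) = env !! i

holds :: Facts -> (String, [Constant]) -> Bool
holds facts (rel, tuple) = maybe False (Set.member tuple) (Map.lookup rel facts)

-- One round: try every assignment for every rule (n^k candidates per rule).
step :: [Constant] -> [Rule] -> Facts -> Facts
step universe rules facts = foldr insertFact facts derived
  where
    derived =
      [ ground env (ruleHead r)
      | r <- rules
      , env <- replicateM (numVars r) universe
      , all (holds facts . ground env) (ruleBody r)
      ]
    insertFact (rel, tuple) = Map.insertWith Set.union rel (Set.singleton tuple)

-- Iterate until a round adds nothing new.
naiveFixpoint :: [Constant] -> [Rule] -> Facts -> Facts
naiveFixpoint universe rules facts
  | next == facts = facts
  | otherwise     = naiveFixpoint universe rules next
  where next = step universe rules facts

-- The ancestor program, with X=0, Y=1, Z=2 in the second rule.
exampleModel :: Facts
exampleModel = naiveFixpoint universe rules edb
  where
    universe = ["alice", "bob", "carol"]
    edb = Map.fromList [("parent", Set.fromList [["alice", "bob"], ["bob", "carol"]])]
    rules =
      [ Rule 2 ("ancestor", [V 0, V 1]) [("parent", [V 0, V 1])]
      , Rule 3 ("ancestor", [V 0, V 2]) [("parent", [V 0, V 1]), ("ancestor", [V 1, V 2])]
      ]
```

Termination follows because the fact set only grows and the number of possible tuples over a finite universe is bounded. In exampleModel the ancestor relation ends up with the three tuples (alice, bob), (bob, carol) and (alice, carol).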

Query Execution
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults

executeQuery proceeds in three steps:

  1. Enumerate all assignments of constants to the sought variables
  2. For each one, check whether some assignment to the unsought variables satisfies all conditions
  3. Return the sought-variable tuples that pass the check
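A compact way to picture these steps is the sketch below. It is not the executeQuery implementation: runQuery, Entry and Condition are hypothetical names, the model is a plain Map, and the sought variables are assumed to occupy the first indices of the variable list.

```haskell
import Control.Monad (replicateM)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Constant = String
type Model    = Map.Map String (Set.Set [Constant])

data Entry = EConst Constant | EVar Int -- EVar indexes the query's variable list

type Condition = (String, [Entry])

-- Assumption: sought variables occupy indices 0 .. numSought-1.
runQuery :: [Constant] -> Model -> Int -> Int -> [Condition] -> [[Constant]]
runQuery universe model numSought numVars conds =
  [ sought
  | sought <- replicateM numSought universe                        -- step 1
  , any (satisfies . (sought ++))                                  -- step 2
        (replicateM (numVars - numSought) universe)
  ]                                                                -- step 3
  where
    satisfies env = all (holdsUnder env) conds
    holdsUnder env (rel, entries) =
      maybe False (Set.member (map (value env) entries)) (Map.lookup rel model)
    value _   (EConst c) = c
    value env (EVar i)   = env !! i

-- ?- edge(A,B), edge(B,C) → A,C.  with A=0, C=1 (sought) and B=2 (existential)
exampleResult :: [[Constant]]
exampleResult = runQuery ["a", "b", "c"] model 2 3 conds
  where
    model = Map.fromList [("edge", Set.fromList [["a", "b"], ["b", "c"]])]
    conds = [("edge", [EVar 0, EVar 2]), ("edge", [EVar 2, EVar 1])]
```

With the two edges a→b and b→c, exampleResult is [["a", "c"]]: B is bound to b internally but never appears in the output.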

6. DigestedQuery.hs

Purpose: Preprocesses queries for efficient execution.

data DigestedQuery = DigestedQuery
  { allBoundVariables   :: [Text]  -- All variables in query
  , numSoughtVariables  :: Int     -- Variables in output
  , conditions          :: [DigestedQueryCondition]
  }

data DigestedQueryCondition = DigestedQueryCondition
  { relationName :: Text
  , entries      :: [DigestedQueryEntry]
  }

data DigestedQueryEntry
  = DigestedQueryEntryConstant Constant
  | DigestedQueryEntryVariable Int  -- Index into allBoundVariables

Query Forms:

  • ?- p(X,Y). - Output all variables (X, Y)
  • ?- edge(A,B), edge(B,C) → A,C. - Output only A, C (B is existential)
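As an illustration of what digestion might produce for these two forms, the sketch below builds a DigestedQuery from a projection list and a body. Two things are assumptions rather than facts about the project: that an empty projection list means "output every variable", and that sought variables are placed first in allBoundVariables (which would explain why a single Int suffices to mark them). digestQuery is a hypothetical name, conditions are modelled as pairs, and String replaces Text.

```haskell
import Data.List (elemIndex, nub)
import Data.Maybe (fromMaybe)

data Term = Var String | Sym String
  deriving (Eq, Show)

data DigestedQueryEntry
  = DigestedQueryEntryConstant String
  | DigestedQueryEntryVariable Int
  deriving (Eq, Show)

data DigestedQuery = DigestedQuery
  { allBoundVariables  :: [String]
  , numSoughtVariables :: Int
  , conditions         :: [(String, [DigestedQueryEntry])]
  } deriving (Eq, Show)

-- Hypothetical digestion step.
digestQuery :: [String] -> [(String, [Term])] -> DigestedQuery
digestQuery projection body = DigestedQuery vars (length sought) (map digest body)
  where
    bodyVars = nub [v | (_, args) <- body, Var v <- args]
    -- Assumption: no explicit projection means all variables are sought.
    sought   = if null projection then bodyVars else nub projection
    -- Assumption: sought variables come first, existential ones after.
    vars     = nub (sought ++ bodyVars)
    digest (rel, args) = (rel, map entry args)
    entry (Sym c) = DigestedQueryEntryConstant c
    entry (Var v) =
      DigestedQueryEntryVariable (fromMaybe (error "unreachable") (elemIndex v vars))
```

Under these assumptions the projected query ?- edge(A,B), edge(B,C) → A,C. digests to allBoundVariables = ["A", "C", "B"] with numSoughtVariables = 2, so B is the only existential variable.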

7. QueryEngine.hs

Purpose: Type class for query engines.

class QueryEngine qe where
  queryEngine :: (DatalogDB db) => db -> qe db
  query       :: (DatalogDB db) => qe db -> Text -> Text

8. Utility.hs

Purpose: Helper functions.

-- Generate all possible maps from domain to codomain
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]

Used by NaiveQE to enumerate variable assignments.
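One possible way to write a function with this signature is shown below. It is a sketch, not necessarily how Utility.hs does it, and it assumes the domain list contains no duplicates.

```haskell
import Control.Monad (replicateM)
import qualified Data.Map as Map

-- Every total map from the domain to the codomain.
-- replicateM in the list monad yields all value lists of the given length.
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
allMaps domain codomain =
  [ Map.fromList (zip domain values)
  | values <- replicateM (length domain) codomain
  ]
```

For k distinct domain elements and n codomain elements the result has n^k maps, which is where the per-rule cost of naive evaluation comes from. An empty domain yields exactly one map, the empty one.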


Data Flow Example

Input:

parent("alice", "bob").
parent("bob", "carol").
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
?- ancestor(alice, X).

Step 1: Parse → [Fact, Fact, Rule, Rule, Query]

Step 2: Build DB

relations = {
  "parent" -> Relation {
    tuples = {["alice","bob"], ["bob","carol"]},
    rules = []
  },
  "ancestor" -> Relation {
    tuples = {},
    rules = [Rule1, Rule2]
  }
}
constants = {"alice", "bob", "carol"}

Step 3: Compute Herbrand Model

Iteration 1: Apply rules
  ancestor(alice,bob)  ← parent(alice,bob)
  ancestor(bob,carol)  ← parent(bob,carol)

Iteration 2: Apply rules again
  ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)

Iteration 3: No new facts → terminate

Step 4: Execute Query

?- ancestor(alice, X)
Result: bob, carol

Correctness Assessment

Component        Status   Notes
---------------  -------  -----------------------------------
Parser           Correct  Full Datalog syntax support
Data Model       Correct  Proper Relation/Rule representation
Rule Processing  Correct  Correct variable indexing with nub
Query Engine     Correct  Fixed-point terminates correctly
Tests            Pass     Comprehensive coverage

Limitations

  1. No stratified negation - Negation is parsed but ignored in evaluation
  2. Naive complexity - Exponential in number of variables per rule
  3. No aggregation - Standard Datalog without extensions
  4. No built-in predicates - No arithmetic comparison, etc.

Test Coverage

Tests are in test/Test/Datalog/:

  • DatalogParserSpec.hs - Parser correctness
  • InMemoryDBSpec.hs - Database operations
  • RulesSpec.hs - Rule digestion
  • NaiveQESpec.hs - Query evaluation
  • DigestedQuerySpec.hs - Query preprocessing

Notable Test Cases:

  • Reflexive relations: equiv(X, X) :- .
  • Symmetric relations: equiv(Y, X) :- equiv(X, Y).
  • Transitive closure: equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).
  • Full equivalence relations (combining all three)
  • Genealogical queries (niece, sibling relationships)
  • Poset ordering with constants

Build & Run

cabal build          # Build
cabal test           # Run all tests
cabal repl           # Interactive REPL

# Run specific tests
cabal test --test-option=--match="/NaiveQE/"

Dependencies

  • megaparsec - Parser combinators
  • containers - Map, Set
  • text - Text processing
  • hspec - Testing framework

Future Work

Potential enhancements:

  1. Implement stratified negation
  2. Add semi-naive evaluation for better performance
  3. Built-in predicates (comparison, arithmetic)
  4. Magic sets optimization
  5. Persistent storage backend

Changelog

  • Mar 4, 2026 -- The first version was created.