useful-notes/005-felixs-datalog.md at 5a33e673c9287d2e53e0277ddb6fd7dc7fe5568c

habedi-work/useful-notes

Fork 0

Hassan Abedi 654012f2ef Add notes about Cale's and Felix's Datalog implementations

2026-03-04 12:37:35 +01:00

14 KiB

Raw Blame History

felix-db: Project Internals

This document provides detailed technical documentation of the felix-db Datalog implementation.

Overview

felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (a subset of Prolog) commonly used for database queries, program analysis, and reasoning systems.

Project Status: The implementation is correct and fully functional for positive Datalog (without negation in rules).

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         User Input                              │
│                    (Datalog text/files)                         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DatalogParser.hs                             │
│              (Megaparsec-based parser)                          │
│         Text → Statement (Fact | Rule | Query)                  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DatalogDB.hs                               │
│            (Type class + core data structures)                  │
│         Relation, RelationRule, RuleElement                     │
└─────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────────────┐
│    InMemoryDB.hs        │     │         Rules.hs                │
│  (Concrete DB impl)     │     │    (Rule digestion logic)       │
│  relations + constants  │     │   digestHead, digestBody        │
└─────────────────────────┘     └─────────────────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       NaiveQE.hs                                │
│            (Naive Query Engine - bottom-up)                     │
│         Fixed-point computation of Herbrand model               │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DigestedQuery.hs                             │
│              (Query preprocessing)                              │
│         Query → DigestedQuery with indexed variables            │
└─────────────────────────────────────────────────────────────────┘

Module Details

1. DatalogParser.hs

Purpose: Parses Datalog syntax into an Abstract Syntax Tree (AST).

Parser Type: Uses Megaparsec (Parsec Void Text).

Core Types

data Term
  = Var Text   -- Variables: X, Person, Y1 (uppercase start)
  | Sym Text   -- Symbols: alice, "london", uk (lowercase/quoted)
  | Num Integer

data Literal = Literal
  { positive  :: Bool    -- True = positive, False = negated
  , predName  :: Text    -- Predicate name
  , arguments :: [Term]  -- Argument list
  }

data Statement
  = Fact Literal                -- p(a,b).
  | Rule Head [Literal]         -- p(X,Y) :- q(X,Z), r(Z,Y).
  | Query [Text] [Literal]      -- ?- p(X,Y).  or  ?- p(X,Y) → X,Y.

Syntax Support

Feature	Example	Notes
Facts	`parent("alice", "bob").`	Ground atoms
Rules	`ancestor(X,Y) :- parent(X,Y).`	Head :- body
Queries	`?- parent(X,Y).`	Returns all variables
Projected queries	`?- edge(A,B), edge(B,C) → A,C.`	Explicit output vars
Negation	`not member(X,L)` or `!member(X,L)`	Parsed but not evaluated
Comments	`-- line` or `/* block */`	Haskell-style
Arrow variants	`:-`, `→`, `->`	All accepted

Entry Points

parseDatalog     :: Text -> Either (ParseErrorBundle Text Void) Statement
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]

2. DatalogDB.hs

Purpose: Defines the core data structures and the DatalogDB type class.

Core Types

data Relation = Relation
  { name   :: RelationId       -- Relation name (Text)
  , arity  :: Int              -- Number of arguments
  , tuples :: Set [Constant]   -- Known facts (extensional)
  , rules  :: [RelationRule]   -- Derivation rules (intensional)
  }

data RelationRule = RelationRule
  { headVariables  :: [Text]           -- Variables in scope
  , bodyElements   :: [RuleBodyElement] -- Body constraints
  , outputElements :: [RuleElement]     -- How to construct head tuple
  }

data RuleBodyElement = RuleBodyElement
  { subRelationId :: RelationId    -- Referenced relation
  , ruleElements  :: [RuleElement] -- How to map variables
  }

data RuleElement
  = RuleElementConstant Constant   -- Fixed value
  | RuleElementVariable Int        -- Index into headVariables

Type Class

class DatalogDB db where
  emptyDB        :: db
  relationNames  :: db -> [Text]
  lookupRelation :: db -> Text -> Maybe Relation
  insertRelation :: db -> Relation -> db
  addConstants   :: db -> Set Constant -> db
  allConstants   :: db -> Set Constant  -- Herbrand universe

3. InMemoryDB.hs

Purpose: Concrete implementation of DatalogDB using in-memory data structures.

data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation
  , constants :: Set Constant  -- The Herbrand universe
  }

Builder Functions

-- Build DB from facts only
withFacts :: [Text] -> InMemoryDB

-- Build DB from facts and rules
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB

-- Build DB with explicit extra constants
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB

Example:

let db = withFactsAndRules
  [ "parent(\"alice\", \"bob\")."
  , "parent(\"bob\", \"carol\")."
  ]
  [ "ancestor(X,Y) :- parent(X,Y)."
  , "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
  ]

4. Rules.hs

Purpose: Processes rules and builds the internal representation.

Key Functions

digestHead - Processes the rule head:

Extracts variables using nub (preserves first-occurrence order)
Creates RuleElement list mapping terms to indices/constants
Collects constants for the Herbrand universe

digestBody - Processes each body literal:

Extends variable list with new variables from body
Creates BodyConstraint linking to sub-relations
Maps terms to RuleElement (index or constant)

Variable Indexing Example:

Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

After digestHead:      variableNames = ["X", "Y"]
After digestBody[0]:   variableNames = ["X", "Y", "Z"]  (Z added)
After digestBody[1]:   variableNames = ["X", "Y", "Z"]  (no change)

Indices: X=0, Y=1, Z=2

addRule - Main entry point:

addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db

5. NaiveQE.hs (Naive Query Engine)

Purpose: Implements naive bottom-up evaluation with fixed-point iteration.

data NaiveQE db = NaiveQE
  { db       :: db
  , herbrand :: Map Text Relation  -- Computed model
  }

Algorithm: `computeHerbrand`

The naive evaluation algorithm:

1. Initialize Facts := all known facts from DB
2. Repeat:
   a. For each rule R in each relation:
      - Enumerate all variable assignments from Herbrand universe
      - For each assignment that satisfies all body constraints:
        - Compute the derived tuple
        - If tuple is new, add to Facts and set changed=True
   b. If no changes, terminate
3. Return final Facts (the minimal Herbrand model)

Complexity: O(n^k) per iteration where n = |constants|, k = |variables in rule|

Query Execution

executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults

Enumerate all assignments for sought variables
For each, check if any assignment to unsought variables satisfies all conditions
Return matching sought-variable tuples

6. DigestedQuery.hs

Purpose: Preprocesses queries for efficient execution.

data DigestedQuery = DigestedQuery
  { allBoundVariables   :: [Text]  -- All variables in query
  , numSoughtVariables  :: Int     -- Variables in output
  , conditions          :: [DigestedQueryCondition]
  }

data DigestedQueryCondition = DigestedQueryCondition
  { relationName :: Text
  , entries      :: [DigestedQueryEntry]
  }

data DigestedQueryEntry
  = DigestedQueryEntryConstant Constant
  | DigestedQueryEntryVariable Int  -- Index into allBoundVariables

Query Forms:

?- p(X,Y). - Output all variables (X, Y)
?- edge(A,B), edge(B,C) → A,C. - Output only A, C (B is existential)

7. QueryEngine.hs

Purpose: Type class for query engines.

class QueryEngine qe where
  queryEngine :: (DatalogDB db) => db -> qe db
  query       :: (DatalogDB db) => qe db -> Text -> Text

8. Utility.hs

Purpose: Helper functions.

-- Generate all possible maps from domain to codomain
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]

Used by NaiveQE to enumerate variable assignments.

Data Flow Example

Input:

parent("alice", "bob").
parent("bob", "carol").
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
?- ancestor(alice, X).

Step 1: Parse → [Fact, Fact, Rule, Rule, Query]

Step 2: Build DB

relations = {
  "parent" -> Relation {
    tuples = {["alice","bob"], ["bob","carol"]},
    rules = []
  },
  "ancestor" -> Relation {
    tuples = {},
    rules = [Rule1, Rule2]
  }
}
constants = {"alice", "bob", "carol"}

Step 3: Compute Herbrand Model

Iteration 1: Apply rules
  ancestor(alice,bob)  ← parent(alice,bob)
  ancestor(bob,carol)  ← parent(bob,carol)

Iteration 2: Apply rules again
  ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)

Iteration 3: No new facts → terminate

Step 4: Execute Query

?- ancestor(alice, X)
Result: bob, carol

Correctness Assessment

Component	Status	Notes
Parser	Correct	Full Datalog syntax support
Data Model	Correct	Proper Relation/Rule representation
Rule Processing	Correct	Correct variable indexing with `nub`
Query Engine	Correct	Fixed-point terminates correctly
Tests	Pass	Comprehensive coverage

Limitations

No stratified negation - Negation is parsed but ignored in evaluation
Naive complexity - Exponential in number of variables per rule
No aggregation - Standard Datalog without extensions
No built-in predicates - No arithmetic comparison, etc.

Test Coverage

Tests are in test/Test/Datalog/:

DatalogParserSpec.hs - Parser correctness
InMemoryDBSpec.hs - Database operations
RulesSpec.hs - Rule digestion
NaiveQESpec.hs - Query evaluation
DigestedQuerySpec.hs - Query preprocessing

Notable Test Cases:

Reflexive relations: equiv(X, X) :- .
Symmetric relations: equiv(Y, X) :- equiv(X, Y).
Transitive closure: equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).
Full equivalence relations (combining all three)
Genealogical queries (niece, sibling relationships)
Poset ordering with constants

Build & Run

cabal build          ## Build
cabal test           ## Run all tests
cabal repl           ## Interactive REPL

## Run specific tests
cabal test --test-option=--match="/NaiveQE/"

Dependencies

megaparsec - Parser combinators
containers - Map, Set
text - Text processing
hspec - Testing framework

Future Work

Potential enhancements:

Implement stratified negation
Add semi-naive evaluation for better performance
Built-in predicates (comparison, arithmetic)
Magic sets optimization
Persistent storage backend

Changelog

Mar 4, 2026 -- The first version was created.

14 KiB Raw Blame History

felix-db: Project Internals

Overview

Architecture

Module Details

1. DatalogParser.hs

Core Types

Syntax Support

Entry Points

2. DatalogDB.hs

Core Types

Type Class

3. InMemoryDB.hs

Builder Functions

4. Rules.hs

Key Functions

5. NaiveQE.hs (Naive Query Engine)

Algorithm: computeHerbrand

Query Execution

6. DigestedQuery.hs

7. QueryEngine.hs

8. Utility.hs

Data Flow Example

Correctness Assessment

Limitations

Test Coverage

Build & Run

Dependencies

Future Work

Changelog

14 KiB

Raw Blame History

Algorithm: `computeHerbrand`