useful-notes/hassan/007-cales-vs-felixs.md
2026-03-04 13:25:42 +01:00

14 KiB
Raw Blame History

Comparison: toy-datalog vs felix-db

This document compares two Haskell implementations of Datalog interpreters.


Overview

Aspect toy-datalog felix-db
Author Cale Gibbard (felix-db team)
Version 0.1.0.0 0.1.0.0
Language Haskell (Haskell2010) Haskell (GHC2024)
License BSD-2-Clause (not specified)
Lines of Code ~300 ~700
Primary Focus Minimal core implementation More complete feature set

Architecture Comparison

toy-datalog

Program Text → Parser → AST → Database → Eval (fixed-point) → Results

Simple linear pipeline with 4 modules:

  • Syntax.hs - AST definitions
  • Parser.hs - Megaparsec parser
  • Eval.hs - Evaluation engine
  • IO.hs - File utilities

felix-db

Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
                                    ↓
                              DigestedQuery

More modular architecture with 8+ modules:

  • DatalogParser.hs - Parser
  • DatalogDB.hs - Database type class
  • InMemoryDB.hs - Concrete implementation
  • Rules.hs - Rule processing/digestion
  • NaiveQE.hs - Query engine
  • DigestedQuery.hs - Query preprocessing
  • QueryEngine.hs - Query engine type class
  • Utility.hs - Helper functions

Winner: felix-db - Better separation of concerns and extensibility via type classes.


Syntax & AST

Term Representation

Feature toy-datalog felix-db
Variables Var VarId (uppercase) Var Text (uppercase)
Constants Con ConId (lowercase/numeric) Sym Text (lowercase/quoted)
Numbers Parsed as constants Num Integer (separate type)
Newtypes Yes (ConId, VarId, RelId) No (raw Text)
-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer

toy-datalog advantage: Newtypes prevent mixing up constants/variables/relations at compile time.

felix-db advantage: Explicit Num type for numeric constants.

Rule Representation

-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }

felix-db advantage: Supports negation syntax (positive :: Bool).


Parser Comparison

Feature toy-datalog felix-db
Library Megaparsec Megaparsec
Line comments -- --
Block comments {- -} /* */
Rule arrow :- only :-, , ->
Negation Not supported not or ! prefix
Queries Not supported ?- syntax with projection
Quoted strings No Yes ("alice")
Numbers As constants As separate Num type

Example Syntax

-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.

Winner: felix-db - More complete syntax with queries, negation, and quoted strings.


Data Structures

Relation Storage

-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId])  -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]  -- No index
  , rules  :: [RelationRule]  -- Rules stored with relation
  }

toy-datalog advantage: Position-based indexing for efficient lookups.

felix-db advantage: Rules stored directly with relations (cleaner organization).

Database Structure

-- toy-datalog
data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule]  -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation  -- Rules inside Relation
  , constants :: Set Constant
  }

Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.


Evaluation Strategy

Both use naive bottom-up evaluation with fixed-point iteration.

Algorithm Comparison

Aspect toy-datalog felix-db
Strategy Naive bottom-up Naive bottom-up
Termination Fixed-point check Fixed-point check
Variable binding Map VarId ConId List indexed by position
Join operation >< operator (set join) Cartesian product + filter
Index usage Yes (position-based) No

toy-datalog Evaluation

-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database

Uses indices to find candidate tuples, then intersects results for bound positions.

felix-db Evaluation

-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]

Generates all possible variable assignments from Herbrand universe, then filters.

Variable Binding Approach

-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices

toy-datalog: More intuitive, variable names preserved throughout.

felix-db: More efficient (integer indexing), but loses variable names.


Feature Comparison

Feature toy-datalog felix-db
Facts
Rules
Recursive rules
Multiple rules per predicate
Arity checking
Fixed-point iteration
Position-based indexing
Interactive queries (?- syntax)
Query projection (→ X, Y)
Negation (parsing)
Negation (evaluation)
Numbers as distinct type
Quoted strings
Pretty printing
Database type class
Query engine type class
PostgreSQL integration 🔄 (in progress)

Error Handling

toy-datalog

data EvalError =
    EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got :: Int
  }

Detailed error types with context.

felix-db

data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text

More error types but less context (just error message components).

Winner: toy-datalog - Errors include the full rule/atom that caused the problem.


Testing

Aspect toy-datalog felix-db
Framework Tasty + HUnit Hspec
Parser tests Golden tests Unit tests
Evaluation tests (commented out) Comprehensive
Property tests
Test files 2 7

felix-db Test Examples

-- Reflexive relation
"equiv(X, X) :- ."

-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."

-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."

-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."

Winner: felix-db - Much more comprehensive test coverage.


Performance Characteristics

Indexing

Operation toy-datalog felix-db
Lookup atom with constants O(log n) via index intersection O(m) full scan
Lookup atom with all variables O(m) full scan O(m) full scan
Insert tuple O(k log n) (update k indices) O(log m)

toy-datalog advantage: Indexed lookups for atoms with bound positions.

Evaluation Complexity

Both have the same theoretical complexity for naive evaluation:

  • Time per iteration: O(n^k × |rules| × r)
    • n = Herbrand universe size
    • k = max variables per rule
    • r = max body literals

However, toy-datalog's indexing provides practical speedup when atoms have constants.

Memory Usage

Aspect toy-datalog felix-db
Tuple storage Set [ConId] Set [Constant]
Index overhead O(k × m) extra None
Rule storage Separate map In relations

toy-datalog uses more memory due to indices, but gains query performance.


Code Quality

toy-datalog

Strengths:

  • Minimal, focused implementation
  • Good use of newtypes for type safety
  • Pretty printing support
  • Clean HasConstants typeclass

Weaknesses:

  • No executable (commented out)
  • Evaluation tests disabled
  • No query interface

felix-db

Strengths:

  • Comprehensive test suite
  • Type class abstraction (DatalogDB, QueryEngine)
  • Query syntax with projection
  • More complete feature set
  • PostgreSQL integration in progress

Weaknesses:

  • No position-based indexing
  • Raw Text instead of newtypes
  • No pretty printing
  • More complex codebase

Use Case Recommendations

Use toy-datalog when:

  • Learning Datalog evaluation fundamentals
  • Need indexed lookups for performance
  • Want minimal, readable implementation
  • Building something from scratch

Use felix-db when:

  • Need query syntax (?-)
  • Want comprehensive tests as reference
  • Planning to extend with database backends
  • Need negation syntax (even if not evaluated)

Summary Table

Category Winner Notes
Type Safety toy-datalog Newtypes prevent errors
Features felix-db Queries, negation syntax, numbers
Indexing toy-datalog Position-based index
Testing felix-db 7 test files vs 1 active
Extensibility felix-db Type classes for DB/QE
Code Clarity tie Both well-organized
Error Messages toy-datalog Full context in errors
Documentation toy-datalog NOTES.md, CHANGELOG
Production Ready neither Both need work

Potential Cross-Pollination

toy-datalog could adopt from felix-db:

  1. Query syntax (?- with projection)
  2. Type class abstraction for database backend
  3. Comprehensive test suite
  4. Negation parsing (for future implementation)
  5. Separate Num type for integers

felix-db could adopt from toy-datalog:

  1. Position-based indexing for performance
  2. Newtypes (ConId, VarId, RelId) for type safety
  3. Pretty printing (Pretty typeclass)
  4. Richer error types with context
  5. HasConstants typeclass pattern

Conclusion

Both implementations correctly handle core positive Datalog with naive bottom-up evaluation. They represent different design tradeoffs:

  • toy-datalog prioritizes type safety and indexing with a minimal codebase
  • felix-db prioritizes features and extensibility with a more complete implementation

Neither supports negation evaluation, aggregation, or semi-naive evaluation. For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.

Changelog

  • Mar 4, 2026 -- The first version was created.