useful-notes/007-cales-vs-felixs.md at 4651b09e2855f9a1e90d863bb675bd0cdab1ff57

Hassan Abedi 4651b09e28 Add comparison of toy-datalog vs felix-db

2026-03-04 13:25:42 +01:00

Comparison: toy-datalog vs felix-db

This document compares two Haskell implementations of Datalog interpreters.

Overview

Aspect	toy-datalog	felix-db
Author	Cale Gibbard	(felix-db team)
Version	0.1.0.0	0.1.0.0
Language	Haskell (Haskell2010)	Haskell (GHC2024)
License	BSD-2-Clause	(not specified)
Lines of Code	~300	~700
Primary Focus	Minimal core implementation	More complete feature set

Architecture Comparison

toy-datalog

Program Text → Parser → AST → Database → Eval (fixed-point) → Results

Simple linear pipeline with 4 modules:

Syntax.hs - AST definitions
Parser.hs - Megaparsec parser
Eval.hs - Evaluation engine
IO.hs - File utilities

felix-db

Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
                                    ↓
                              DigestedQuery

More modular architecture with 8+ modules:

DatalogParser.hs - Parser
DatalogDB.hs - Database type class
InMemoryDB.hs - Concrete implementation
Rules.hs - Rule processing/digestion
NaiveQE.hs - Query engine
DigestedQuery.hs - Query preprocessing
QueryEngine.hs - Query engine type class
Utility.hs - Helper functions

Winner: felix-db - Better separation of concerns and extensibility via type classes.

Syntax & AST

Term Representation

Feature	toy-datalog	felix-db
Variables	`Var VarId` (uppercase)	`Var Text` (uppercase)
Constants	`Con ConId` (lowercase/numeric)	`Sym Text` (lowercase/quoted)
Numbers	Parsed as constants	`Num Integer` (separate type)
Newtypes	Yes (`ConId`, `VarId`, `RelId`)	No (raw `Text`)

-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer

toy-datalog advantage: Newtypes prevent mixing up constants/variables/relations at compile time.

felix-db advantage: Explicit Num type for numeric constants.
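
A minimal sketch of the newtype point. The newtype names match those listed above, but wrapping `String` is an assumption made to keep the example self-contained, and `termVars` is a hypothetical helper, not part of toy-datalog.

```haskell
-- Sketch only: the real toy-datalog newtypes may wrap a different type.
newtype ConId = ConId String deriving (Eq, Ord, Show)
newtype VarId = VarId String deriving (Eq, Ord, Show)

data Term = Con ConId | Var VarId deriving (Eq, Show)

-- Hypothetical helper: collect the variables that occur in a term list.
termVars :: [Term] -> [VarId]
termVars ts = [v | Var v <- ts]

-- The line below would be rejected by the type checker, because a ConId
-- is not a VarId. With raw Text for both, the same mistake would compile.
--   bad = Var (ConId "alice")
```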

Rule Representation

-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }

felix-db advantage: Supports negation syntax (positive :: Bool).

Parser Comparison

Feature	toy-datalog	felix-db
Library	Megaparsec	Megaparsec
Line comments	`--`	`--`
Block comments	`{- -}`	`/* */`
Rule arrow	`:-` only	`:-`, `→`, `->`
Negation	Not supported	`not` or `!` prefix
Queries	Not supported	`?-` syntax with projection
Quoted strings	No	Yes (`"alice"`)
Numbers	As constants	As separate `Num` type

Example Syntax

-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.

Winner: felix-db - More complete syntax with queries, negation, and quoted strings.

Data Structures

Relation Storage

-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId])  -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]  -- No index
  , rules  :: [RelationRule]  -- Rules stored with relation
  }

toy-datalog advantage: Position-based indexing for efficient lookups.

felix-db advantage: Rules stored directly with relations (cleaner organization).
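
To show what maintaining a position-based index involves, here is a rough sketch of tuple insertion. The record mirrors the toy-datalog shape above with shortened field names; `emptyRelation`, `insertTuple`, and `lookupAt` are hypothetical names, and `ConId` over `String` is an assumption.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

newtype ConId = ConId String deriving (Eq, Ord, Show)

-- Simplified stand-in for the indexed Relation record.
data Relation = Relation
  { relArity   :: Int
  , relMembers :: Set [ConId]
  , relIndices :: Map (Int, ConId) (Set [ConId])
  } deriving (Eq, Show)

emptyRelation :: Int -> Relation
emptyRelation n = Relation n Set.empty Map.empty

-- Insert a tuple and register it under every (position, constant) key.
-- Returns Nothing when the tuple length does not match the arity.
insertTuple :: [ConId] -> Relation -> Maybe Relation
insertTuple t r
  | length t /= relArity r = Nothing
  | otherwise =
      Just (r { relMembers = Set.insert t (relMembers r)
              , relIndices = foldr addKey (relIndices r) (zip [0 ..] t)
              })
  where
    addKey key = Map.insertWith Set.union key (Set.singleton t)

-- All stored tuples that have constant c at position i.
lookupAt :: Int -> ConId -> Relation -> Set [ConId]
lookupAt i c r = Map.findWithDefault Set.empty (i, c) (relIndices r)
```

Each insert touches one index entry per tuple position, which is where the extra memory and insert cost of the indexed design come from.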

Database Structure

-- toy-datalog
data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule]  -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation  -- Rules inside Relation
  , constants :: Set Constant
  }

Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.

Evaluation Strategy

Both use naive bottom-up evaluation with fixed-point iteration.
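
The shared idea can be sketched in a few lines: keep applying the rules to the current fact set until no new facts appear. This is a generic illustration, not code from either project; `fixpoint` and `ancestorStep` are hypothetical names, and facts are modelled as plain pairs.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Apply one round of rule consequences repeatedly until the fact set
-- stops growing (the fixed point).
fixpoint :: Ord a => (Set a -> Set a) -> Set a -> Set a
fixpoint step facts
  | next == facts = facts
  | otherwise     = fixpoint step next
  where
    next = Set.union facts (step facts)

-- Example rule as a step function: rel(X, Z) :- rel(X, Y), rel(Y, Z).
-- Naive evaluation re-derives from all known facts on every round.
ancestorStep :: Set (String, String) -> Set (String, String)
ancestorStep s =
  Set.fromList
    [ (x, z)
    | (x, y)  <- Set.toList s
    , (y', z) <- Set.toList s
    , y == y'
    ]
```

Termination follows because the set of derivable facts over a finite universe is finite and each round only adds facts.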

Algorithm Comparison

Aspect	toy-datalog	felix-db
Strategy	Naive bottom-up	Naive bottom-up
Termination	Fixed-point check	Fixed-point check
Variable binding	`Map VarId ConId`	List indexed by position
Join operation	`><` operator (set join)	Cartesian product + filter
Index usage	Yes (position-based)	No

toy-datalog Evaluation

-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database

Uses indices to find candidate tuples, then intersects results for bound positions.
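
A minimal sketch of that lookup, assuming an index keyed by (position, constant) as in the Relation record shown earlier. `Con`, `Index`, and `candidates` are hypothetical names chosen for illustration, not toy-datalog's API.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

type Con   = String                       -- stand-in for ConId
type Index = Map (Int, Con) (Set [Con])   -- (position, constant) -> tuples

-- Return the tuples that match every bound position of an atom.
-- With no bound positions there is nothing to look up, so all members
-- are candidates (a full scan).
candidates :: Set [Con] -> Index -> [(Int, Con)] -> Set [Con]
candidates members _   []    = members
candidates _       idx bound =
  foldr1 Set.intersection
    [Map.findWithDefault Set.empty key idx | key <- bound]
```

For an atom such as `parent(a, Y)`, only position 0 is bound, so a single index lookup already yields the candidate set.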

felix-db Evaluation

-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]

Generates all possible variable assignments from the Herbrand universe, then filters out those that do not satisfy the rule body.
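
The generate-and-filter approach can be sketched as below. This is an illustration of the technique only, not felix-db's code; `allAssignments` and `generateAndFilter` are hypothetical names, and the body check is abstracted as a predicate.

```haskell
import Control.Monad (replicateM)

-- Every assignment of k variables over the universe: n^k candidate lists,
-- one value per variable position.
allAssignments :: Int -> [c] -> [[c]]
allAssignments k universe = replicateM k universe

-- Keep only the assignments accepted by the predicate, which stands in
-- for "all body literals hold under this assignment".
generateAndFilter :: Int -> [c] -> ([c] -> Bool) -> [[c]]
generateAndFilter k universe ok = filter ok (allAssignments k universe)
```

The candidate count grows as n^k regardless of how selective the body is, which is the source of the exponent in the per-iteration cost.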

Variable Binding Approach

-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices

toy-datalog: More intuitive, variable names preserved throughout.

felix-db: More efficient (integer indexing), but loses variable names.
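
A sketch of what a named-binding matcher could look like. The `mkBinding` signature is the one quoted above; the body is an assumption written to match the described behaviour (a repeated variable must bind to the same value), and `ConId`/`VarId` over `String` are stand-ins.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

newtype ConId = ConId String deriving (Eq, Ord, Show)
newtype VarId = VarId String deriving (Eq, Ord, Show)

data Term = Con ConId | Var VarId deriving (Eq, Show)

-- Match an atom's terms against a stored tuple, extending the binding.
-- Fails (Nothing) on a constant mismatch, on an inconsistent rebinding,
-- or when the two lists differ in length.
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
mkBinding [] [] b = Just b
mkBinding (Con c : ts) (v : vs) b
  | c == v    = mkBinding ts vs b
  | otherwise = Nothing
mkBinding (Var x : ts) (v : vs) b =
  case Map.lookup x b of
    Nothing -> mkBinding ts vs (Map.insert x v b)
    Just v'
      | v' == v   -> mkBinding ts vs b
      | otherwise -> Nothing
mkBinding _ _ _ = Nothing
```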

Feature Comparison

Feature	toy-datalog	felix-db
Facts	✅	✅
Rules	✅	✅
Recursive rules	✅	✅
Multiple rules per predicate	✅	✅
Arity checking	✅	✅
Fixed-point iteration	✅	✅
Position-based indexing	✅	❌
Interactive queries	❌	✅ (`?-` syntax)
Query projection	❌	✅ (`→ X, Y`)
Negation (parsing)	❌	✅
Negation (evaluation)	❌	❌
Numbers as distinct type	❌	✅
Quoted strings	❌	✅
Pretty printing	✅	❌
Database type class	❌	✅
Query engine type class	❌	✅
PostgreSQL integration	❌	🔄 (in progress)

Error Handling

toy-datalog

data EvalError =
    EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got :: Int
  }

Detailed error types with context.

felix-db

data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text

More error types but less context (just error message components).

Winner: toy-datalog - Errors include the full rule/atom that caused the problem.
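
As a rough illustration of why structured context helps when reporting errors, the sketch below reuses the `WrongArity` record shape from above. `RelId` over `String`, `checkArity`, and `renderWrongArity` are assumptions made for the example, not functions from toy-datalog.

```haskell
newtype RelId = RelId String deriving (Eq, Show)

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got      :: Int
  } deriving (Eq, Show)

-- Hypothetical check: compare the declared arity with the argument count.
checkArity :: RelId -> Int -> [a] -> Either WrongArity ()
checkArity rel expected args
  | got == expected = Right ()
  | otherwise       = Left (WrongArity rel expected got)
  where
    got = length args

-- Hypothetical renderer: every field of the error is available by name.
renderWrongArity :: WrongArity -> String
renderWrongArity (WrongArity (RelId r) e g) =
  "relation " ++ r ++ " expects " ++ show e ++ " argument(s), got " ++ show g
```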

Testing

Aspect	toy-datalog	felix-db
Framework	Tasty + HUnit	Hspec
Parser tests	✅ Golden tests	✅ Unit tests
Evaluation tests	❌ (commented out)	✅ Comprehensive
Property tests	❌	❌
Test files	2	7

felix-db Test Examples

-- Reflexive relation
"equiv(X, X) :- ."

-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."

-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."

-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."

Winner: felix-db - Much more comprehensive test coverage.

Performance Characteristics

Indexing

Operation	toy-datalog	felix-db
Lookup atom with constants	O(log n) per index lookup, plus the cost of intersecting the candidate sets	O(m) full scan
Lookup atom with all variables	O(m) full scan	O(m) full scan
Insert tuple	O(k log n) (update k indices)	O(log m)

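toy-datalog advantage: Indexed lookups for atoms with bound positions. In this table, n and m appear to denote the number of index entries and the number of tuples in a relation, and k the relation's arity; these are not the same symbols as in the complexity formula under Evaluation Complexity.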

Evaluation Complexity

Both have the same theoretical complexity for naive evaluation:

Time per iteration: O(n^k × |rules| × r)
- n = Herbrand universe size
- k = max variables per rule
- r = max body literals

However, toy-datalog's indexing provides practical speedup when atoms have constants.
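
To make the bound concrete, here is a worked instance with purely hypothetical sizes: a universe of 50 constants, at most 3 variables per rule, 10 rules, and at most 3 body literals.

```latex
\[
n^{k} \times |\text{rules}| \times r
  = 50^{3} \times 10 \times 3
  = 125{,}000 \times 30
  = 3.75 \times 10^{6}
\]
% With one fewer variable per rule (k = 2) the same bound drops to
\[
50^{2} \times 10 \times 3 = 2{,}500 \times 30 = 75{,}000
\]
```

The exponent k dominates: each additional variable per rule multiplies the bound by the universe size.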

Memory Usage

Aspect	toy-datalog	felix-db
Tuple storage	Set [ConId]	Set [Constant]
Index overhead	O(k × m) extra	None
Rule storage	Separate map	In relations

toy-datalog uses more memory due to indices, but gains query performance.

Code Quality

toy-datalog

Strengths:

Minimal, focused implementation
Good use of newtypes for type safety
Pretty printing support
Clean HasConstants typeclass

Weaknesses:

No executable (commented out)
Evaluation tests disabled
No query interface

felix-db

Strengths:

Comprehensive test suite
Type class abstraction (DatalogDB, QueryEngine)
Query syntax with projection
More complete feature set
PostgreSQL integration in progress

Weaknesses:

No position-based indexing
Raw Text instead of newtypes
No pretty printing
More complex codebase

Use Case Recommendations

Use toy-datalog when:

Learning Datalog evaluation fundamentals
Need indexed lookups for performance
Want minimal, readable implementation
Building something from scratch

Use felix-db when:

Need query syntax (?-)
Want comprehensive tests as reference
Planning to extend with database backends
Need negation syntax (even if not evaluated)

Summary Table

Category	Winner	Notes
Type Safety	toy-datalog	Newtypes prevent errors
Features	felix-db	Queries, negation syntax, numbers
Indexing	toy-datalog	Position-based index
Testing	felix-db	7 test files vs 1 active
Extensibility	felix-db	Type classes for DB/QE
Code Clarity	Tie	Both well-organized
Error Messages	toy-datalog	Full context in errors
Documentation	toy-datalog	NOTES.md, CHANGELOG
Production Ready	neither	Both need work

Potential Cross-Pollination

toy-datalog could adopt from felix-db:

Query syntax (?- with projection)
Type class abstraction for database backend
Comprehensive test suite
Negation parsing (for future implementation)
Separate Num type for integers

felix-db could adopt from toy-datalog:

Position-based indexing for performance
Newtypes (ConId, VarId, RelId) for type safety
Pretty printing (Pretty typeclass)
Richer error types with context
HasConstants typeclass pattern

Conclusion

Both implementations correctly handle core positive Datalog with naive bottom-up evaluation. They represent different design tradeoffs:

toy-datalog prioritizes type safety and indexing with a minimal codebase
felix-db prioritizes features and extensibility with a more complete implementation

Neither supports negation evaluation, aggregation, or semi-naive evaluation. For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.

Changelog

Mar 4, 2026 -- The first version was created.
