Comparison: toy-datalog vs felix-db
This document compares two Haskell implementations of Datalog interpreters.
Overview
| Aspect | toy-datalog | felix-db |
|---|---|---|
| Author | Cale Gibbard | (felix-db team) |
| Version | 0.1.0.0 | 0.1.0.0 |
| Language | Haskell (Haskell2010) | Haskell (GHC2024) |
| License | BSD-2-Clause | (not specified) |
| Lines of Code | ~300 | ~700 |
| Primary Focus | Minimal core implementation | More complete feature set |
Architecture Comparison
toy-datalog
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
Simple linear pipeline with 4 modules:
- Syntax.hs - AST definitions
- Parser.hs - Megaparsec parser
- Eval.hs - Evaluation engine
- IO.hs - File utilities
felix-db
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
↓
DigestedQuery
More modular architecture with 8+ modules:
- DatalogParser.hs - Parser
- DatalogDB.hs - Database type class
- InMemoryDB.hs - Concrete implementation
- Rules.hs - Rule processing/digestion
- NaiveQE.hs - Query engine
- DigestedQuery.hs - Query preprocessing
- QueryEngine.hs - Query engine type class
- Utility.hs - Helper functions
Winner: felix-db - Better separation of concerns and extensibility via type classes.
Syntax & AST
Term Representation
| Feature | toy-datalog | felix-db |
|---|---|---|
| Variables | Var VarId (uppercase) | Var Text (uppercase) |
| Constants | Con ConId (lowercase/numeric) | Sym Text (lowercase/quoted) |
| Numbers | Parsed as constants | Num Integer (separate type) |
| Newtypes | Yes (ConId, VarId, RelId) | No (raw Text) |
```haskell
-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer
```
toy-datalog advantage: Newtypes prevent mixing up constants/variables/relations at compile time.
felix-db advantage: Explicit Num type for numeric constants.
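To make the newtype point concrete, here is a minimal sketch. It assumes the identifiers wrap a plain String, which is an illustrative choice and not necessarily what toy-datalog does; the helper `constantsOf` is hypothetical.

```haskell
-- Sketch only: wrapping String is an assumption made for this example.
newtype ConId = ConId String deriving (Eq, Ord, Show)
newtype VarId = VarId String deriving (Eq, Ord, Show)

data Term = Con ConId | Var VarId deriving (Eq, Show)

-- Hypothetical helper: keep only the constants of an argument list.
constantsOf :: [Term] -> [ConId]
constantsOf ts = [c | Con c <- ts]

-- The line below would be rejected by the type checker, because a VarId
-- is not a ConId even though both wrap the same underlying type:
--   bad = Con (VarId "X")
```

With raw Text in both positions, the equivalent of `bad` would compile, and the mistake would only surface at run time.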
Rule Representation
```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```
felix-db advantage: Supports negation syntax (positive :: Bool).
Parser Comparison
| Feature | toy-datalog | felix-db |
|---|---|---|
| Library | Megaparsec | Megaparsec |
| Line comments | -- | -- |
| Block comments | {- -} | /* */ |
| Rule arrow | :- only | :-, →, -> |
| Negation | Not supported | not or ! prefix |
| Queries | Not supported | ?- syntax with projection |
| Quoted strings | No | Yes ("alice") |
| Numbers | As constants | As separate Num type |
Example Syntax
```
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```
Winner: felix-db - More complete syntax with queries, negation, and quoted strings.
Data Structures
Relation Storage
```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name :: RelationId
  , arity :: Int
  , tuples :: Set [Constant] -- No index
  , rules :: [RelationRule] -- Rules stored with relation
  }
```
toy-datalog advantage: Position-based indexing for efficient lookups.
felix-db advantage: Rules stored directly with relations (cleaner organization).
Database Structure
```haskell
-- toy-datalog
data Database = Database
  { _database_universe :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules :: Map RelId [Rule] -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation -- Rules inside Relation
  , constants :: Set Constant
  }
```
Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
Evaluation Strategy
Both use naive bottom-up evaluation with fixed-point iteration.
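As a rough illustration of what naive bottom-up evaluation with a fixed-point check means, the sketch below hard-codes the two usual ancestor rules over pairs of strings. It is not taken from either codebase; `Edge`, `step`, `fixpoint`, and `ancestorsOf` are hypothetical names, and a real interpreter would apply arbitrary rules instead of one hand-written step function.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumption: facts are modelled as pairs of strings for brevity.
type Edge = (String, String)

-- One round of rule application for
--   ancestor(X, Y) :- parent(X, Y).
--   ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
step :: Set Edge -> Set Edge -> Set Edge
step parents ancestors =
  Set.unions
    [ ancestors
    , parents
    , Set.fromList
        [ (x, z)
        | (x, y) <- Set.toList ancestors
        , (y', z) <- Set.toList parents
        , y == y'
        ]
    ]

-- Naive evaluation: re-apply the rules to the whole database until an
-- iteration adds nothing new, i.e. until a fixed point is reached.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x
  | x' == x = x
  | otherwise = fixpoint f x'
  where
    x' = f x

ancestorsOf :: Set Edge -> Set Edge
ancestorsOf parents = fixpoint (step parents) Set.empty
```

Each round recomputes every derivation from scratch, which is exactly the redundancy that semi-naive evaluation would remove.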
Algorithm Comparison
| Aspect | toy-datalog | felix-db |
|---|---|---|
| Strategy | Naive bottom-up | Naive bottom-up |
| Termination | Fixed-point check | Fixed-point check |
| Variable binding | Map VarId ConId | List indexed by position |
| Join operation | >< operator (set join) | Cartesian product + filter |
| Index usage | Yes (position-based) | No |
toy-datalog Evaluation
```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```
Uses indices to find candidate tuples, then intersects results for bound positions.
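The index-then-intersect idea can be sketched as follows. This is an assumption-laden illustration rather than toy-datalog's actual code: `ConId` is simplified to String, and `buildIndex` and `lookupPattern` are hypothetical helpers built around the `Map (Int, ConId) (Set [ConId])` shape shown earlier.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumption: a plain String stands in for toy-datalog's ConId newtype.
type ConId = String

-- Build a position-based index: (position, constant) maps to every tuple
-- that has that constant at that position.
buildIndex :: Set [ConId] -> Map (Int, ConId) (Set [ConId])
buildIndex tuples =
  Map.fromListWith Set.union
    [ ((i, c), Set.singleton t)
    | t <- Set.toList tuples
    , (i, c) <- zip [0 ..] t
    ]

-- Look up the tuples matching a pattern in which Just c marks a bound
-- position and Nothing marks a free one.
lookupPattern :: Set [ConId] -> Map (Int, ConId) (Set [ConId]) -> [Maybe ConId] -> Set [ConId]
lookupPattern tuples index pat =
  case [Map.findWithDefault Set.empty (i, c) index | (i, Just c) <- zip [0 ..] pat] of
    [] -> tuples -- nothing bound: fall back to the full relation
    candidates -> foldr1 Set.intersection candidates
```

Each bound position costs one map lookup; the remaining work is proportional to the size of the candidate sets being intersected, not to the size of the whole relation.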
felix-db Evaluation
```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```
Generates all possible variable assignments from Herbrand universe, then filters.
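A generate-and-filter strategy of this kind might look like the sketch below. It is a simplified illustration, not felix-db's implementation: `Constant` is assumed to be a String, the rule is hard-coded, and `assignments` and `grandparents` are hypothetical names. Variables are referred to by position in the assignment list, in the spirit of the positional bindings felix-db uses.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumption: a plain String stands in for felix-db's Constant type.
type Constant = String

-- Every assignment of k variables over the universe: n^k lists in total.
assignments :: Int -> [Constant] -> [[Constant]]
assignments k universe = sequence (replicate k universe)

-- Evaluate grandparent(V0, V2) :- parent(V0, V1), parent(V1, V2)
-- by enumerating all assignments and keeping those that satisfy the body.
grandparents :: Set [Constant] -> [Constant] -> Set [Constant]
grandparents parent universe =
  Set.fromList
    [ [a !! 0, a !! 2]
    | a <- assignments 3 universe
    , [a !! 0, a !! 1] `Set.member` parent
    , [a !! 1, a !! 2] `Set.member` parent
    ]
```

The enumeration visits all n^k assignments regardless of how few of them satisfy the body, which is where the O(n^k) factor in the complexity estimate comes from.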
Variable Binding Approach
```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```
toy-datalog: More intuitive, variable names preserved throughout.
felix-db: Variables become integer positions, which avoids comparing variable names during evaluation, but the original names are lost after digestion.
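The consistency check behind named bindings can be sketched against the `mkBinding` signature quoted above. The body below is a guess at one reasonable implementation, not toy-datalog's actual code, and it assumes `ConId` and `VarId` are plain Strings.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Assumptions: Strings stand in for toy-datalog's ConId and VarId newtypes.
type ConId = String
type VarId = String

data Term = Con ConId | Var VarId

-- Match an atom's arguments against one tuple, extending an existing
-- binding. Returns Nothing when a constant differs, when a variable is
-- already bound to a different value, or when the lengths disagree.
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
mkBinding [] [] b = Just b
mkBinding (Con c : ts) (v : vs) b
  | c == v = mkBinding ts vs b
  | otherwise = Nothing
mkBinding (Var x : ts) (v : vs) b =
  case Map.lookup x b of
    Nothing -> mkBinding ts vs (Map.insert x v b)
    Just v'
      | v' == v -> mkBinding ts vs b
      | otherwise -> Nothing
mkBinding _ _ _ = Nothing -- arity mismatch
```

For example, matching `equiv(X, X)` against the tuple `(a, b)` fails because X would need to be both a and b, while matching it against `(a, a)` yields the binding X = a.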
Feature Comparison
| Feature | toy-datalog | felix-db |
|---|---|---|
| Facts | ✅ | ✅ |
| Rules | ✅ | ✅ |
| Recursive rules | ✅ | ✅ |
| Multiple rules per predicate | ✅ | ✅ |
| Arity checking | ✅ | ✅ |
| Fixed-point iteration | ✅ | ✅ |
| Position-based indexing | ✅ | ❌ |
| Interactive queries | ❌ | ✅ (?- syntax) |
| Query projection | ❌ | ✅ (→ X, Y) |
| Negation (parsing) | ❌ | ✅ |
| Negation (evaluation) | ❌ | ❌ |
| Numbers as distinct type | ❌ | ✅ |
| Quoted strings | ❌ | ✅ |
| Pretty printing | ✅ | ❌ |
| Database type class | ❌ | ✅ |
| Query engine type class | ❌ | ✅ |
| PostgreSQL integration | ❌ | 🔄 (in progress) |
Error Handling
toy-datalog
```haskell
data EvalError
  = EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got :: Int
  }
```
Detailed error types with context.
felix-db
```haskell
data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text
```
More error types, but each carries less context: only a Text value and, for arity mismatches, two Ints.
Winner: toy-datalog - Errors include the full rule/atom that caused the problem.
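To show why the extra context matters, the sketch below renders an arity error together with the offending source text. It is purely illustrative: the record is a simplified stand-in for toy-datalog's WrongArity (String instead of RelId, different field names), and `renderWrongArity` is a hypothetical function that exists in neither project.

```haskell
-- Simplified stand-in for toy-datalog's WrongArity record (assumption).
data WrongArity = WrongArity
  { wrongArityRelation :: String
  , wrongArityExpected :: Int
  , wrongArityGot :: Int
  }

-- Hypothetical renderer: the first argument is the pretty-printed rule or
-- atom that triggered the error.
renderWrongArity :: String -> WrongArity -> String
renderWrongArity source (WrongArity rel expected got) =
  "in " ++ source ++ ": relation " ++ rel ++ " expects "
    ++ show expected ++ " argument(s) but was given " ++ show got
```

With only a relation name and two integers, as in ArityMismatchException, the message could not point at the rule that needs fixing.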
Testing
| Aspect | toy-datalog | felix-db |
|---|---|---|
| Framework | Tasty + HUnit | Hspec |
| Parser tests | ✅ Golden tests | ✅ Unit tests |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests | ❌ | ❌ |
| Test files | 2 | 7 |
felix-db Test Examples
```haskell
-- Reflexive relation
"equiv(X, X) :- ."
-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."
-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```
Winner: felix-db - Much more comprehensive test coverage.
Performance Characteristics
Indexing
| Operation | toy-datalog | felix-db |
|---|---|---|
| Lookup atom with constants | O(log n) index lookup per bound position, then intersection of the candidate sets | O(m) full scan |
| Lookup atom with all variables | O(m) full scan | O(m) full scan |
| Insert tuple | O(k log n) (update k indices) | O(log m) |
toy-datalog advantage: Indexed lookups for atoms with bound positions.
Evaluation Complexity
Both have the same theoretical complexity for naive evaluation:
- Time per iteration: O(n^k × |rules| × r), where:
  - n = Herbrand universe size
  - k = max variables per rule
  - r = max body literals
However, toy-datalog's indexing provides practical speedup when atoms have constants.
Memory Usage
| Aspect | toy-datalog | felix-db |
|---|---|---|
| Tuple storage | Set [ConId] | Set [Constant] |
| Index overhead | O(k × m) extra | None |
| Rule storage | Separate map | In relations |
toy-datalog uses more memory due to indices, but gains query performance.
Code Quality
toy-datalog
Strengths:
- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean HasConstants typeclass
Weaknesses:
- No executable (commented out)
- Evaluation tests disabled
- No query interface
felix-db
Strengths:
- Comprehensive test suite
- Type class abstraction (DatalogDB, QueryEngine)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress
Weaknesses:
- No position-based indexing
- Raw Text instead of newtypes
- No pretty printing
- More complex codebase
Use Case Recommendations
Use toy-datalog when:
- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch
Use felix-db when:
- Need query syntax (?-)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)
Summary Table
| Category | Winner | Notes |
|---|---|---|
| Type Safety | toy-datalog | Newtypes prevent errors |
| Features | felix-db | Queries, negation syntax, numbers |
| Indexing | toy-datalog | Position-based index |
| Testing | felix-db | 7 test files vs 1 active |
| Extensibility | felix-db | Type classes for DB/QE |
| Code Clarity | tie | Both well-organized |
| Error Messages | toy-datalog | Full context in errors |
| Documentation | toy-datalog | NOTES.md, CHANGELOG |
| Production Ready | neither | Both need work |
Potential Cross-Pollination
toy-datalog could adopt from felix-db:
- Query syntax (?- with projection)
- Type class abstraction for database backend
- Comprehensive test suite
- Negation parsing (for future implementation)
- Separate Num type for integers
felix-db could adopt from toy-datalog:
- Position-based indexing for performance
- Newtypes (ConId, VarId, RelId) for type safety
- Pretty printing (Pretty typeclass)
- Richer error types with context
- HasConstants typeclass pattern
Conclusion
Both implementations correctly handle core positive Datalog with naive bottom-up evaluation. They represent different design tradeoffs:
- toy-datalog prioritizes type safety and indexing with a minimal codebase
- felix-db prioritizes features and extensibility with a more complete implementation
Neither supports negation evaluation, aggregation, or semi-naive evaluation. For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
Changelog
- Mar 4, 2026 -- The first version was created.