## Comparison: toy-datalog vs felix-db This document compares two Haskell implementations of Datalog interpreters. --- ### Overview | Aspect | toy-datalog | felix-db | |-------------------|-----------------------------|---------------------------| | **Author** | Cale Gibbard | (felix-db team) | | **Version** | 0.1.0.0 | 0.1.0.0 | | **Language** | Haskell (Haskell2010) | Haskell (GHC2024) | | **License** | BSD-2-Clause | (not specified) | | **Lines of Code** | ~300 | ~700 | | **Primary Focus** | Minimal core implementation | More complete feature set | --- ### Architecture Comparison #### toy-datalog ``` Program Text → Parser → AST → Database → Eval (fixed-point) → Results ``` Simple linear pipeline with 4 modules: - `Syntax.hs` - AST definitions - `Parser.hs` - Megaparsec parser - `Eval.hs` - Evaluation engine - `IO.hs` - File utilities #### felix-db ``` Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results ↓ DigestedQuery ``` More modular architecture with 8+ modules: - `DatalogParser.hs` - Parser - `DatalogDB.hs` - Database type class - `InMemoryDB.hs` - Concrete implementation - `Rules.hs` - Rule processing/digestion - `NaiveQE.hs` - Query engine - `DigestedQuery.hs` - Query preprocessing - `QueryEngine.hs` - Query engine type class - `Utility.hs` - Helper functions **Winner: felix-db** - Better separation of concerns and extensibility via type classes. --- ### Syntax & AST #### Term Representation | Feature | toy-datalog | felix-db | |-----------|---------------------------------|-------------------------------| | Variables | `Var VarId` (uppercase) | `Var Text` (uppercase) | | Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) | | Numbers | Parsed as constants | `Num Integer` (separate type) | | Newtypes | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`) | ```haskell -- toy-datalog data Term = Con ConId | Var VarId -- felix-db data Term = Var Text | Sym Text | Num Integer ``` **toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time. **felix-db advantage:** Explicit `Num` type for numeric constants. #### Rule Representation ```haskell -- toy-datalog: Simple, direct data Rule = Atom :- [Atom] -- felix-db: Separate head type, supports negation in body data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal] data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] } ``` **felix-db advantage:** Supports negation syntax (`positive :: Bool`). --- ### Parser Comparison | Feature | toy-datalog | felix-db | |----------------|---------------|-----------------------------| | Library | Megaparsec | Megaparsec | | Line comments | `--` | `--` | | Block comments | `{- -}` | `/* */` | | Rule arrow | `:-` only | `:-`, `→`, `->` | | Negation | Not supported | `not` or `!` prefix | | Queries | Not supported | `?-` syntax with projection | | Quoted strings | No | Yes (`"alice"`) | | Numbers | As constants | As separate `Num` type | #### Example Syntax ```datalog -- toy-datalog parent(xerces, brooke). ancestor(X, Y) :- parent(X, Y). -- felix-db (additional features) parent("alice", "bob"). ancestor(X, Y) :- parent(X, Y), not sibling(X, Y). ?- ancestor(A, B), ancestor(B, C) → A, C. ``` **Winner: felix-db** - More complete syntax with queries, negation, and quoted strings. --- ### Data Structures #### Relation Storage ```haskell -- toy-datalog: Indexed storage data Relation = Relation { _relation_arity :: Int , _relation_members :: Set [ConId] , _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index } -- felix-db: Simple storage + rules data Relation = Relation { name :: RelationId , arity :: Int , tuples :: Set [Constant] -- No index , rules :: [RelationRule] -- Rules stored with relation } ``` **toy-datalog advantage:** Position-based indexing for efficient lookups. **felix-db advantage:** Rules stored directly with relations (cleaner organization). #### Database Structure ```haskell -- toy-datalog data Database = Database { _database_universe :: Set ConId , _database_relations :: Map RelId Relation , _database_rules :: Map RelId [Rule] -- Rules separate from relations } -- felix-db data InMemoryDB = InMemoryDB { relations :: Map RelationId Relation -- Rules inside Relation , constants :: Set Constant } ``` Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations. --- ### Evaluation Strategy Both use **naive bottom-up evaluation** with fixed-point iteration. #### Algorithm Comparison | Aspect | toy-datalog | felix-db | |------------------|--------------------------|----------------------------| | Strategy | Naive bottom-up | Naive bottom-up | | Termination | Fixed-point check | Fixed-point check | | Variable binding | `Map VarId ConId` | List indexed by position | | Join operation | `><` operator (set join) | Cartesian product + filter | | Index usage | Yes (position-based) | No | #### toy-datalog Evaluation ```haskell -- Key functions evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId)) evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId)) immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId]) extendFixedpointDb :: Database -> Either EvalError Database ``` Uses indices to find candidate tuples, then intersects results for bound positions. #### felix-db Evaluation ```haskell -- Key functions computeHerbrand :: (DatalogDB db) => db -> Map Text Relation executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]] ``` Generates all possible variable assignments from Herbrand universe, then filters. #### Variable Binding Approach ```haskell -- toy-datalog: Named bindings (Map) mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId) -- Checks consistency: same variable must bind to same value -- felix-db: Positional bindings (List + Index) data RuleElement = RuleElementConstant Constant | RuleElementVariable Int -- Variables replaced with de Bruijn-style indices ``` **toy-datalog:** More intuitive, variable names preserved throughout. **felix-db:** More efficient (integer indexing), but loses variable names. --- ### Feature Comparison | Feature | toy-datalog | felix-db | |------------------------------|-------------|------------------| | Facts | ✅ | ✅ | | Rules | ✅ | ✅ | | Recursive rules | ✅ | ✅ | | Multiple rules per predicate | ✅ | ✅ | | Arity checking | ✅ | ✅ | | Fixed-point iteration | ✅ | ✅ | | Position-based indexing | ✅ | ❌ | | Interactive queries | ❌ | ✅ (`?-` syntax) | | Query projection | ❌ | ✅ (`→ X, Y`) | | Negation (parsing) | ❌ | ✅ | | Negation (evaluation) | ❌ | ❌ | | Numbers as distinct type | ❌ | ✅ | | Quoted strings | ❌ | ✅ | | Pretty printing | ✅ | ❌ | | Database type class | ❌ | ✅ | | Query engine type class | ❌ | ✅ | | PostgreSQL integration | ❌ | 🔄 (in progress) | --- ### Error Handling #### toy-datalog ```haskell data EvalError = EvalError_RuleWrongArity Rule WrongArity | EvalError_AtomWrongArity Atom WrongArity data WrongArity = WrongArity { _wrongArity_relation :: RelId , _wrongArity_expected :: Int , _wrongArity_got :: Int } ``` Detailed error types with context. #### felix-db ```haskell data DatalogException = DuplicateRelationException Text | ArityMismatchException Text Int Int | RelationNotFoundException Text | VariableNotFoundException Text ``` More error types but less context (just error message components). **Winner: toy-datalog** - Errors include the full rule/atom that caused the problem. --- ### Testing | Aspect | toy-datalog | felix-db | |------------------|-------------------|-----------------| | Framework | Tasty + HUnit | Hspec | | Parser tests | ✅ Golden tests | ✅ Unit tests | | Evaluation tests | ❌ (commented out) | ✅ Comprehensive | | Property tests | ❌ | ❌ | | Test files | 2 | 7 | #### felix-db Test Examples ```haskell -- Reflexive relation "equiv(X, X) :- ." -- Symmetric closure "equiv(Y, X) :- equiv(X, Y)." -- Transitive closure "equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)." -- Complex genealogical queries "niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)." ``` **Winner: felix-db** - Much more comprehensive test coverage. --- ### Performance Characteristics #### Indexing | Operation | toy-datalog | felix-db | |--------------------------------|---------------------------------|----------------| | Lookup atom with constants | O(log n) via index intersection | O(m) full scan | | Lookup atom with all variables | O(m) full scan | O(m) full scan | | Insert tuple | O(k log n) (update k indices) | O(log m) | **toy-datalog advantage:** Indexed lookups for atoms with bound positions. #### Evaluation Complexity Both have the same theoretical complexity for naive evaluation: - **Time per iteration:** O(n^k × |rules| × r) - n = Herbrand universe size - k = max variables per rule - r = max body literals However, toy-datalog's indexing provides practical speedup when atoms have constants. #### Memory Usage | Aspect | toy-datalog | felix-db | |----------------|----------------|----------------| | Tuple storage | Set [ConId] | Set [Constant] | | Index overhead | O(k × m) extra | None | | Rule storage | Separate map | In relations | toy-datalog uses more memory due to indices, but gains query performance. --- ### Code Quality #### toy-datalog **Strengths:** - Minimal, focused implementation - Good use of newtypes for type safety - Pretty printing support - Clean `HasConstants` typeclass **Weaknesses:** - No executable (commented out) - Evaluation tests disabled - No query interface #### felix-db **Strengths:** - Comprehensive test suite - Type class abstraction (`DatalogDB`, `QueryEngine`) - Query syntax with projection - More complete feature set - PostgreSQL integration in progress **Weaknesses:** - No position-based indexing - Raw `Text` instead of newtypes - No pretty printing - More complex codebase --- ### Use Case Recommendations #### Use toy-datalog when: - Learning Datalog evaluation fundamentals - Need indexed lookups for performance - Want minimal, readable implementation - Building something from scratch #### Use felix-db when: - Need query syntax (`?-`) - Want comprehensive tests as reference - Planning to extend with database backends - Need negation syntax (even if not evaluated) --- ### Summary Table | Category | Winner | Notes | |------------------|-------------|-----------------------------------| | Type Safety | toy-datalog | Newtypes prevent errors | | Features | felix-db | Queries, negation syntax, numbers | | Indexing | toy-datalog | Position-based index | | Testing | felix-db | 7 test files vs 1 active | | Extensibility | felix-db | Type classes for DB/QE | | Code Clarity | tie | Both well-organized | | Error Messages | toy-datalog | Full context in errors | | Documentation | toy-datalog | NOTES.md, CHANGELOG | | Production Ready | neither | Both need work | --- ### Potential Cross-Pollination #### toy-datalog could adopt from felix-db: 1. Query syntax (`?-` with projection) 2. Type class abstraction for database backend 3. Comprehensive test suite 4. Negation parsing (for future implementation) 5. Separate `Num` type for integers #### felix-db could adopt from toy-datalog: 1. Position-based indexing for performance 2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety 3. Pretty printing (`Pretty` typeclass) 4. Richer error types with context 5. `HasConstants` typeclass pattern --- ### Conclusion Both implementations correctly handle core positive Datalog with naive bottom-up evaluation. They represent different design tradeoffs: - **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase - **felix-db** prioritizes **features and extensibility** with a more complete implementation Neither supports negation evaluation, aggregation, or semi-naive evaluation. For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal. ## Changelog * **Mar 4, 2026** -- The first version was created.