Add comparison of toy-datalog vs felix-db

2026-03-04 13:25:42 +01:00
commit 4651b09e28
parent 929c71488a
1 changed file with 452 additions and 0 deletions
--- a/hassan/007-cales-vs-felixs.md
+++ b/hassan/007-cales-vs-felixs.md
@@ -0,0 +1,452 @@
 ## Comparison: toy-datalog vs felix-db
 This document compares two Haskell implementations of Datalog interpreters.
 ---
 ### Overview
 | Aspect            | toy-datalog                 | felix-db                  |
 |-------------------|-----------------------------|---------------------------|
 | **Author**        | Cale Gibbard                | (felix-db team)           |
 | **Version**       | 0.1.0.0                     | 0.1.0.0                   |
 | **Language**      | Haskell (Haskell2010)       | Haskell (GHC2024)         |
 | **License**       | BSD-2-Clause                | (not specified)           |
 | **Lines of Code** | ~300                        | ~700                      |
 | **Primary Focus** | Minimal core implementation | More complete feature set |
 ---
 ### Architecture Comparison
 #### toy-datalog
 ```
 Program Text → Parser → AST → Database → Eval (fixed-point) → Results
 ```
 Simple linear pipeline with 4 modules:
 - `Syntax.hs` - AST definitions
 - `Parser.hs` - Megaparsec parser
 - `Eval.hs` - Evaluation engine
 - `IO.hs` - File utilities
 #### felix-db
 ```
 Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
                                    ↓
                              DigestedQuery
 ```
 More modular architecture with 8+ modules:
 - `DatalogParser.hs` - Parser
 - `DatalogDB.hs` - Database type class
 - `InMemoryDB.hs` - Concrete implementation
 - `Rules.hs` - Rule processing/digestion
 - `NaiveQE.hs` - Query engine
 - `DigestedQuery.hs` - Query preprocessing
 - `QueryEngine.hs` - Query engine type class
 - `Utility.hs` - Helper functions
 **Winner: felix-db** - Better separation of concerns and extensibility via type classes.
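The extensibility point can be made concrete with a small type class. This is a minimal sketch only: the method names (`emptyDB`, `insertFact`, `lookupTuples`) and the helper `countFacts` are hypothetical and are not taken from felix-db's actual `DatalogDB` class, and `String` stands in for `Text`.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Simplification for the sketch: constants and relation names are Strings.
type Constant = String
type RelationId = String

-- Hypothetical backend interface; felix-db's real DatalogDB class may differ.
class DatalogDB db where
  emptyDB      :: db
  insertFact   :: RelationId -> [Constant] -> db -> db
  lookupTuples :: RelationId -> db -> Set.Set [Constant]

-- One concrete backend: everything lives in an in-memory Map.
newtype InMemoryDB = InMemoryDB (Map.Map RelationId (Set.Set [Constant]))

instance DatalogDB InMemoryDB where
  emptyDB = InMemoryDB Map.empty
  insertFact rel tuple (InMemoryDB m) =
    InMemoryDB (Map.insertWith Set.union rel (Set.singleton tuple) m)
  lookupTuples rel (InMemoryDB m) = Map.findWithDefault Set.empty rel m

-- Code written against the class works unchanged for any backend.
countFacts :: DatalogDB db => RelationId -> db -> Int
countFacts rel = Set.size . lookupTuples rel
```

A second backend (for example one backed by PostgreSQL) would only need its own instance; `countFacts` and similar functions would not change.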
 ---
 ### Syntax & AST
 #### Term Representation
 | Feature   | toy-datalog                     | felix-db                      |
 |-----------|---------------------------------|-------------------------------|
 | Variables | `Var VarId` (uppercase)         | `Var Text` (uppercase)        |
 | Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
 | Numbers   | Parsed as constants             | `Num Integer` (separate type) |
 | Newtypes  | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`)               |
 ```haskell
 -- toy-datalog
 data Term = Con ConId | Var VarId
 -- felix-db
 data Term = Var Text | Sym Text | Num Integer
 ```
 **toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.
 **felix-db advantage:** Explicit `Num` type for numeric constants.
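The trade-off between the two term representations can be sketched side by side. The wrappers below use `String` rather than either project's real representation, and `RawTerm`, `varName`, `rawVarName`, and `isNumeric` are illustrative names that appear in neither codebase.

```haskell
-- Simplified newtypes in the toy-datalog style.
newtype ConId = ConId String deriving (Eq, Ord, Show)
newtype VarId = VarId String deriving (Eq, Ord, Show)

-- toy-datalog style: each constructor carries its own identifier type.
data Term = Con ConId | Var VarId deriving (Eq, Show)

-- felix-db style: raw strings for names, a separate constructor for numbers.
data RawTerm = RawVar String | RawSym String | RawNum Integer deriving (Eq, Show)

-- Accepts only a VarId; passing a ConId is rejected by the type checker.
varName :: VarId -> String
varName (VarId s) = s

-- With raw strings, nothing stops a caller from passing a constant's name here.
rawVarName :: String -> String
rawVarName = id

-- Numeric constants can be recognised without re-parsing any text.
isNumeric :: RawTerm -> Bool
isNumeric (RawNum _) = True
isNumeric _          = False
```

For instance, `varName (ConId "alice")` does not compile, whereas `rawVarName "alice"` does even though `alice` is a constant.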
 #### Rule Representation
 ```haskell
 -- toy-datalog: Simple, direct
 data Rule = Atom :- [Atom]
 -- felix-db: Separate head type, supports negation in body
 data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
 data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
 ```
 **felix-db advantage:** Supports negation syntax (`positive :: Bool`).
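Because the polarity is an ordinary field, code that consumes rule bodies can inspect it directly. A minimal sketch, with `String` standing in for `Text`; the helpers `splitBody` and `isPositiveBody` are hypothetical and not part of felix-db.

```haskell
import Data.List (partition)

-- Mirrors the felix-db types quoted above, with String in place of Text.
data Term = Var String | Sym String | Num Integer deriving (Eq, Show)

data Literal = Literal
  { positive  :: Bool
  , predName  :: String
  , arguments :: [Term]
  } deriving (Eq, Show)

-- Hypothetical helper: separate positive literals from negated ones.
splitBody :: [Literal] -> ([Literal], [Literal])
splitBody = partition positive

-- Hypothetical helper: a body is safe for a purely positive evaluator
-- only if no literal carries positive = False.
isPositiveBody :: [Literal] -> Bool
isPositiveBody = all positive
```

A check like `isPositiveBody` is one way an evaluator that parses negation but does not evaluate it could reject such rules explicitly instead of silently ignoring the flag.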
 ---
 ### Parser Comparison
 | Feature        | toy-datalog   | felix-db                    |
 |----------------|---------------|-----------------------------|
 | Library        | Megaparsec    | Megaparsec                  |
 | Line comments  | `--`          | `--`                        |
 | Block comments | `{- -}`       | `/* */`                     |
 | Rule arrow     | `:-` only     | `:-`, `→`, `->`             |
 | Negation       | Not supported | `not` or `!` prefix         |
 | Queries        | Not supported | `?-` syntax with projection |
 | Quoted strings | No            | Yes (`"alice"`)             |
 | Numbers        | As constants  | As separate `Num` type      |
 #### Example Syntax
 ```datalog
 -- toy-datalog
 parent(xerces, brooke).
 ancestor(X, Y) :- parent(X, Y).
 -- felix-db (additional features)
 parent("alice", "bob").
 ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
 ?- ancestor(A, B), ancestor(B, C) → A, C.
 ```
 **Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.
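The lexical conventions in the table (uppercase for variables, lowercase or quoted for symbols, digits for numbers) can be illustrated without a parser library. The sketch below is not the Megaparsec parser of either project; `classify` is a hypothetical function that takes a single, already tokenised argument and follows the felix-db column.

```haskell
import Data.Char (isDigit, isLower, isUpper)

data Term = Var String | Sym String | Num Integer deriving (Eq, Show)

-- Classify one token by its first character, felix-db style.
classify :: String -> Maybe Term
classify tok = case tok of
  ('"' : rest)
    | not (null rest), last rest == '"' -> Just (Sym (init rest))  -- quoted string
  (c : _)
    | isUpper c       -> Just (Var tok)         -- uppercase start: variable
    | isLower c       -> Just (Sym tok)         -- lowercase start: symbol
    | all isDigit tok -> Just (Num (read tok))  -- digits only: number
  _ -> Nothing                                  -- empty or unrecognised token
```

Under toy-datalog's conventions the last two cases would collapse into a single constant constructor, and the quoted form would be rejected.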
 ---
 ### Data Structures
 #### Relation Storage
 ```haskell
 -- toy-datalog: Indexed storage
 data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId])  -- Position-based index
  }
 -- felix-db: Simple storage + rules
 data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]  -- No index
  , rules  :: [RelationRule]  -- Rules stored with relation
  }
 ```
 **toy-datalog advantage:** Position-based indexing for efficient lookups.
 **felix-db advantage:** Rules stored directly with relations (cleaner organization).
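To show what maintaining a position-based index involves, here is a minimal sketch modelled on the shape of the toy-datalog `Relation` above. Field and function names (`relMembers`, `relIndices`, `insertTuple`, `lookupAt`) are illustrative, `ConId` is simplified to `String`, and arity tracking is left out.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ConId = String  -- simplification; toy-datalog uses a newtype

data Relation = Relation
  { relMembers :: Set.Set [ConId]
  , relIndices :: Map.Map (Int, ConId) (Set.Set [ConId])
  }

emptyRelation :: Relation
emptyRelation = Relation Set.empty Map.empty

-- Insert a tuple and register it under every (position, value) key.
insertTuple :: [ConId] -> Relation -> Relation
insertTuple tuple (Relation members indices) =
  Relation (Set.insert tuple members) (foldr addKey indices (zip [0 ..] tuple))
  where
    addKey key = Map.insertWith Set.union key (Set.singleton tuple)

-- All tuples whose value at the given position equals the given constant.
lookupAt :: Int -> ConId -> Relation -> Set.Set [ConId]
lookupAt pos con rel = Map.findWithDefault Set.empty (pos, con) (relIndices rel)
```

Each insert touches one index entry per tuple position, which is the price paid for answering `lookupAt` without scanning `relMembers`.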
 #### Database Structure
 ```haskell
 -- toy-datalog
 data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule]  -- Rules separate from relations
  }
 -- felix-db
 data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation  -- Rules inside Relation
  , constants :: Set Constant
  }
 ```
 Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
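As an illustration of what the tracked universe contains, the sketch below recomputes it from stored tuples. This is an assumption-laden simplification: both projects keep the set as a field and presumably update it incrementally, and the function name `universe` is hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Constant = String
type RelationId = String

-- Collect every constant mentioned in any stored tuple.
universe :: Map.Map RelationId (Set.Set [Constant]) -> Set.Set Constant
universe = Set.fromList . concatMap (concat . Set.toList) . Map.elems
```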
 ---
 ### Evaluation Strategy
 Both use **naive bottom-up evaluation** with fixed-point iteration.
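Naive fixed-point iteration can be sketched on the `ancestor` example from the syntax section. The code below hard-codes one recursive rule rather than interpreting arbitrary programs, so it illustrates the strategy and is not either project's evaluator; `step`, `fixpoint`, and `ancestors` are names chosen for the sketch.

```haskell
import qualified Data.Set as Set

type Fact = (String, String)

-- One round: apply ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)
-- to everything known so far, keeping all earlier facts.
step :: Set.Set Fact -> Set.Set Fact -> Set.Set Fact
step parents known = Set.union known derived
  where
    derived = Set.fromList
      [ (x, z) | (x, y) <- Set.toList parents, (y', z) <- Set.toList known, y == y' ]

-- Repeat until a round adds nothing new (the fixed point).
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x
  | next == x = x
  | otherwise = fixpoint f next
  where
    next = f x

-- Starting from the parent facts covers ancestor(X, Y) :- parent(X, Y).
ancestors :: Set.Set Fact -> Set.Set Fact
ancestors parents = fixpoint (step parents) parents
```

Every round re-derives facts that were already known, which is what makes the strategy naive; termination is guaranteed because the set can only grow and the universe is finite.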
 #### Algorithm Comparison
 | Aspect           | toy-datalog              | felix-db                   |
 |------------------|--------------------------|----------------------------|
 | Strategy         | Naive bottom-up          | Naive bottom-up            |
 | Termination      | Fixed-point check        | Fixed-point check          |
 | Variable binding | `Map VarId ConId`        | List indexed by position   |
 | Join operation   | `><` operator (set join) | Cartesian product + filter |
 | Index usage      | Yes (position-based)     | No                         |
 #### toy-datalog Evaluation
 ```haskell
 -- Key functions
 evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
 evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
 immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
 extendFixedpointDb :: Database -> Either EvalError Database
 ```
 Uses indices to find candidate tuples, then intersects results for bound positions.
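The lookup described above can be sketched as follows. This assumes the index shape shown earlier (`Map (Int, ConId) (Set [ConId])`) with `ConId` simplified to `String`; `buildIndex` and `candidates` are hypothetical names, and toy-datalog's own code may organise this differently.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ConId = String
type Index = Map.Map (Int, ConId) (Set.Set [ConId])

-- Build the index from a list of tuples.
buildIndex :: [[ConId]] -> Index
buildIndex tuples = Map.fromListWith Set.union
  [ ((pos, con), Set.singleton t) | t <- tuples, (pos, con) <- zip [0 ..] t ]

-- Candidates for an atom: Just c marks a bound position, Nothing a variable.
candidates :: Set.Set [ConId] -> Index -> [Maybe ConId] -> Set.Set [ConId]
candidates allTuples index pat =
  case [ Map.findWithDefault Set.empty (pos, c) index | (pos, Just c) <- zip [0 ..] pat ] of
    []       -> allTuples                     -- no bound positions: full scan
    (s : ss) -> foldr Set.intersection s ss   -- intersect per-position candidates
```

An atom such as `parent(alice, Y)` corresponds to the pattern `[Just "alice", Nothing]` and needs a single index lookup; repeated variables would still have to be checked afterwards when bindings are built.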
 #### felix-db Evaluation
 ```haskell
 -- Key functions
 computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
 executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
 ```
 Generates all possible variable assignments from Herbrand universe, then filters.
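The generate-then-filter approach can be sketched for one fixed query. This is not felix-db's `executeQuery`; it hard-codes a grandparent query over a hypothetical `parent` relation to show where the n^k cost comes from.

```haskell
import Control.Monad (replicateM)
import qualified Data.Set as Set

type Constant = String

-- Every assignment of k variables over the universe: n^k lists of length k.
assignments :: Int -> [Constant] -> [[Constant]]
assignments k universe = replicateM k universe

-- Query "?- parent(X, Y), parent(Y, Z)" projected onto X and Z, by brute force.
-- Variables X, Y, Z are positions 0, 1, 2 of each assignment.
grandparents :: Set.Set (Constant, Constant) -> [Constant] -> [[Constant]]
grandparents parent universe =
  [ [x, z]
  | [x, y, z] <- assignments 3 universe
  , (x, y) `Set.member` parent
  , (y, z) `Set.member` parent
  ]
```

With a universe of 3 constants and 3 variables, 27 assignments are generated even when only one of them satisfies the body.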
 #### Variable Binding Approach
 ```haskell
 -- toy-datalog: Named bindings (Map)
 mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
 -- Checks consistency: same variable must bind to same value
 -- felix-db: Positional bindings (List + Index)
 data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
 -- Variables replaced with de Bruijn-style indices
 ```
 **toy-datalog:** More intuitive, variable names preserved throughout.
 **felix-db:** Potentially more efficient (integer indexing instead of map lookups), but variable names are lost once a rule has been digested.
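Both binding styles can be sketched in a few lines. The `mkBinding` below follows the signature quoted above but uses `String` in place of `VarId` and `ConId`, and its body is a reconstruction from the description, not toy-datalog's code. `digest` is a hypothetical function showing one way variables could be replaced by indices (here, the index of first occurrence).

```haskell
import Data.List (nub)
import qualified Data.Map.Strict as Map

data Term = Con String | Var String deriving (Eq, Show)

-- Named bindings: match argument terms against a tuple, extending the binding.
-- Fails when a constant differs or a variable would get two different values.
mkBinding :: [Term] -> [String] -> Map.Map String String -> Maybe (Map.Map String String)
mkBinding [] [] b = Just b
mkBinding (Con c : ts) (v : vs) b
  | c == v    = mkBinding ts vs b
  | otherwise = Nothing
mkBinding (Var x : ts) (v : vs) b =
  case Map.lookup x b of
    Nothing             -> mkBinding ts vs (Map.insert x v b)
    Just old | old == v -> mkBinding ts vs b
    _                   -> Nothing
mkBinding _ _ _ = Nothing  -- lengths differ: arity mismatch

data RuleElement = RuleElementConstant String | RuleElementVariable Int
  deriving (Eq, Show)

-- Positional bindings: replace each variable by the index of its first occurrence.
digest :: [Term] -> [RuleElement]
digest terms = map toElem terms
  where
    vars = nub [ x | Var x <- terms ]
    toElem (Con c) = RuleElementConstant c
    toElem (Var x) = RuleElementVariable (length (takeWhile (/= x) vars))
```

After `digest`, a binding is just a list of constants addressed by position, which is why the original names are no longer available for error messages or pretty printing.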
 ---
 ### Feature Comparison
 | Feature                      | toy-datalog | felix-db         |
 |------------------------------|-------------|------------------|
 | Facts                        | ✅           | ✅                |
 | Rules                        | ✅           | ✅                |
 | Recursive rules              | ✅           | ✅                |
 | Multiple rules per predicate | ✅           | ✅                |
 | Arity checking               | ✅           | ✅                |
 | Fixed-point iteration        | ✅           | ✅                |
 | Position-based indexing      | ✅           | ❌                |
 | Interactive queries          | ❌           | ✅ (`?-` syntax)  |
 | Query projection             | ❌           | ✅ (`→ X, Y`)     |
 | Negation (parsing)           | ❌           | ✅                |
 | Negation (evaluation)        | ❌           | ❌                |
 | Numbers as distinct type     | ❌           | ✅                |
 | Quoted strings               | ❌           | ✅                |
 | Pretty printing              | ✅           | ❌                |
 | Database type class          | ❌           | ✅                |
 | Query engine type class      | ❌           | ✅                |
 | PostgreSQL integration       | ❌           | 🔄 (in progress) |
 ---
 ### Error Handling
 #### toy-datalog
 ```haskell
 data EvalError =
    EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity
 data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got :: Int
  }
 ```
 Detailed error types with context.
 #### felix-db
 ```haskell
 data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text
 ```
 More error types but less context (just error message components).
 **Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.
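The difference in context can be seen by writing the same arity check in both styles. The types below are cut-down versions of the ones quoted above, with `String` for identifiers and a simplified `Atom`; `checkArityRich` and `checkArityPlain` are hypothetical functions, not code from either project.

```haskell
data Atom = Atom { atomRel :: String, atomArgs :: [String] } deriving (Eq, Show)

data WrongArity = WrongArity
  { wrongArityRelation :: String
  , wrongArityExpected :: Int
  , wrongArityGot      :: Int
  } deriving (Eq, Show)

-- toy-datalog style: the offending atom travels with the error.
data EvalError = EvalErrorAtomWrongArity Atom WrongArity deriving (Eq, Show)

-- felix-db style: only the components of the message.
data DatalogException = ArityMismatchException String Int Int deriving (Eq, Show)

checkArityRich :: Int -> Atom -> Either EvalError Atom
checkArityRich expected atom
  | got == expected = Right atom
  | otherwise = Left (EvalErrorAtomWrongArity atom (WrongArity (atomRel atom) expected got))
  where
    got = length (atomArgs atom)

checkArityPlain :: Int -> Atom -> Either DatalogException Atom
checkArityPlain expected atom
  | got == expected = Right atom
  | otherwise = Left (ArityMismatchException (atomRel atom) expected got)
  where
    got = length (atomArgs atom)
```

With the richer error a caller can print the exact atom that failed; with the plain one it only learns the relation name and the two arities.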
 ---
 ### Testing
 | Aspect           | toy-datalog       | felix-db        |
 |------------------|-------------------|-----------------|
 | Framework        | Tasty + HUnit     | Hspec           |
 | Parser tests     | ✅ Golden tests    | ✅ Unit tests    |
 | Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
 | Property tests   | ❌                 | ❌               |
 | Test files       | 2                 | 7               |
 #### felix-db Test Examples
 ```haskell
 -- Reflexive relation
 "equiv(X, X) :- ."
 -- Symmetric closure
 "equiv(Y, X) :- equiv(X, Y)."
 -- Transitive closure
 "equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
 -- Complex genealogical queries
 "niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
 ```
 **Winner: felix-db** - Much more comprehensive test coverage.
 ---
 ### Performance Characteristics
 #### Indexing
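 | Operation                      | toy-datalog                                    | felix-db       |
 |--------------------------------|------------------------------------------------|----------------|
 | Lookup atom with constants     | O(log n) per bound position, plus intersection | O(m) full scan |
 | Lookup atom with all variables | O(m) full scan                                 | O(m) full scan |
 | Insert tuple                   | O(k log n) (update k indices)                  | O(log m)       |
 Here m is taken to be the number of tuples in a relation, n the number of index entries, and k the arity of the relation; note that n and k mean something different in the evaluation complexity below.
 **toy-datalog advantage:** Indexed lookups for atoms with bound positions.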
 #### Evaluation Complexity
 Both have the same theoretical complexity for naive evaluation:
 - **Time per iteration:** O(n^k × |rules| × r)
    - n = Herbrand universe size
    - k = max variables per rule
    - r = max body literals
 However, toy-datalog's indexing provides practical speedup when atoms have constants.
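To get a feel for the bound above, it can be evaluated for concrete sizes. The function below simply multiplies out n^k × |rules| × r; the name `iterationCost` and the sample numbers are illustrative, not measurements of either project.

```haskell
-- Upper bound on the work in one naive iteration: n^k * |rules| * r.
iterationCost :: Integer -> Integer -> Integer -> Integer -> Integer
iterationCost n k numRules r = n ^ k * numRules * r
```

For a universe of 50 constants, rules with at most 3 variables, 10 rules, and 3 body literals per rule, `iterationCost 50 3 10 3` evaluates to 3750000 candidate checks per iteration. Going from 3 to 4 variables multiplies that by another factor of 50, which is why the exponent k dominates.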
 #### Memory Usage
 | Aspect         | toy-datalog    | felix-db       |
 |----------------|----------------|----------------|
 | Tuple storage  | Set [ConId]    | Set [Constant] |
 | Index overhead | O(k × m) extra | None           |
 | Rule storage   | Separate map   | In relations   |
 toy-datalog uses more memory due to indices, but gains query performance.
 ---
 ### Code Quality
 #### toy-datalog
 **Strengths:**
 - Minimal, focused implementation
 - Good use of newtypes for type safety
 - Pretty printing support
 - Clean `HasConstants` typeclass
 **Weaknesses:**
 - No executable (commented out)
 - Evaluation tests disabled
 - No query interface
 #### felix-db
 **Strengths:**
 - Comprehensive test suite
 - Type class abstraction (`DatalogDB`, `QueryEngine`)
 - Query syntax with projection
 - More complete feature set
 - PostgreSQL integration in progress
 **Weaknesses:**
 - No position-based indexing
 - Raw `Text` instead of newtypes
 - No pretty printing
 - More complex codebase
 ---
 ### Use Case Recommendations
 #### Use toy-datalog when:
 - Learning Datalog evaluation fundamentals
 - Need indexed lookups for performance
 - Want minimal, readable implementation
 - Building something from scratch
 #### Use felix-db when:
 - Need query syntax (`?-`)
 - Want comprehensive tests as reference
 - Planning to extend with database backends
 - Need negation syntax (even if not evaluated)
 ---
 ### Summary Table
 | Category         | Winner      | Notes                             |
 |------------------|-------------|-----------------------------------|
 | Type Safety      | toy-datalog | Newtypes prevent errors           |
 | Features         | felix-db    | Queries, negation syntax, numbers |
 | Indexing         | toy-datalog | Position-based index              |
 | Testing          | felix-db    | 7 test files vs 1 active          |
 | Extensibility    | felix-db    | Type classes for DB/QE            |
 | Code Clarity     | tie         | Both well-organized               |
 | Error Messages   | toy-datalog | Full context in errors            |
 | Documentation    | toy-datalog | NOTES.md, CHANGELOG               |
 | Production Ready | neither     | Both need work                    |
 ---
 ### Potential Cross-Pollination
 #### toy-datalog could adopt from felix-db:
 1. Query syntax (`?-` with projection)
 2. Type class abstraction for database backend
 3. Comprehensive test suite
 4. Negation parsing (for future implementation)
 5. Separate `Num` type for integers
 #### felix-db could adopt from toy-datalog:
 1. Position-based indexing for performance
 2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
 3. Pretty printing (`Pretty` typeclass)
 4. Richer error types with context
 5. `HasConstants` typeclass pattern
 ---
 ### Conclusion
 Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
 They represent different design tradeoffs:
 - **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
 - **felix-db** prioritizes **features and extensibility** with a more complete implementation
 Neither supports negation evaluation, aggregation, or semi-naive evaluation.
 For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
 ## Changelog
 * **Mar 4, 2026** -- The first version was created.