## Comparison: toy-datalog vs felix-db

This document compares two Haskell implementations of Datalog interpreters.

---

### Overview

| Aspect            | toy-datalog                 | felix-db                  |
|-------------------|-----------------------------|---------------------------|
| **Author**        | Cale Gibbard                | (felix-db team)           |
| **Version**       | 0.1.0.0                     | 0.1.0.0                   |
| **Language**      | Haskell (Haskell2010)       | Haskell (GHC2024)         |
| **License**       | BSD-2-Clause                | (not specified)           |
| **Lines of Code** | ~300                        | ~700                      |
| **Primary Focus** | Minimal core implementation | More complete feature set |

---

### Architecture Comparison

#### toy-datalog

```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```

Simple linear pipeline with 4 modules:

- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities

#### felix-db

```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
                                      ↓
                                DigestedQuery
```

More modular architecture with 8+ modules:

- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions

**Winner: felix-db** - Better separation of concerns and extensibility via type classes.
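
The extensibility argument is easier to see with a small example of backend abstraction through a type class. The sketch below is hypothetical. `FactStore`, its methods, `MemStore`, and `loadFacts` are invented for illustration and are not felix-db's actual `DatalogDB` interface. Constants are also simplified to `String`.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical backend interface (NOT felix-db's real DatalogDB class):
-- evaluation code written against it never touches a concrete container.
class FactStore db where
  emptyStore :: db
  insertFact :: String -> [String] -> db -> db
  tuplesOf :: String -> db -> Set [String]

-- One possible in-memory backend: relation name -> set of tuples.
newtype MemStore = MemStore (Map.Map String (Set [String]))

instance FactStore MemStore where
  emptyStore = MemStore Map.empty
  insertFact rel tup (MemStore m) =
    MemStore (Map.insertWith Set.union rel (Set.singleton tup) m)
  tuplesOf rel (MemStore m) = Map.findWithDefault Set.empty rel m

-- Backend-agnostic helper: works for any FactStore instance.
loadFacts :: FactStore db => [(String, [String])] -> db
loadFacts = foldr (\(rel, tup) db -> insertFact rel tup db) emptyStore
```

Code written against `FactStore db` never mentions `Map`, so another instance could change the storage layout without touching the evaluator. A real database backend such as PostgreSQL would need its methods to run in `IO` or another monad. A class meant for that purpose would therefore likely be monadic rather than pure like this one.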

---

### Syntax & AST

#### Term Representation

| Feature   | toy-datalog                     | felix-db                             |
|-----------|---------------------------------|--------------------------------------|
| Variables | `Var VarId` (uppercase)         | `Var Text` (uppercase)               |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted)        |
| Numbers   | Parsed as constants             | `Num Integer` (separate constructor) |
| Newtypes  | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`)                      |

```haskell
-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer
```

**toy-datalog advantage:** Newtypes prevent mixing up constants, variables, and relation names at compile time.

**felix-db advantage:** An explicit `Num` constructor for numeric constants.

#### Rule Representation

```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```

**felix-db advantage:** Supports negation syntax (`positive :: Bool`) and has queries as a statement form.

---

### Parser Comparison

| Feature        | toy-datalog   | felix-db                      |
|----------------|---------------|-------------------------------|
| Library        | Megaparsec    | Megaparsec                    |
| Line comments  | `--`          | `--`                          |
| Block comments | `{- -}`       | `/* */`                       |
| Rule arrow     | `:-` only     | `:-`, `→`, `->`               |
| Negation       | Not supported | `not` or `!` prefix           |
| Queries        | Not supported | `?-` syntax with projection   |
| Quoted strings | No            | Yes (`"alice"`)               |
| Numbers        | As constants  | As separate `Num` constructor |

#### Example Syntax

```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```

**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.

---

### Data Structures

#### Relation Storage

```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId])  -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]   -- No index
  , rules  :: [RelationRule]   -- Rules stored with relation
  }
```

**toy-datalog advantage:** Position-based indexing speeds up lookups on bound positions.

**felix-db advantage:** Rules are stored directly with their relations (cleaner organization).

#### Database Structure

```haskell
-- toy-datalog
data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule]  -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation  -- Rules inside Relation
  , constants :: Set Constant
  }
```

Both track the Herbrand universe (the set of all constants in the program). toy-datalog stores rules separately, while felix-db stores them with relations.

---

### Evaluation Strategy

Both use **naive bottom-up evaluation** with fixed-point iteration. Each round applies every rule to the entire current database and adds the derived facts. Evaluation stops when a round adds nothing new.
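
The shared strategy fits in a few dozen lines. The following is a simplified, hypothetical implementation. It uses plain `String` constants, stores facts as `(relation, tuple)` pairs, and joins body atoms by scanning every fact. It reproduces neither toy-datalog's indexed `><` join nor felix-db's Herbrand-universe enumeration. It only shows the naive loop: apply all rules to all facts, add the results, and repeat until nothing changes.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- Simplified, hypothetical AST (not either project's exact types).
data Term = Var String | Con String deriving (Eq, Ord, Show)
data Atom = Atom String [Term] deriving (Eq, Ord, Show)
data Rule = Atom :- [Atom] deriving (Show)

type Fact = (String, [String])    -- (relation name, tuple of constants)
type Binding = Map String String  -- variable name -> constant

-- Extend a binding so the atom's arguments match one stored tuple.
-- Fails on a constant mismatch, a conflicting variable, or an arity mismatch.
matchArgs :: [Term] -> [String] -> Binding -> Maybe Binding
matchArgs [] [] b = Just b
matchArgs (Con c : ts) (v : vs) b
  | c == v = matchArgs ts vs b
  | otherwise = Nothing
matchArgs (Var x : ts) (v : vs) b = case Map.lookup x b of
  Nothing -> matchArgs ts vs (Map.insert x v b)
  Just v'
    | v' == v -> matchArgs ts vs b
    | otherwise -> Nothing
matchArgs _ _ _ = Nothing

-- All bindings satisfying a conjunction of body atoms.
-- Each atom is joined against every fact by a linear scan (no index).
solveBody :: Set Fact -> [Atom] -> [Binding]
solveBody facts = foldl step [Map.empty]
  where
    step bindings (Atom rel args) =
      [ b'
      | b <- bindings
      , (r, tup) <- Set.toList facts
      , r == rel
      , Just b' <- [matchArgs args tup b]
      ]

-- Build the head fact; Nothing if a head variable is unbound (unsafe rule).
instantiate :: Binding -> Atom -> Maybe Fact
instantiate b (Atom rel args) = (,) rel <$> traverse term args
  where
    term (Con c) = Just c
    term (Var x) = Map.lookup x b

-- One naive round: apply every rule to the whole current fact set.
applyRules :: [Rule] -> Set Fact -> Set Fact
applyRules rules facts = Set.fromList
  [ f
  | (hd :- body) <- rules
  , b <- solveBody facts body
  , Just f <- [instantiate b hd]
  ]

-- Repeat rounds until no new facts appear (the least fixed point).
fixpoint :: [Rule] -> Set Fact -> Set Fact
fixpoint rules facts
  | next == facts = facts
  | otherwise = fixpoint rules next
  where
    next = facts `Set.union` applyRules rules facts
```

Take the facts `parent(xerces, brooke)` and `parent(brooke, damocles)` together with the usual two `ancestor` rules. For this input, `fixpoint` stops after deriving three `ancestor` facts, including `ancestor(xerces, damocles)`. Because the evaluation is naive, every round re-derives all earlier facts. A semi-naive variant would instead join only against the facts that are new since the previous round.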

#### Algorithm Comparison

| Aspect           | toy-datalog              | felix-db                   |
|------------------|--------------------------|----------------------------|
| Strategy         | Naive bottom-up          | Naive bottom-up            |
| Termination      | Fixed-point check        | Fixed-point check          |
| Variable binding | `Map VarId ConId`        | List indexed by position   |
| Join operation   | `><` operator (set join) | Cartesian product + filter |
| Index usage      | Yes (position-based)     | No                         |

#### toy-datalog Evaluation

```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```

For an atom with constant arguments, the evaluator fetches the index entry for each bound position and intersects the resulting tuple sets to get the candidate tuples.

#### felix-db Evaluation

```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```

The engine generates every possible variable assignment from the Herbrand universe, then filters out the assignments that do not satisfy the body.

#### Variable Binding Approach

```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```

**toy-datalog:** More intuitive, because variable names are preserved throughout.

**felix-db:** Integer positions replace name-keyed `Map` lookups, which is more compact, but variable names are lost after digestion.
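
The sketch below contrasts the two binding styles; neither function is copied from either codebase. `mkBinding` follows the signature shown above but uses assumed stand-in newtypes that wrap `String`. `matchesPositional` is a hypothetical helper in the spirit of felix-db's `RuleElement`. It checks a tuple against a pattern under a complete positional assignment.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Assumed stand-ins for toy-datalog's newtypes.
newtype VarId = VarId String deriving (Eq, Ord, Show)
newtype ConId = ConId String deriving (Eq, Ord, Show)
data Term = Con ConId | Var VarId deriving (Eq, Show)

-- Named style: walk terms and tuple together. A constant must match exactly;
-- a variable is either bound now or must agree with its earlier binding.
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
mkBinding [] [] env = Just env
mkBinding (Con c : ts) (v : vs) env
  | c == v = mkBinding ts vs env
  | otherwise = Nothing
mkBinding (Var x : ts) (v : vs) env = case Map.lookup x env of
  Nothing -> mkBinding ts vs (Map.insert x v env)
  Just v'
    | v' == v -> mkBinding ts vs env
    | otherwise -> Nothing
mkBinding _ _ _ = Nothing  -- length mismatch (wrong arity)

-- Positional style (hypothetical, felix-db-like): variables are indices into
-- an assignment list, so names no longer exist at this stage.
data RuleElement = RuleElementConstant String | RuleElementVariable Int
  deriving (Eq, Show)

-- Check a tuple against a pattern under a complete assignment.
-- Assumes every variable index is in range for the assignment list.
matchesPositional :: [String] -> [RuleElement] -> [String] -> Bool
matchesPositional assignment pat tuple =
  length pat == length tuple && and (zipWith ok pat tuple)
  where
    ok (RuleElementConstant c) v = c == v
    ok (RuleElementVariable i) v = assignment !! i == v
```

In the named version, the first occurrence of a variable extends the map, and later occurrences must agree with it. In the positional version, the assignment is already complete because it was enumerated up front, so matching is a pure check rather than a search.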

---

### Feature Comparison

| Feature                      | toy-datalog | felix-db         |
|------------------------------|-------------|------------------|
| Facts                        | ✅          | ✅               |
| Rules                        | ✅          | ✅               |
| Recursive rules              | ✅          | ✅               |
| Multiple rules per predicate | ✅          | ✅               |
| Arity checking               | ✅          | ✅               |
| Fixed-point iteration        | ✅          | ✅               |
| Position-based indexing      | ✅          | ❌               |
| Queries                      | ❌          | ✅ (`?-` syntax) |
| Query projection             | ❌          | ✅ (`→ X, Y`)    |
| Negation (parsing)           | ❌          | ✅               |
| Negation (evaluation)        | ❌          | ❌               |
| Numbers as separate constructor | ❌       | ✅               |
| Quoted strings               | ❌          | ✅               |
| Pretty printing              | ✅          | ❌               |
| Database type class          | ❌          | ✅               |
| Query engine type class      | ❌          | ✅               |
| PostgreSQL integration       | ❌          | 🔄 (in progress) |

---

### Error Handling

#### toy-datalog

```haskell
data EvalError =
    EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got      :: Int
  }
```

Each error carries the offending rule or atom, together with the relation name and the expected and actual arity.

#### felix-db

```haskell
data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text
```

This type has more error cases, but each carries only a few bare fields (a name, plus two integers for arity mismatches). None includes the rule or atom that caused the error.

**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.

---

### Testing

| Aspect           | toy-datalog       | felix-db        |
|------------------|-------------------|-----------------|
| Framework        | Tasty + HUnit     | Hspec           |
| Parser tests     | ✅ Golden tests   | ✅ Unit tests   |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests   | ❌                | ❌              |
| Test files       | 2                 | 7               |

#### felix-db Test Examples

```haskell
-- Reflexive relation
"equiv(X, X) :- ."

-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."

-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."

-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```

**Winner: felix-db** - Much more comprehensive test coverage.

---

### Performance Characteristics

#### Indexing

| Operation                      | toy-datalog                                           | felix-db       |
|--------------------------------|-------------------------------------------------------|----------------|
| Lookup atom with constants     | b index lookups (O(log) each), then intersect b sets  | O(m) full scan |
| Lookup atom with all variables | O(m) full scan                                        | O(m) full scan |
| Insert tuple                   | a index updates (O(log) each) plus the tuple insert   | O(log m)       |

Here m is the number of tuples in the relation, a is its arity, and b is the number of bound (constant) positions in the atom. The cost of comparing tuples is ignored.

**toy-datalog advantage:** Indexed lookups for atoms with bound positions.

#### Evaluation Complexity

Both share the same worst-case bound for naive evaluation:

- **Time per iteration:** O(n^k × |rules| × r)
  - n = Herbrand universe size
  - k = max variables per rule
  - r = max body literals per rule

The number of iterations is at most the number of derivable facts, because every iteration except the last adds at least one new fact. In practice, toy-datalog's indexing gives a speedup when body atoms contain constants.

#### Memory Usage

| Aspect         | toy-datalog                                     | felix-db         |
|----------------|-------------------------------------------------|------------------|
| Tuple storage  | `Set [ConId]`                                   | `Set [Constant]` |
| Index overhead | O(a × m) extra (each tuple appears in a index sets) | None         |
| Rule storage   | Separate map                                    | In relations     |

toy-datalog pays for its indices with extra memory, in exchange for faster lookups on atoms with constants.
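
Below is a sketch of a position-based index in the style of toy-datalog's `_relation_indices`. The record, its field names, and the `String` constants are simplified assumptions, not the project's actual code. The sketch illustrates the lookup strategy. It fetches one candidate set per bound position and intersects the sets. If every position is a variable, it falls back to the full member set.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- Assumed simplification of toy-datalog's Relation (constants as String).
data Relation = Relation
  { relArity   :: Int
  , relMembers :: Set [String]
  , relIndex   :: Map (Int, String) (Set [String])  -- (position, value) -> tuples
  }

emptyRelation :: Int -> Relation
emptyRelation n = Relation n Set.empty Map.empty

-- Inserting a tuple updates one index entry per position.
insertTuple :: [String] -> Relation -> Relation
insertTuple tup r =
  r { relMembers = Set.insert tup (relMembers r)
    , relIndex = foldr addEntry (relIndex r) (zip [0 ..] tup)
    }
  where
    addEntry key = Map.insertWith Set.union key (Set.singleton tup)

-- Pattern: Just c means "this position is bound to c", Nothing means "free".
-- Bound positions are answered from the index and the sets intersected;
-- with no bound positions every member is a candidate (a full scan).
lookupPattern :: [Maybe String] -> Relation -> Set [String]
lookupPattern pat r = case bound of
  [] -> relMembers r
  keys -> foldr1 Set.intersection [Map.findWithDefault Set.empty k (relIndex r) | k <- keys]
  where
    bound = [(i, c) | (i, Just c) <- zip [0 ..] pat]
```

For example, the atom `parent(xerces, Y)` becomes `lookupPattern [Just "xerces", Nothing]`, which reads one index entry instead of scanning the relation. The cost appears at insertion time, when `insertTuple` updates one index entry per position, and that is where the extra memory goes.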

---

### Code Quality

#### toy-datalog

**Strengths:**

- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass

**Weaknesses:**

- No executable (commented out)
- Evaluation tests disabled
- No query interface

#### felix-db

**Strengths:**

- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress

**Weaknesses:**

- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase

---

### Use Case Recommendations

#### Use toy-datalog when:

- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want a minimal, readable implementation
- Building something from scratch

#### Use felix-db when:

- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)

---

### Summary Table

| Category         | Winner      | Notes                             |
|------------------|-------------|-----------------------------------|
| Type Safety      | toy-datalog | Newtypes prevent errors           |
| Features         | felix-db    | Queries, negation syntax, numbers |
| Indexing         | toy-datalog | Position-based index              |
| Testing          | felix-db    | 7 test files vs 1 active          |
| Extensibility    | felix-db    | Type classes for DB/QE            |
| Code Clarity     | tie         | Both well-organized               |
| Error Messages   | toy-datalog | Full context in errors            |
| Documentation    | toy-datalog | NOTES.md, CHANGELOG               |
| Production Ready | neither     | Both need work                    |

---

### Potential Cross-Pollination

#### toy-datalog could adopt from felix-db:

1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` constructor for integers

#### felix-db could adopt from toy-datalog:

1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern

---

### Conclusion

Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:

- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation

Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.

## Changelog

* **Mar 4, 2026** -- The first version was created.