useful-notes/hassan/007-cales-vs-felixs.md
2026-03-04 13:25:42 +01:00

453 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## Comparison: toy-datalog vs felix-db
This document compares two Haskell implementations of Datalog interpreters.
---
### Overview
| Aspect | toy-datalog | felix-db |
|-------------------|-----------------------------|---------------------------|
| **Author** | Cale Gibbard | (felix-db team) |
| **Version** | 0.1.0.0 | 0.1.0.0 |
| **Language** | Haskell (Haskell2010) | Haskell (GHC2024) |
| **License** | BSD-2-Clause | (not specified) |
| **Lines of Code** | ~300 | ~700 |
| **Primary Focus** | Minimal core implementation | More complete feature set |
---
### Architecture Comparison
#### toy-datalog
```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```
Simple linear pipeline with 4 modules:
- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities
#### felix-db
```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
DigestedQuery
```
More modular architecture with 8+ modules:
- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions
**Winner: felix-db** - Better separation of concerns and extensibility via type classes.
---
### Syntax & AST
#### Term Representation
| Feature | toy-datalog | felix-db |
|-----------|---------------------------------|-------------------------------|
| Variables | `Var VarId` (uppercase) | `Var Text` (uppercase) |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
| Numbers | Parsed as constants | `Num Integer` (separate type) |
| Newtypes | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`) |
```haskell
-- toy-datalog
data Term = Con ConId | Var VarId
-- felix-db
data Term = Var Text | Sym Text | Num Integer
```
**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.
**felix-db advantage:** Explicit `Num` type for numeric constants.
#### Rule Representation
```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]
-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```
**felix-db advantage:** Supports negation syntax (`positive :: Bool`).
---
### Parser Comparison
| Feature | toy-datalog | felix-db |
|----------------|---------------|-----------------------------|
| Library | Megaparsec | Megaparsec |
| Line comments | `--` | `--` |
| Block comments | `{- -}` | `/* */` |
| Rule arrow | `:-` only | `:-`, `→`, `->` |
| Negation | Not supported | `not` or `!` prefix |
| Queries | Not supported | `?-` syntax with projection |
| Quoted strings | No | Yes (`"alice"`) |
| Numbers | As constants | As separate `Num` type |
#### Example Syntax
```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).
-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```
**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.
---
### Data Structures
#### Relation Storage
```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
{ _relation_arity :: Int
, _relation_members :: Set [ConId]
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
}
-- felix-db: Simple storage + rules
data Relation = Relation
{ name :: RelationId
, arity :: Int
, tuples :: Set [Constant] -- No index
, rules :: [RelationRule] -- Rules stored with relation
}
```
**toy-datalog advantage:** Position-based indexing for efficient lookups.
**felix-db advantage:** Rules stored directly with relations (cleaner organization).
#### Database Structure
```haskell
-- toy-datalog
data Database = Database
{ _database_universe :: Set ConId
, _database_relations :: Map RelId Relation
, _database_rules :: Map RelId [Rule] -- Rules separate from relations
}
-- felix-db
data InMemoryDB = InMemoryDB
{ relations :: Map RelationId Relation -- Rules inside Relation
, constants :: Set Constant
}
```
Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
---
### Evaluation Strategy
Both use **naive bottom-up evaluation** with fixed-point iteration.
#### Algorithm Comparison
| Aspect | toy-datalog | felix-db |
|------------------|--------------------------|----------------------------|
| Strategy | Naive bottom-up | Naive bottom-up |
| Termination | Fixed-point check | Fixed-point check |
| Variable binding | `Map VarId ConId` | List indexed by position |
| Join operation | `><` operator (set join) | Cartesian product + filter |
| Index usage | Yes (position-based) | No |
#### toy-datalog Evaluation
```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```
Uses indices to find candidate tuples, then intersects results for bound positions.
#### felix-db Evaluation
```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```
Generates all possible variable assignments from Herbrand universe, then filters.
#### Variable Binding Approach
```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value
-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```
**toy-datalog:** More intuitive, variable names preserved throughout.
**felix-db:** More efficient (integer indexing), but loses variable names.
---
### Feature Comparison
| Feature | toy-datalog | felix-db |
|------------------------------|-------------|------------------|
| Facts | ✅ | ✅ |
| Rules | ✅ | ✅ |
| Recursive rules | ✅ | ✅ |
| Multiple rules per predicate | ✅ | ✅ |
| Arity checking | ✅ | ✅ |
| Fixed-point iteration | ✅ | ✅ |
| Position-based indexing | ✅ | ❌ |
| Interactive queries | ❌ | ✅ (`?-` syntax) |
| Query projection | ❌ | ✅ (`→ X, Y`) |
| Negation (parsing) | ❌ | ✅ |
| Negation (evaluation) | ❌ | ❌ |
| Numbers as distinct type | ❌ | ✅ |
| Quoted strings | ❌ | ✅ |
| Pretty printing | ✅ | ❌ |
| Database type class | ❌ | ✅ |
| Query engine type class | ❌ | ✅ |
| PostgreSQL integration | ❌ | 🔄 (in progress) |
---
### Error Handling
#### toy-datalog
```haskell
data EvalError =
EvalError_RuleWrongArity Rule WrongArity
| EvalError_AtomWrongArity Atom WrongArity
data WrongArity = WrongArity
{ _wrongArity_relation :: RelId
, _wrongArity_expected :: Int
, _wrongArity_got :: Int
}
```
Detailed error types with context.
#### felix-db
```haskell
data DatalogException
= DuplicateRelationException Text
| ArityMismatchException Text Int Int
| RelationNotFoundException Text
| VariableNotFoundException Text
```
More error types but less context (just error message components).
**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.
---
### Testing
| Aspect | toy-datalog | felix-db |
|------------------|-------------------|-----------------|
| Framework | Tasty + HUnit | Hspec |
| Parser tests | ✅ Golden tests | ✅ Unit tests |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests | ❌ | ❌ |
| Test files | 2 | 7 |
#### felix-db Test Examples
```haskell
-- Reflexive relation
"equiv(X, X) :- ."
-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."
-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```
**Winner: felix-db** - Much more comprehensive test coverage.
---
### Performance Characteristics
#### Indexing
| Operation | toy-datalog | felix-db |
|--------------------------------|---------------------------------|----------------|
| Lookup atom with constants | O(log n) via index intersection | O(m) full scan |
| Lookup atom with all variables | O(m) full scan | O(m) full scan |
| Insert tuple | O(k log n) (update k indices) | O(log m) |
**toy-datalog advantage:** Indexed lookups for atoms with bound positions.
#### Evaluation Complexity
Both have the same theoretical complexity for naive evaluation:
- **Time per iteration:** O(n^k × |rules| × r)
- n = Herbrand universe size
- k = max variables per rule
- r = max body literals
However, toy-datalog's indexing provides practical speedup when atoms have constants.
#### Memory Usage
| Aspect | toy-datalog | felix-db |
|----------------|----------------|----------------|
| Tuple storage | Set [ConId] | Set [Constant] |
| Index overhead | O(k × m) extra | None |
| Rule storage | Separate map | In relations |
toy-datalog uses more memory due to indices, but gains query performance.
---
### Code Quality
#### toy-datalog
**Strengths:**
- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass
**Weaknesses:**
- No executable (commented out)
- Evaluation tests disabled
- No query interface
#### felix-db
**Strengths:**
- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress
**Weaknesses:**
- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase
---
### Use Case Recommendations
#### Use toy-datalog when:
- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch
#### Use felix-db when:
- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)
---
### Summary Table
| Category | Winner | Notes |
|------------------|-------------|-----------------------------------|
| Type Safety | toy-datalog | Newtypes prevent errors |
| Features | felix-db | Queries, negation syntax, numbers |
| Indexing | toy-datalog | Position-based index |
| Testing | felix-db | 7 test files vs 1 active |
| Extensibility | felix-db | Type classes for DB/QE |
| Code Clarity | tie | Both well-organized |
| Error Messages | toy-datalog | Full context in errors |
| Documentation | toy-datalog | NOTES.md, CHANGELOG |
| Production Ready | neither | Both need work |
---
### Potential Cross-Pollination
#### toy-datalog could adopt from felix-db:
1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` type for integers
#### felix-db could adopt from toy-datalog:
1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern
---
### Conclusion
Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:
- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation
Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
## Changelog
* **Mar 4, 2026** -- The first version was created.