Add comparison of toy-datalog vs felix-db

This commit is contained in:
Hassan Abedi 2026-03-04 13:25:42 +01:00
parent 929c71488a
commit 4651b09e28

View File

@ -0,0 +1,452 @@
## Comparison: toy-datalog vs felix-db
This document compares two Haskell implementations of Datalog interpreters.
---
### Overview
| Aspect | toy-datalog | felix-db |
|-------------------|-----------------------------|---------------------------|
| **Author** | Cale Gibbard | (felix-db team) |
| **Version** | 0.1.0.0 | 0.1.0.0 |
| **Language** | Haskell (Haskell2010) | Haskell (GHC2024) |
| **License** | BSD-2-Clause | (not specified) |
| **Lines of Code** | ~300 | ~700 |
| **Primary Focus** | Minimal core implementation | More complete feature set |
---
### Architecture Comparison
#### toy-datalog
```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```
Simple linear pipeline with 4 modules:
- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities
#### felix-db
```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
DigestedQuery
```
More modular architecture with 8+ modules:
- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions
**Winner: felix-db** - Better separation of concerns and extensibility via type classes.
---
### Syntax & AST
#### Term Representation
| Feature | toy-datalog | felix-db |
|-----------|---------------------------------|-------------------------------|
| Variables | `Var VarId` (uppercase) | `Var Text` (uppercase) |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
| Numbers | Parsed as constants | `Num Integer` (separate type) |
| Newtypes | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`) |
```haskell
-- toy-datalog
data Term = Con ConId | Var VarId
-- felix-db
data Term = Var Text | Sym Text | Num Integer
```
**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.
**felix-db advantage:** Explicit `Num` type for numeric constants.
#### Rule Representation
```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]
-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```
**felix-db advantage:** Supports negation syntax (`positive :: Bool`).
---
### Parser Comparison
| Feature | toy-datalog | felix-db |
|----------------|---------------|-----------------------------|
| Library | Megaparsec | Megaparsec |
| Line comments | `--` | `--` |
| Block comments | `{- -}` | `/* */` |
| Rule arrow | `:-` only | `:-`, `→`, `->` |
| Negation | Not supported | `not` or `!` prefix |
| Queries | Not supported | `?-` syntax with projection |
| Quoted strings | No | Yes (`"alice"`) |
| Numbers | As constants | As separate `Num` type |
#### Example Syntax
```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).
-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```
**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.
---
### Data Structures
#### Relation Storage
```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
{ _relation_arity :: Int
, _relation_members :: Set [ConId]
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
}
-- felix-db: Simple storage + rules
data Relation = Relation
{ name :: RelationId
, arity :: Int
, tuples :: Set [Constant] -- No index
, rules :: [RelationRule] -- Rules stored with relation
}
```
**toy-datalog advantage:** Position-based indexing for efficient lookups.
**felix-db advantage:** Rules stored directly with relations (cleaner organization).
#### Database Structure
```haskell
-- toy-datalog
data Database = Database
{ _database_universe :: Set ConId
, _database_relations :: Map RelId Relation
, _database_rules :: Map RelId [Rule] -- Rules separate from relations
}
-- felix-db
data InMemoryDB = InMemoryDB
{ relations :: Map RelationId Relation -- Rules inside Relation
, constants :: Set Constant
}
```
Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
---
### Evaluation Strategy
Both use **naive bottom-up evaluation** with fixed-point iteration.
#### Algorithm Comparison
| Aspect | toy-datalog | felix-db |
|------------------|--------------------------|----------------------------|
| Strategy | Naive bottom-up | Naive bottom-up |
| Termination | Fixed-point check | Fixed-point check |
| Variable binding | `Map VarId ConId` | List indexed by position |
| Join operation | `><` operator (set join) | Cartesian product + filter |
| Index usage | Yes (position-based) | No |
#### toy-datalog Evaluation
```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```
Uses indices to find candidate tuples, then intersects results for bound positions.
#### felix-db Evaluation
```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```
Generates all possible variable assignments from Herbrand universe, then filters.
#### Variable Binding Approach
```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value
-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```
**toy-datalog:** More intuitive, variable names preserved throughout.
**felix-db:** More efficient (integer indexing), but loses variable names.
---
### Feature Comparison
| Feature | toy-datalog | felix-db |
|------------------------------|-------------|------------------|
| Facts | ✅ | ✅ |
| Rules | ✅ | ✅ |
| Recursive rules | ✅ | ✅ |
| Multiple rules per predicate | ✅ | ✅ |
| Arity checking | ✅ | ✅ |
| Fixed-point iteration | ✅ | ✅ |
| Position-based indexing | ✅ | ❌ |
| Interactive queries | ❌ | ✅ (`?-` syntax) |
| Query projection | ❌ | ✅ (`→ X, Y`) |
| Negation (parsing) | ❌ | ✅ |
| Negation (evaluation) | ❌ | ❌ |
| Numbers as distinct type | ❌ | ✅ |
| Quoted strings | ❌ | ✅ |
| Pretty printing | ✅ | ❌ |
| Database type class | ❌ | ✅ |
| Query engine type class | ❌ | ✅ |
| PostgreSQL integration | ❌ | 🔄 (in progress) |
---
### Error Handling
#### toy-datalog
```haskell
data EvalError =
EvalError_RuleWrongArity Rule WrongArity
| EvalError_AtomWrongArity Atom WrongArity
data WrongArity = WrongArity
{ _wrongArity_relation :: RelId
, _wrongArity_expected :: Int
, _wrongArity_got :: Int
}
```
Detailed error types with context.
#### felix-db
```haskell
data DatalogException
= DuplicateRelationException Text
| ArityMismatchException Text Int Int
| RelationNotFoundException Text
| VariableNotFoundException Text
```
More error types but less context (just error message components).
**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.
---
### Testing
| Aspect | toy-datalog | felix-db |
|------------------|-------------------|-----------------|
| Framework | Tasty + HUnit | Hspec |
| Parser tests | ✅ Golden tests | ✅ Unit tests |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests | ❌ | ❌ |
| Test files | 2 | 7 |
#### felix-db Test Examples
```haskell
-- Reflexive relation
"equiv(X, X) :- ."
-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."
-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```
**Winner: felix-db** - Much more comprehensive test coverage.
---
### Performance Characteristics
#### Indexing
| Operation | toy-datalog | felix-db |
|--------------------------------|---------------------------------|----------------|
| Lookup atom with constants | O(log n) via index intersection | O(m) full scan |
| Lookup atom with all variables | O(m) full scan | O(m) full scan |
| Insert tuple | O(k log n) (update k indices) | O(log m) |
**toy-datalog advantage:** Indexed lookups for atoms with bound positions.
#### Evaluation Complexity
Both have the same theoretical complexity for naive evaluation:
- **Time per iteration:** O(n^k × |rules| × r)
- n = Herbrand universe size
- k = max variables per rule
- r = max body literals
However, toy-datalog's indexing provides practical speedup when atoms have constants.
#### Memory Usage
| Aspect | toy-datalog | felix-db |
|----------------|----------------|----------------|
| Tuple storage | Set [ConId] | Set [Constant] |
| Index overhead | O(k × m) extra | None |
| Rule storage | Separate map | In relations |
toy-datalog uses more memory due to indices, but gains query performance.
---
### Code Quality
#### toy-datalog
**Strengths:**
- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass
**Weaknesses:**
- No executable (commented out)
- Evaluation tests disabled
- No query interface
#### felix-db
**Strengths:**
- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress
**Weaknesses:**
- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase
---
### Use Case Recommendations
#### Use toy-datalog when:
- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch
#### Use felix-db when:
- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)
---
### Summary Table
| Category | Winner | Notes |
|------------------|-------------|-----------------------------------|
| Type Safety | toy-datalog | Newtypes prevent errors |
| Features | felix-db | Queries, negation syntax, numbers |
| Indexing | toy-datalog | Position-based index |
| Testing | felix-db | 7 test files vs 1 active |
| Extensibility | felix-db | Type classes for DB/QE |
| Code Clarity | tie | Both well-organized |
| Error Messages | toy-datalog | Full context in errors |
| Documentation | toy-datalog | NOTES.md, CHANGELOG |
| Production Ready | neither | Both need work |
---
### Potential Cross-Pollination
#### toy-datalog could adopt from felix-db:
1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` type for integers
#### felix-db could adopt from toy-datalog:
1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern
---
### Conclusion
Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:
- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation
Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
## Changelog
* **Mar 4, 2026** -- The first version was created.