## Comparison: toy-datalog vs felix-db

This document compares two Haskell implementations of Datalog interpreters.

---

### Overview

| Aspect            | toy-datalog                 | felix-db                  |
|-------------------|-----------------------------|---------------------------|
| **Author**        | Cale Gibbard                | (felix-db team)           |
| **Version**       | 0.1.0.0                     | 0.1.0.0                   |
| **Language**      | Haskell (Haskell2010)       | Haskell (GHC2024)         |
| **License**       | BSD-2-Clause                | (not specified)           |
| **Lines of Code** | ~300                        | ~700                      |
| **Primary Focus** | Minimal core implementation | More complete feature set |

---

### Architecture Comparison

#### toy-datalog

```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```

Simple linear pipeline with 4 modules:

- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities

#### felix-db

```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
↓
DigestedQuery
```

More modular architecture with 8+ modules:

- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions

**Winner: felix-db** - Better separation of concerns and extensibility via type classes.

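The following minimal sketch shows what the type-class approach buys: a backend-agnostic fact store with one in-memory instance. `FactStore`, its methods, `MemStore` and `loadFacts` are hypothetical names chosen for illustration. They are not felix-db's actual `DatalogDB` interface, and constants are simplified to `String`. A second backend would need only its own instance, and functions written against the class, such as `loadFacts`, would stay unchanged.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical backend interface: store tuples and enumerate a relation.
class FactStore db where
  emptyStore     :: db
  insertFact     :: String -> [String] -> db -> db
  lookupRelation :: String -> db -> Set.Set [String]

-- In-memory backend: relation name -> set of tuples.
newtype MemStore = MemStore (Map.Map String (Set.Set [String]))

instance FactStore MemStore where
  emptyStore = MemStore Map.empty
  insertFact rel tuple (MemStore m) =
    MemStore (Map.insertWith Set.union rel (Set.singleton tuple) m)
  lookupRelation rel (MemStore m) = Map.findWithDefault Set.empty rel m

-- Written only against the class, so it works with any instance.
loadFacts :: FactStore db => [(String, [String])] -> db
loadFacts = foldr (\(rel, tuple) db -> insertFact rel tuple db) emptyStore
```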
---

### Syntax & AST

#### Term Representation

| Feature   | toy-datalog                     | felix-db                      |
|-----------|---------------------------------|-------------------------------|
| Variables | `Var VarId` (uppercase)         | `Var Text` (uppercase)        |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
| Numbers   | Parsed as constants             | `Num Integer` (separate type) |
| Newtypes  | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`)               |

```haskell
-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer
```

**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.

**felix-db advantage:** Explicit `Num` type for numeric constants.

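A minimal sketch of why the newtype approach helps. It assumes `String` payloads (toy-datalog's actual representation inside `ConId`/`VarId` may differ), and `substitute` is a hypothetical helper. The substitution map can only be keyed by variables and can only yield constants, so a constant passed where a variable is expected is rejected by the type checker.

```haskell
import qualified Data.Map.Strict as Map

-- Distinct wrappers around the same underlying representation.
newtype ConId = ConId String deriving (Eq, Ord, Show)
newtype VarId = VarId String deriving (Eq, Ord, Show)

data Term = Con ConId | Var VarId deriving (Eq, Show)

-- Apply a substitution: only variables are looked up, only constants come out.
substitute :: Map.Map VarId ConId -> Term -> Term
substitute s (Var v) = maybe (Var v) Con (Map.lookup v s)
substitute _ t       = t

-- If both were plain String, looking up a constant's name in the variable
-- map would type-check; here `Map.lookup (ConId "alice") s` is a type error.
```

In GHCi, `substitute (Map.fromList [(VarId "X", ConId "alice")]) (Var (VarId "X"))` evaluates to `Con (ConId "alice")`.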
#### Rule Representation

```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```

**felix-db advantage:** Supports negation syntax (`positive :: Bool`).

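Before negated literals like these can be evaluated, a rule usually has to pass the standard safety condition for negation: every variable in a negated literal must also occur in some positive body literal. A minimal sketch of that check follows, using a simplified copy of the `Literal` record above with `String` in place of `Text`. `unsafeNegatedVars` is a hypothetical helper and is not part of either project.

```haskell
import qualified Data.Set as Set

data Term = Var String | Sym String | Num Integer deriving (Eq, Show)

data Literal = Literal
  { positive  :: Bool
  , predName  :: String
  , arguments :: [Term]
  } deriving (Eq, Show)

-- Variables mentioned in a literal's arguments.
varsOf :: Literal -> Set.Set String
varsOf lit = Set.fromList [v | Var v <- arguments lit]

-- Variables that occur in negated literals but in no positive literal.
-- An empty result means the body is safe with respect to negation.
unsafeNegatedVars :: [Literal] -> Set.Set String
unsafeNegatedVars body = Set.difference negVars posVars
  where
    posVars = Set.unions (map varsOf (filter positive body))
    negVars = Set.unions (map varsOf (filter (not . positive) body))
```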
---

### Parser Comparison

| Feature        | toy-datalog   | felix-db                    |
|----------------|---------------|-----------------------------|
| Library        | Megaparsec    | Megaparsec                  |
| Line comments  | `--`          | `--`                        |
| Block comments | `{- -}`       | `/* */`                     |
| Rule arrow     | `:-` only     | `:-`, `→`, `->`             |
| Negation       | Not supported | `not` or `!` prefix         |
| Queries        | Not supported | `?-` syntax with projection |
| Quoted strings | No            | Yes (`"alice"`)             |
| Numbers        | As constants  | As separate `Num` type      |

#### Example Syntax

```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```

**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.

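Accepting several rule arrows amounts to a single choice between alternatives. The sketch below is only illustrative: it uses `Text.ParserCombinators.ReadP` from `base` rather than Megaparsec (which both projects use), so it runs without extra packages. It covers only simple atoms and assumes that `:-`, `->` and `→` are interchangeable separators between head and body. The grammar and all names are assumptions, not either project's parser.

```haskell
import Data.Char (isAlphaNum, isLower, isUpper)
import Text.ParserCombinators.ReadP

data Term = Var String | Sym String deriving (Eq, Show)
data Atom = Atom String [Term] deriving (Eq, Show)
data Rule = Rule Atom [Atom] deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = lexeme . string

-- Identifier whose first character satisfies the given predicate.
ident :: (Char -> Bool) -> ReadP String
ident start = lexeme ((:) <$> satisfy start <*> munch (\c -> isAlphaNum c || c == '_'))

term :: ReadP Term
term = (Var <$> ident isUpper) +++ (Sym <$> ident isLower)

atom :: ReadP Atom
atom = Atom <$> ident isLower <*> between (symbol "(") (symbol ")") (sepBy term (symbol ","))

-- Any of the three arrows separates head from body ("\8594" is '→').
ruleArrow :: ReadP String
ruleArrow = choice [symbol ":-", symbol "->", symbol "\8594"]

rule :: ReadP Rule
rule = Rule <$> atom <* ruleArrow <*> sepBy atom (symbol ",") <* symbol "."

-- Succeed only on a single parse that consumes the whole input.
parseRule :: String -> Maybe Rule
parseRule s = case [r | (r, "") <- readP_to_S (skipSpaces *> rule) s] of
  [r] -> Just r
  _   -> Nothing
```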
---

### Data Structures

#### Relation Storage

```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]  -- No index
  , rules  :: [RelationRule]  -- Rules stored with relation
  }
```

**toy-datalog advantage:** Position-based indexing for efficient lookups.

**felix-db advantage:** Rules stored directly with relations (cleaner organization).

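The following minimal sketch shows how an index shaped like `Map (Int, ConId) (Set [ConId])` can be maintained and queried. It is illustrative only. `ConId` is simplified to `String`, and `insertTuple` and `lookupPattern` are hypothetical names, not toy-datalog's functions. Each bound position selects a candidate set from the index, and the candidate sets are intersected. With no bound positions, the lookup falls back to all members.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ConId = String -- assumption: toy-datalog wraps this in a newtype

data Relation = Relation
  { members :: Set.Set [ConId]
  , indices :: Map.Map (Int, ConId) (Set.Set [ConId])
  } deriving Show

emptyRelation :: Relation
emptyRelation = Relation Set.empty Map.empty

-- Insert a tuple and register it under (position, value) for every column.
insertTuple :: [ConId] -> Relation -> Relation
insertTuple t (Relation ms ix) = Relation (Set.insert t ms) ix'
  where
    ix' = foldr (\key -> Map.insertWith Set.union key (Set.singleton t)) ix (zip [0 ..] t)

-- Nothing marks a free position; Just c marks a bound constant.
lookupPattern :: [Maybe ConId] -> Relation -> Set.Set [ConId]
lookupPattern pat (Relation ms ix) =
  case [Map.findWithDefault Set.empty (i, c) ix | (i, Just c) <- zip [0 ..] pat] of
    []       -> ms -- no bound positions: every member is a candidate
    (s : ss) -> foldr Set.intersection s ss
```

The cost is visible in `insertTuple`: every tuple is registered once per column, which is the index overhead listed under Memory Usage.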
#### Database Structure

```haskell
-- toy-datalog
data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule] -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation -- Rules inside Relation
  , constants :: Set Constant
  }
```

Both track the Herbrand universe (the set of all constants appearing in the program). toy-datalog keeps rules in a separate map keyed by relation; felix-db stores them inside each `Relation`.

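The Herbrand universe can be collected with a small class in the spirit of toy-datalog's `HasConstants`. In this minimal sketch, the method name `constants`, the `String` payloads, and the encoding of facts as rules with an empty body are all assumptions made for illustration. Each instance folds over its parts, and `Set`'s `Monoid` instance (union) merges the results.

```haskell
import qualified Data.Set as Set

data Term = Con String | Var String deriving (Eq, Show)
data Atom = Atom String [Term] deriving (Eq, Show)
data Rule = Atom :- [Atom] deriving (Eq, Show) -- a fact is a rule with no body here

-- Hypothetical class: anything that mentions constants can report them.
class HasConstants a where
  constants :: a -> Set.Set String

instance HasConstants Term where
  constants (Con c) = Set.singleton c
  constants (Var _) = Set.empty

instance HasConstants Atom where
  constants (Atom _ ts) = foldMap constants ts

instance HasConstants Rule where
  constants (h :- body) = constants h <> foldMap constants body

-- A whole program's universe is the union over its rules.
instance HasConstants a => HasConstants [a] where
  constants = foldMap constants
```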
---

### Evaluation Strategy

Both use **naive bottom-up evaluation** with fixed-point iteration.

#### Algorithm Comparison

| Aspect           | toy-datalog              | felix-db                   |
|------------------|--------------------------|----------------------------|
| Strategy         | Naive bottom-up          | Naive bottom-up            |
| Termination      | Fixed-point check        | Fixed-point check          |
| Variable binding | `Map VarId ConId`        | List indexed by position   |
| Join operation   | `><` operator (set join) | Cartesian product + filter |
| Index usage      | Yes (position-based)     | No                         |

#### toy-datalog Evaluation

```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```

For each atom, the index supplies a candidate set of tuples for every bound (constant) position, and those candidate sets are intersected.

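The overall naive loop can be sketched as follows. This is a simplified illustration, not toy-datalog's code: constants are `String`, body atoms are joined with a plain nested loop (`solve`) rather than toy-datalog's index intersection and `><` operator, and the initial facts form the starting database. The names `match`, `solve`, `instantiate`, `immediate` and `fixpoint` are hypothetical. `match` plays the role of the `mkBinding` consistency check.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Term = Con String | Var String deriving (Eq, Show)
data Atom = Atom String [Term] deriving (Eq, Show)
data Rule = Rule Atom [Atom] deriving (Eq, Show)

type Binding = Map.Map String String            -- variable name -> constant
type Db      = Map.Map String (Set.Set [String]) -- relation name -> tuples

-- Extend a binding so the terms match a tuple; fail on any conflict.
match :: [Term] -> [String] -> Binding -> Maybe Binding
match (Con c : ts) (x : xs) b | c == x = match ts xs b
match (Var v : ts) (x : xs) b = case Map.lookup v b of
  Nothing         -> match ts xs (Map.insert v x b)
  Just y | y == x -> match ts xs b
  _               -> Nothing
match [] [] b = Just b
match _ _ _ = Nothing

-- All bindings that satisfy a conjunction, joining atoms left to right.
solve :: Db -> [Atom] -> [Binding]
solve db = foldl step [Map.empty]
  where
    step bs (Atom rel ts) =
      [ b'
      | b <- bs
      , tuple <- Set.toList (Map.findWithDefault Set.empty rel db)
      , Just b' <- [match ts tuple b]
      ]

-- Ground the head under a binding; Nothing if a head variable is unbound.
instantiate :: Binding -> Atom -> Maybe (String, [String])
instantiate b (Atom rel ts) = (,) rel <$> traverse value ts
  where
    value (Con c) = Just c
    value (Var v) = Map.lookup v b

-- One naive round: fire every rule against the whole current database.
immediate :: [Rule] -> Db -> Db
immediate rules db = foldr add db derived
  where
    derived = [f | Rule h body <- rules, b <- solve db body, Just f <- [instantiate b h]]
    add (rel, t) = Map.insertWith Set.union rel (Set.singleton t)

-- Repeat until a round adds nothing new.
fixpoint :: [Rule] -> Db -> Db
fixpoint rules db
  | next == db = db
  | otherwise  = fixpoint rules next
  where
    next = immediate rules db
```

Given `parent` facts (a, b), (b, c), (c, d) and the two usual `ancestor` rules (base case and recursive case), `fixpoint` derives six `ancestor` tuples.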
#### felix-db Evaluation

```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```

Generates every possible variable assignment from the Herbrand universe, then filters out the assignments that do not satisfy the rule body.

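This generate-and-filter strategy can be shown in a few lines. In this minimal sketch the positional `RuleElement` mirrors the shape shown under Variable Binding Approach, with constants simplified to `String`. `BodyAtom`, `assignments`, `resolve` and `satisfying` are hypothetical names, not felix-db's API. With `k` variables and a universe of size `n`, `assignments` produces `n^k` candidates, which is the source of the `n^k` factor in the complexity estimate.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data RuleElement = RuleElementConstant String | RuleElementVariable Int
  deriving (Eq, Show)

data BodyAtom = BodyAtom String [RuleElement] deriving (Eq, Show)

type Db = Map.Map String (Set.Set [String])

-- Every assignment of n variables to universe values: |U|^n candidate lists.
assignments :: Set.Set String -> Int -> [[String]]
assignments universe n = sequence (replicate n (Set.toList universe))

-- Variable i is read from position i of the assignment (must be < n).
resolve :: [String] -> RuleElement -> String
resolve _   (RuleElementConstant c) = c
resolve env (RuleElementVariable i) = env !! i

-- Keep the assignments under which every body atom is a stored fact.
satisfying :: Db -> Set.Set String -> Int -> [BodyAtom] -> [[String]]
satisfying db universe n body =
  [env | env <- assignments universe n, all (holds env) body]
  where
    holds env (BodyAtom rel args) =
      Set.member (map (resolve env) args) (Map.findWithDefault Set.empty rel db)
```

In this sketch, an empty body with one variable yields every constant in the universe. That is one way a rule such as `equiv(X, X) :- .` can be grounded under this strategy.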
#### Variable Binding Approach

```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: the same variable must bind to the same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```

**toy-datalog:** More intuitive; variable names are preserved throughout.

**felix-db:** Potentially more efficient (integer positions instead of name-keyed maps), but variable names are lost.

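felix-db's digestion step replaces variable names with integer positions. A minimal sketch of that conversion for a single argument list follows. In a full rule, the numbering would run across the head and all body literals. `Term` is simplified (no `Num`), and `digest` is a hypothetical name. A repeated variable receives the same index, so in a positional scheme the consistency that `mkBinding` checks by name is enforced by construction. The sketch also returns the names in index order, which shows that they could be kept alongside the indices if needed.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

data Term = Var String | Sym String deriving (Eq, Show)

data RuleElement = RuleElementConstant String | RuleElementVariable Int
  deriving (Eq, Show)

-- Number variables by first occurrence; return elements and the name table.
digest :: [Term] -> ([RuleElement], [String])
digest terms = (reverse elems, reverse names)
  where
    (elems, names, _) = foldl' step ([], [], Map.empty) terms
    step (es, ns, seen) (Sym c) = (RuleElementConstant c : es, ns, seen)
    step (es, ns, seen) (Var v) = case Map.lookup v seen of
      Just i  -> (RuleElementVariable i : es, ns, seen)
      Nothing ->
        let i = Map.size seen -- next unused index
        in (RuleElementVariable i : es, v : ns, Map.insert v i seen)
```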
---

### Feature Comparison

| Feature                      | toy-datalog | felix-db         |
|------------------------------|-------------|------------------|
| Facts                        | ✅          | ✅               |
| Rules                        | ✅          | ✅               |
| Recursive rules              | ✅          | ✅               |
| Multiple rules per predicate | ✅          | ✅               |
| Arity checking               | ✅          | ✅               |
| Fixed-point iteration        | ✅          | ✅               |
| Position-based indexing      | ✅          | ❌               |
| Interactive queries          | ❌          | ✅ (`?-` syntax) |
| Query projection             | ❌          | ✅ (`→ X, Y`)    |
| Negation (parsing)           | ❌          | ✅               |
| Negation (evaluation)        | ❌          | ❌               |
| Numbers as distinct type     | ❌          | ✅               |
| Quoted strings               | ❌          | ✅               |
| Pretty printing              | ✅          | ❌               |
| Database type class          | ❌          | ✅               |
| Query engine type class      | ❌          | ✅               |
| PostgreSQL integration       | ❌          | 🔄 (in progress) |

---

### Error Handling

#### toy-datalog

```haskell
data EvalError
  = EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got      :: Int
  }
```

Detailed error types with context.

#### felix-db

```haskell
data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text
```

More error variants, but each carries only bare components (a name and, for arity mismatches, two integers) rather than the offending rule or atom.

**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.

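The benefit of carrying context shows up in a simple arity check. In this minimal sketch the error value contains the offending atom, so a caller can report exactly what went wrong. The types loosely follow the shape of toy-datalog's `EvalError`/`WrongArity` with simplified names and `String` arguments. `checkArity` is hypothetical, and unknown relations are accepted here by assumption. Running `traverse (checkArity arities)` over a rule body stops at the first offending atom.

```haskell
import qualified Data.Map.Strict as Map

-- Simplified atom: relation name plus plain string arguments.
data Atom = Atom String [String] deriving (Eq, Show)

data WrongArity = WrongArity
  { relation :: String
  , expected :: Int
  , got      :: Int
  } deriving (Eq, Show)

-- The offending atom travels with the error.
data EvalError = AtomWrongArity Atom WrongArity deriving (Eq, Show)

-- Compare an atom's argument count with the recorded arity, if any.
checkArity :: Map.Map String Int -> Atom -> Either EvalError Atom
checkArity arities a@(Atom rel args) =
  case Map.lookup rel arities of
    Just n | n /= length args ->
      Left (AtomWrongArity a (WrongArity rel n (length args)))
    _ -> Right a
```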
---

### Testing

| Aspect           | toy-datalog       | felix-db        |
|------------------|-------------------|-----------------|
| Framework        | Tasty + HUnit     | Hspec           |
| Parser tests     | ✅ Golden tests   | ✅ Unit tests   |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests   | ❌                | ❌              |
| Test files       | 2                 | 7               |

#### felix-db Test Examples

```haskell
-- Reflexive relation
"equiv(X, X) :- ."

-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."

-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."

-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```

**Winner: felix-db** - Much more comprehensive test coverage.

---

### Performance Characteristics

#### Indexing

Here m is the number of tuples in a relation, k its arity, and b the number of bound (constant) positions in the atom being looked up.

| Operation                      | toy-datalog                                     | felix-db       |
|--------------------------------|-------------------------------------------------|----------------|
| Lookup atom with constants     | O(b log m) index lookups, then set intersection | O(m) full scan |
| Lookup atom with all variables | O(m) full scan                                  | O(m) full scan |
| Insert tuple                   | O(k log m) (updates k index entries)            | O(log m)       |

**toy-datalog advantage:** Indexed lookups for atoms with bound positions.

#### Evaluation Complexity

Both have the same worst-case complexity for naive evaluation:

- **Time per iteration:** O(n^k × |rules| × r)
  - n = Herbrand universe size
  - k = max variables per rule
  - r = max body literals

However, toy-datalog's indexing provides a practical speedup when atoms contain constants.

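The per-iteration figure alone does not bound total work. In naive evaluation, every iteration before the fixed point adds at least one new ground fact. A predicate p of arity a_p (notation introduced here) has at most n^{a_p} ground facts. The extra 1 below counts the final iteration, which only confirms that nothing new appears:

```math
\text{iterations} \;\le\; 1 + \sum_{p} n^{a_p}
\qquad
\text{total time} \;=\; O\!\left(\Big(1 + \sum_{p} n^{a_p}\Big) \cdot |\text{rules}| \cdot n^{k} \cdot r\right)
```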
#### Memory Usage

| Aspect         | toy-datalog    | felix-db       |
|----------------|----------------|----------------|
| Tuple storage  | Set [ConId]    | Set [Constant] |
| Index overhead | O(k × m) extra | None           |
| Rule storage   | Separate map   | In relations   |

toy-datalog uses more memory due to indices, but gains query performance.

---

### Code Quality

#### toy-datalog

**Strengths:**

- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass

**Weaknesses:**

- No executable (commented out)
- Evaluation tests disabled
- No query interface

#### felix-db

**Strengths:**

- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress

**Weaknesses:**

- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase

---

### Use Case Recommendations

#### Use toy-datalog when:

- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch

#### Use felix-db when:

- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)

---

### Summary Table

| Category         | Winner      | Notes                             |
|------------------|-------------|-----------------------------------|
| Type Safety      | toy-datalog | Newtypes prevent errors           |
| Features         | felix-db    | Queries, negation syntax, numbers |
| Indexing         | toy-datalog | Position-based index              |
| Testing          | felix-db    | 7 test files vs 1 active          |
| Extensibility    | felix-db    | Type classes for DB/QE            |
| Code Clarity     | tie         | Both well-organized               |
| Error Messages   | toy-datalog | Full context in errors            |
| Documentation    | toy-datalog | NOTES.md, CHANGELOG               |
| Production Ready | neither     | Both need work                    |

---

### Potential Cross-Pollination

#### toy-datalog could adopt from felix-db:

1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` type for integers

#### felix-db could adopt from toy-datalog:

1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern

---

### Conclusion

Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:

- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation

Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.

## Changelog

* **Mar 4, 2026** -- The first version was created.