453 lines
14 KiB
Markdown
453 lines
14 KiB
Markdown
## Comparison: toy-datalog vs felix-db
|
||
|
||
This document compares two Haskell implementations of Datalog interpreters.
|
||
|
||
---
|
||
|
||
### Overview
|
||
|
||
| Aspect | toy-datalog | felix-db |
|
||
|-------------------|-----------------------------|---------------------------|
|
||
| **Author** | Cale Gibbard | (felix-db team) |
|
||
| **Version** | 0.1.0.0 | 0.1.0.0 |
|
||
| **Language** | Haskell (Haskell2010) | Haskell (GHC2024) |
|
||
| **License** | BSD-2-Clause | (not specified) |
|
||
| **Lines of Code** | ~300 | ~700 |
|
||
| **Primary Focus** | Minimal core implementation | More complete feature set |
|
||
|
||
---
|
||
|
||
### Architecture Comparison
|
||
|
||
#### toy-datalog
|
||
|
||
```
|
||
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
|
||
```
|
||
|
||
Simple linear pipeline with 4 modules:
|
||
|
||
- `Syntax.hs` - AST definitions
|
||
- `Parser.hs` - Megaparsec parser
|
||
- `Eval.hs` - Evaluation engine
|
||
- `IO.hs` - File utilities
|
||
|
||
#### felix-db
|
||
|
||
```
|
||
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
|
||
↓
|
||
DigestedQuery
|
||
```
|
||
|
||
More modular architecture with 8+ modules:
|
||
|
||
- `DatalogParser.hs` - Parser
|
||
- `DatalogDB.hs` - Database type class
|
||
- `InMemoryDB.hs` - Concrete implementation
|
||
- `Rules.hs` - Rule processing/digestion
|
||
- `NaiveQE.hs` - Query engine
|
||
- `DigestedQuery.hs` - Query preprocessing
|
||
- `QueryEngine.hs` - Query engine type class
|
||
- `Utility.hs` - Helper functions
|
||
|
||
**Winner: felix-db** - Better separation of concerns and extensibility via type classes.
|
||
|
||
---
|
||
|
||
### Syntax & AST
|
||
|
||
#### Term Representation
|
||
|
||
| Feature | toy-datalog | felix-db |
|
||
|-----------|---------------------------------|-------------------------------|
|
||
| Variables | `Var VarId` (uppercase) | `Var Text` (uppercase) |
|
||
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
|
||
| Numbers | Parsed as constants | `Num Integer` (separate type) |
|
||
| Newtypes | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`) |
|
||
|
||
```haskell
|
||
-- toy-datalog
|
||
data Term = Con ConId | Var VarId
|
||
|
||
-- felix-db
|
||
data Term = Var Text | Sym Text | Num Integer
|
||
```
|
||
|
||
**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.
|
||
|
||
**felix-db advantage:** Explicit `Num` type for numeric constants.
|
||
|
||
#### Rule Representation
|
||
|
||
```haskell
|
||
-- toy-datalog: Simple, direct
|
||
data Rule = Atom :- [Atom]
|
||
|
||
-- felix-db: Separate head type, supports negation in body
|
||
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
|
||
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
|
||
```
|
||
|
||
**felix-db advantage:** Supports negation syntax (`positive :: Bool`).
|
||
|
||
---
|
||
|
||
### Parser Comparison
|
||
|
||
| Feature | toy-datalog | felix-db |
|
||
|----------------|---------------|-----------------------------|
|
||
| Library | Megaparsec | Megaparsec |
|
||
| Line comments | `--` | `--` |
|
||
| Block comments | `{- -}` | `/* */` |
|
||
| Rule arrow | `:-` only | `:-`, `→`, `->` |
|
||
| Negation | Not supported | `not` or `!` prefix |
|
||
| Queries | Not supported | `?-` syntax with projection |
|
||
| Quoted strings | No | Yes (`"alice"`) |
|
||
| Numbers | As constants | As separate `Num` type |
|
||
|
||
#### Example Syntax
|
||
|
||
```datalog
|
||
-- toy-datalog
|
||
parent(xerces, brooke).
|
||
ancestor(X, Y) :- parent(X, Y).
|
||
|
||
-- felix-db (additional features)
|
||
parent("alice", "bob").
|
||
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
|
||
?- ancestor(A, B), ancestor(B, C) → A, C.
|
||
```
|
||
|
||
**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.
|
||
|
||
---
|
||
|
||
### Data Structures
|
||
|
||
#### Relation Storage
|
||
|
||
```haskell
|
||
-- toy-datalog: Indexed storage
|
||
data Relation = Relation
|
||
{ _relation_arity :: Int
|
||
, _relation_members :: Set [ConId]
|
||
, _relation_indices :: Map (Int, ConId) (Set [ConId]) -- Position-based index
|
||
}
|
||
|
||
-- felix-db: Simple storage + rules
|
||
data Relation = Relation
|
||
{ name :: RelationId
|
||
, arity :: Int
|
||
, tuples :: Set [Constant] -- No index
|
||
, rules :: [RelationRule] -- Rules stored with relation
|
||
}
|
||
```
|
||
|
||
**toy-datalog advantage:** Position-based indexing for efficient lookups.
|
||
|
||
**felix-db advantage:** Rules stored directly with relations (cleaner organization).
|
||
|
||
#### Database Structure
|
||
|
||
```haskell
|
||
-- toy-datalog
|
||
data Database = Database
|
||
{ _database_universe :: Set ConId
|
||
, _database_relations :: Map RelId Relation
|
||
, _database_rules :: Map RelId [Rule] -- Rules separate from relations
|
||
}
|
||
|
||
-- felix-db
|
||
data InMemoryDB = InMemoryDB
|
||
{ relations :: Map RelationId Relation -- Rules inside Relation
|
||
, constants :: Set Constant
|
||
}
|
||
```
|
||
|
||
Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.
|
||
|
||
---
|
||
|
||
### Evaluation Strategy
|
||
|
||
Both use **naive bottom-up evaluation** with fixed-point iteration.
|
||
|
||
#### Algorithm Comparison
|
||
|
||
| Aspect | toy-datalog | felix-db |
|
||
|------------------|--------------------------|----------------------------|
|
||
| Strategy | Naive bottom-up | Naive bottom-up |
|
||
| Termination | Fixed-point check | Fixed-point check |
|
||
| Variable binding | `Map VarId ConId` | List indexed by position |
|
||
| Join operation | `><` operator (set join) | Cartesian product + filter |
|
||
| Index usage | Yes (position-based) | No |
|
||
|
||
#### toy-datalog Evaluation
|
||
|
||
```haskell
|
||
-- Key functions
|
||
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
|
||
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
|
||
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
|
||
extendFixedpointDb :: Database -> Either EvalError Database
|
||
```
|
||
|
||
Uses indices to find candidate tuples, then intersects results for bound positions.
|
||
|
||
#### felix-db Evaluation
|
||
|
||
```haskell
|
||
-- Key functions
|
||
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
|
||
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
|
||
```
|
||
|
||
Generates all possible variable assignments from Herbrand universe, then filters.
|
||
|
||
#### Variable Binding Approach
|
||
|
||
```haskell
|
||
-- toy-datalog: Named bindings (Map)
|
||
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
|
||
-- Checks consistency: same variable must bind to same value
|
||
|
||
-- felix-db: Positional bindings (List + Index)
|
||
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
|
||
-- Variables replaced with de Bruijn-style indices
|
||
```
|
||
|
||
**toy-datalog:** More intuitive, variable names preserved throughout.
|
||
|
||
**felix-db:** More efficient (integer indexing), but loses variable names.
|
||
|
||
---
|
||
|
||
### Feature Comparison
|
||
|
||
| Feature | toy-datalog | felix-db |
|
||
|------------------------------|-------------|------------------|
|
||
| Facts | ✅ | ✅ |
|
||
| Rules | ✅ | ✅ |
|
||
| Recursive rules | ✅ | ✅ |
|
||
| Multiple rules per predicate | ✅ | ✅ |
|
||
| Arity checking | ✅ | ✅ |
|
||
| Fixed-point iteration | ✅ | ✅ |
|
||
| Position-based indexing | ✅ | ❌ |
|
||
| Interactive queries | ❌ | ✅ (`?-` syntax) |
|
||
| Query projection | ❌ | ✅ (`→ X, Y`) |
|
||
| Negation (parsing) | ❌ | ✅ |
|
||
| Negation (evaluation) | ❌ | ❌ |
|
||
| Numbers as distinct type | ❌ | ✅ |
|
||
| Quoted strings | ❌ | ✅ |
|
||
| Pretty printing | ✅ | ❌ |
|
||
| Database type class | ❌ | ✅ |
|
||
| Query engine type class | ❌ | ✅ |
|
||
| PostgreSQL integration | ❌ | 🔄 (in progress) |
|
||
|
||
---
|
||
|
||
### Error Handling
|
||
|
||
#### toy-datalog
|
||
|
||
```haskell
|
||
data EvalError =
|
||
EvalError_RuleWrongArity Rule WrongArity
|
||
| EvalError_AtomWrongArity Atom WrongArity
|
||
|
||
data WrongArity = WrongArity
|
||
{ _wrongArity_relation :: RelId
|
||
, _wrongArity_expected :: Int
|
||
, _wrongArity_got :: Int
|
||
}
|
||
```
|
||
|
||
Detailed error types with context.
|
||
|
||
#### felix-db
|
||
|
||
```haskell
|
||
data DatalogException
|
||
= DuplicateRelationException Text
|
||
| ArityMismatchException Text Int Int
|
||
| RelationNotFoundException Text
|
||
| VariableNotFoundException Text
|
||
```
|
||
|
||
More error types but less context (just error message components).
|
||
|
||
**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.
|
||
|
||
---
|
||
|
||
### Testing
|
||
|
||
| Aspect | toy-datalog | felix-db |
|
||
|------------------|-------------------|-----------------|
|
||
| Framework | Tasty + HUnit | Hspec |
|
||
| Parser tests | ✅ Golden tests | ✅ Unit tests |
|
||
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
|
||
| Property tests | ❌ | ❌ |
|
||
| Test files | 2 | 7 |
|
||
|
||
#### felix-db Test Examples
|
||
|
||
```haskell
|
||
-- Reflexive relation
|
||
"equiv(X, X) :- ."
|
||
|
||
-- Symmetric closure
|
||
"equiv(Y, X) :- equiv(X, Y)."
|
||
|
||
-- Transitive closure
|
||
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."
|
||
|
||
-- Complex genealogical queries
|
||
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
|
||
```
|
||
|
||
**Winner: felix-db** - Much more comprehensive test coverage.
|
||
|
||
---
|
||
|
||
### Performance Characteristics
|
||
|
||
#### Indexing
|
||
|
||
| Operation | toy-datalog | felix-db |
|
||
|--------------------------------|---------------------------------|----------------|
|
||
| Lookup atom with constants | O(log n) via index intersection | O(m) full scan |
|
||
| Lookup atom with all variables | O(m) full scan | O(m) full scan |
|
||
| Insert tuple | O(k log n) (update k indices) | O(log m) |
|
||
|
||
**toy-datalog advantage:** Indexed lookups for atoms with bound positions.
|
||
|
||
#### Evaluation Complexity
|
||
|
||
Both have the same theoretical complexity for naive evaluation:
|
||
|
||
- **Time per iteration:** O(n^k × |rules| × r)
|
||
- n = Herbrand universe size
|
||
- k = max variables per rule
|
||
- r = max body literals
|
||
|
||
However, toy-datalog's indexing provides practical speedup when atoms have constants.
|
||
|
||
#### Memory Usage
|
||
|
||
| Aspect | toy-datalog | felix-db |
|
||
|----------------|----------------|----------------|
|
||
| Tuple storage | Set [ConId] | Set [Constant] |
|
||
| Index overhead | O(k × m) extra | None |
|
||
| Rule storage | Separate map | In relations |
|
||
|
||
toy-datalog uses more memory due to indices, but gains query performance.
|
||
|
||
---
|
||
|
||
### Code Quality
|
||
|
||
#### toy-datalog
|
||
|
||
**Strengths:**
|
||
|
||
- Minimal, focused implementation
|
||
- Good use of newtypes for type safety
|
||
- Pretty printing support
|
||
- Clean `HasConstants` typeclass
|
||
|
||
**Weaknesses:**
|
||
|
||
- No executable (commented out)
|
||
- Evaluation tests disabled
|
||
- No query interface
|
||
|
||
#### felix-db
|
||
|
||
**Strengths:**
|
||
|
||
- Comprehensive test suite
|
||
- Type class abstraction (`DatalogDB`, `QueryEngine`)
|
||
- Query syntax with projection
|
||
- More complete feature set
|
||
- PostgreSQL integration in progress
|
||
|
||
**Weaknesses:**
|
||
|
||
- No position-based indexing
|
||
- Raw `Text` instead of newtypes
|
||
- No pretty printing
|
||
- More complex codebase
|
||
|
||
---
|
||
|
||
### Use Case Recommendations
|
||
|
||
#### Use toy-datalog when:
|
||
|
||
- Learning Datalog evaluation fundamentals
|
||
- Need indexed lookups for performance
|
||
- Want minimal, readable implementation
|
||
- Building something from scratch
|
||
|
||
#### Use felix-db when:
|
||
|
||
- Need query syntax (`?-`)
|
||
- Want comprehensive tests as reference
|
||
- Planning to extend with database backends
|
||
- Need negation syntax (even if not evaluated)
|
||
|
||
---
|
||
|
||
### Summary Table
|
||
|
||
| Category | Winner | Notes |
|
||
|------------------|-------------|-----------------------------------|
|
||
| Type Safety | toy-datalog | Newtypes prevent errors |
|
||
| Features | felix-db | Queries, negation syntax, numbers |
|
||
| Indexing | toy-datalog | Position-based index |
|
||
| Testing | felix-db | 7 test files vs 1 active |
|
||
| Extensibility | felix-db | Type classes for DB/QE |
|
||
| Code Clarity | tie | Both well-organized |
|
||
| Error Messages | toy-datalog | Full context in errors |
|
||
| Documentation | toy-datalog | NOTES.md, CHANGELOG |
|
||
| Production Ready | neither | Both need work |
|
||
|
||
---
|
||
|
||
### Potential Cross-Pollination
|
||
|
||
#### toy-datalog could adopt from felix-db:
|
||
|
||
1. Query syntax (`?-` with projection)
|
||
2. Type class abstraction for database backend
|
||
3. Comprehensive test suite
|
||
4. Negation parsing (for future implementation)
|
||
5. Separate `Num` type for integers
|
||
|
||
#### felix-db could adopt from toy-datalog:
|
||
|
||
1. Position-based indexing for performance
|
||
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
|
||
3. Pretty printing (`Pretty` typeclass)
|
||
4. Richer error types with context
|
||
5. `HasConstants` typeclass pattern
|
||
|
||
---
|
||
|
||
### Conclusion
|
||
|
||
Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
|
||
They represent different design tradeoffs:
|
||
|
||
- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
|
||
- **felix-db** prioritizes **features and extensibility** with a more complete implementation
|
||
|
||
Neither supports negation evaluation, aggregation, or semi-naive evaluation.
|
||
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.
|
||
|
||
## Changelog
|
||
|
||
* **Mar 4, 2026** -- The first version was created.
|