useful-notes/hassan/007-cales-vs-felixs.md

## Comparison: toy-datalog vs felix-db

This document compares two Haskell implementations of Datalog interpreters.

---

### Overview

| Aspect            | toy-datalog                 | felix-db                  |
|-------------------|-----------------------------|---------------------------|
| **Author**        | Cale Gibbard                | (felix-db team)           |
| **Version**       | 0.1.0.0                     | 0.1.0.0                   |
| **Language**      | Haskell (Haskell2010)       | Haskell (GHC2024)         |
| **License**       | BSD-2-Clause                | (not specified)           |
| **Lines of Code** | ~300                        | ~700                      |
| **Primary Focus** | Minimal core implementation | More complete feature set |

---

### Architecture Comparison

#### toy-datalog

```
Program Text → Parser → AST → Database → Eval (fixed-point) → Results
```

Simple linear pipeline with 4 modules:

- `Syntax.hs` - AST definitions
- `Parser.hs` - Megaparsec parser
- `Eval.hs` - Evaluation engine
- `IO.hs` - File utilities

#### felix-db

```
Program Text → Parser → AST → Rules (digestion) → Database → NaiveQE → Results
                                    ↓
                              DigestedQuery
```

More modular architecture with 8+ modules:

- `DatalogParser.hs` - Parser
- `DatalogDB.hs` - Database type class
- `InMemoryDB.hs` - Concrete implementation
- `Rules.hs` - Rule processing/digestion
- `NaiveQE.hs` - Query engine
- `DigestedQuery.hs` - Query preprocessing
- `QueryEngine.hs` - Query engine type class
- `Utility.hs` - Helper functions

**Winner: felix-db** - Better separation of concerns and extensibility via type classes.

---

### Syntax & AST

#### Term Representation

| Feature   | toy-datalog                     | felix-db                      |
|-----------|---------------------------------|-------------------------------|
| Variables | `Var VarId` (uppercase)         | `Var Text` (uppercase)        |
| Constants | `Con ConId` (lowercase/numeric) | `Sym Text` (lowercase/quoted) |
| Numbers   | Parsed as constants             | `Num Integer` (separate type) |
| Newtypes  | Yes (`ConId`, `VarId`, `RelId`) | No (raw `Text`)               |

```haskell
-- toy-datalog
data Term = Con ConId | Var VarId

-- felix-db
data Term = Var Text | Sym Text | Num Integer
```

**toy-datalog advantage:** Newtypes prevent mixing up constants/variables/relations at compile time.

**felix-db advantage:** Explicit `Num` type for numeric constants.

#### Rule Representation

```haskell
-- toy-datalog: Simple, direct
data Rule = Atom :- [Atom]

-- felix-db: Separate head type, supports negation in body
data Statement = Fact Literal | Rule Head [Literal] | Query [Text] [Literal]
data Literal = Literal { positive :: Bool, predName :: Text, arguments :: [Term] }
```

**felix-db advantage:** Supports negation syntax (`positive :: Bool`).

---

### Parser Comparison

| Feature        | toy-datalog   | felix-db                    |
|----------------|---------------|-----------------------------|
| Library        | Megaparsec    | Megaparsec                  |
| Line comments  | `--`          | `--`                        |
| Block comments | `{- -}`       | `/* */`                     |
| Rule arrow     | `:-` only     | `:-`, `→`, `->`             |
| Negation       | Not supported | `not` or `!` prefix         |
| Queries        | Not supported | `?-` syntax with projection |
| Quoted strings | No            | Yes (`"alice"`)             |
| Numbers        | As constants  | As separate `Num` type      |

#### Example Syntax

```datalog
-- toy-datalog
parent(xerces, brooke).
ancestor(X, Y) :- parent(X, Y).

-- felix-db (additional features)
parent("alice", "bob").
ancestor(X, Y) :- parent(X, Y), not sibling(X, Y).
?- ancestor(A, B), ancestor(B, C) → A, C.
```

**Winner: felix-db** - More complete syntax with queries, negation, and quoted strings.

---

### Data Structures

#### Relation Storage

```haskell
-- toy-datalog: Indexed storage
data Relation = Relation
  { _relation_arity   :: Int
  , _relation_members :: Set [ConId]
  , _relation_indices :: Map (Int, ConId) (Set [ConId])  -- Position-based index
  }

-- felix-db: Simple storage + rules
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set [Constant]  -- No index
  , rules  :: [RelationRule]  -- Rules stored with relation
  }
```

**toy-datalog advantage:** Position-based indexing for efficient lookups.

**felix-db advantage:** Rules stored directly with relations (cleaner organization).

#### Database Structure

```haskell
-- toy-datalog
data Database = Database
  { _database_universe  :: Set ConId
  , _database_relations :: Map RelId Relation
  , _database_rules     :: Map RelId [Rule]  -- Rules separate from relations
  }

-- felix-db
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation  -- Rules inside Relation
  , constants :: Set Constant
  }
```

Both track the Herbrand universe (all constants). toy-datalog stores rules separately; felix-db stores them with relations.

---

### Evaluation Strategy

Both use **naive bottom-up evaluation** with fixed-point iteration.

#### Algorithm Comparison

| Aspect           | toy-datalog              | felix-db                   |
|------------------|--------------------------|----------------------------|
| Strategy         | Naive bottom-up          | Naive bottom-up            |
| Termination      | Fixed-point check        | Fixed-point check          |
| Variable binding | `Map VarId ConId`        | List indexed by position   |
| Join operation   | `><` operator (set join) | Cartesian product + filter |
| Index usage      | Yes (position-based)     | No                         |

#### toy-datalog Evaluation

```haskell
-- Key functions
evalAtomDb :: Database -> Atom -> Either EvalError (Set (Map VarId ConId))
evalConjunction :: Database -> [Atom] -> Either EvalError (Set (Map VarId ConId))
immediateConsequences :: Database -> Rule -> Either EvalError (Set [ConId])
extendFixedpointDb :: Database -> Either EvalError Database
```

Uses indices to find candidate tuples, then intersects results for bound positions.

#### felix-db Evaluation

```haskell
-- Key functions
computeHerbrand :: (DatalogDB db) => db -> Map Text Relation
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> [[Constant]]
```

Generates all possible variable assignments from Herbrand universe, then filters.

#### Variable Binding Approach

```haskell
-- toy-datalog: Named bindings (Map)
mkBinding :: [Term] -> [ConId] -> Map VarId ConId -> Maybe (Map VarId ConId)
-- Checks consistency: same variable must bind to same value

-- felix-db: Positional bindings (List + Index)
data RuleElement = RuleElementConstant Constant | RuleElementVariable Int
-- Variables replaced with de Bruijn-style indices
```

**toy-datalog:** More intuitive, variable names preserved throughout.

**felix-db:** More efficient (integer indexing), but loses variable names.

---

### Feature Comparison

| Feature                      | toy-datalog | felix-db         |
|------------------------------|-------------|------------------|
| Facts                        | ✅           | ✅                |
| Rules                        | ✅           | ✅                |
| Recursive rules              | ✅           | ✅                |
| Multiple rules per predicate | ✅           | ✅                |
| Arity checking               | ✅           | ✅                |
| Fixed-point iteration        | ✅           | ✅                |
| Position-based indexing      | ✅           | ❌                |
| Interactive queries          | ❌           | ✅ (`?-` syntax)  |
| Query projection             | ❌           | ✅ (`→ X, Y`)     |
| Negation (parsing)           | ❌           | ✅                |
| Negation (evaluation)        | ❌           | ❌                |
| Numbers as distinct type     | ❌           | ✅                |
| Quoted strings               | ❌           | ✅                |
| Pretty printing              | ✅           | ❌                |
| Database type class          | ❌           | ✅                |
| Query engine type class      | ❌           | ✅                |
| PostgreSQL integration       | ❌           | 🔄 (in progress) |

---

### Error Handling

#### toy-datalog

```haskell
data EvalError =
    EvalError_RuleWrongArity Rule WrongArity
  | EvalError_AtomWrongArity Atom WrongArity

data WrongArity = WrongArity
  { _wrongArity_relation :: RelId
  , _wrongArity_expected :: Int
  , _wrongArity_got :: Int
  }
```

Detailed error types with context.

#### felix-db

```haskell
data DatalogException
  = DuplicateRelationException Text
  | ArityMismatchException Text Int Int
  | RelationNotFoundException Text
  | VariableNotFoundException Text
```

More error types but less context (just error message components).

**Winner: toy-datalog** - Errors include the full rule/atom that caused the problem.

---

### Testing

| Aspect           | toy-datalog       | felix-db        |
|------------------|-------------------|-----------------|
| Framework        | Tasty + HUnit     | Hspec           |
| Parser tests     | ✅ Golden tests    | ✅ Unit tests    |
| Evaluation tests | ❌ (commented out) | ✅ Comprehensive |
| Property tests   | ❌                 | ❌               |
| Test files       | 2                 | 7               |

#### felix-db Test Examples

```haskell
-- Reflexive relation
"equiv(X, X) :- ."

-- Symmetric closure
"equiv(Y, X) :- equiv(X, Y)."

-- Transitive closure
"equiv(X, Z) :- equiv(X, Y), equiv(Y, Z)."

-- Complex genealogical queries
"niece(X, Y) :- parent(Z, X), sibling(Z, Y), female(X)."
```

**Winner: felix-db** - Much more comprehensive test coverage.

---

### Performance Characteristics

#### Indexing

| Operation                      | toy-datalog                     | felix-db       |
|--------------------------------|---------------------------------|----------------|
| Lookup atom with constants     | O(log n) via index intersection | O(m) full scan |
| Lookup atom with all variables | O(m) full scan                  | O(m) full scan |
| Insert tuple                   | O(k log n) (update k indices)   | O(log m)       |

**toy-datalog advantage:** Indexed lookups for atoms with bound positions.

#### Evaluation Complexity

Both have the same theoretical complexity for naive evaluation:

- **Time per iteration:** O(n^k × |rules| × r)
    - n = Herbrand universe size
    - k = max variables per rule
    - r = max body literals

However, toy-datalog's indexing provides practical speedup when atoms have constants.

#### Memory Usage

| Aspect         | toy-datalog    | felix-db       |
|----------------|----------------|----------------|
| Tuple storage  | Set [ConId]    | Set [Constant] |
| Index overhead | O(k × m) extra | None           |
| Rule storage   | Separate map   | In relations   |

toy-datalog uses more memory due to indices, but gains query performance.

---

### Code Quality

#### toy-datalog

**Strengths:**

- Minimal, focused implementation
- Good use of newtypes for type safety
- Pretty printing support
- Clean `HasConstants` typeclass

**Weaknesses:**

- No executable (commented out)
- Evaluation tests disabled
- No query interface

#### felix-db

**Strengths:**

- Comprehensive test suite
- Type class abstraction (`DatalogDB`, `QueryEngine`)
- Query syntax with projection
- More complete feature set
- PostgreSQL integration in progress

**Weaknesses:**

- No position-based indexing
- Raw `Text` instead of newtypes
- No pretty printing
- More complex codebase

---

### Use Case Recommendations

#### Use toy-datalog when:

- Learning Datalog evaluation fundamentals
- Need indexed lookups for performance
- Want minimal, readable implementation
- Building something from scratch

#### Use felix-db when:

- Need query syntax (`?-`)
- Want comprehensive tests as reference
- Planning to extend with database backends
- Need negation syntax (even if not evaluated)

---

### Summary Table

| Category         | Winner      | Notes                             |
|------------------|-------------|-----------------------------------|
| Type Safety      | toy-datalog | Newtypes prevent errors           |
| Features         | felix-db    | Queries, negation syntax, numbers |
| Indexing         | toy-datalog | Position-based index              |
| Testing          | felix-db    | 7 test files vs 1 active          |
| Extensibility    | felix-db    | Type classes for DB/QE            |
| Code Clarity     | tie         | Both well-organized               |
| Error Messages   | toy-datalog | Full context in errors            |
| Documentation    | toy-datalog | NOTES.md, CHANGELOG               |
| Production Ready | neither     | Both need work                    |

---

### Potential Cross-Pollination

#### toy-datalog could adopt from felix-db:

1. Query syntax (`?-` with projection)
2. Type class abstraction for database backend
3. Comprehensive test suite
4. Negation parsing (for future implementation)
5. Separate `Num` type for integers

#### felix-db could adopt from toy-datalog:

1. Position-based indexing for performance
2. Newtypes (`ConId`, `VarId`, `RelId`) for type safety
3. Pretty printing (`Pretty` typeclass)
4. Richer error types with context
5. `HasConstants` typeclass pattern

---

### Conclusion

Both implementations correctly handle core positive Datalog with naive bottom-up evaluation.
They represent different design tradeoffs:

- **toy-datalog** prioritizes **type safety and indexing** with a minimal codebase
- **felix-db** prioritizes **features and extensibility** with a more complete implementation

Neither supports negation evaluation, aggregation, or semi-naive evaluation.
For a production system, combining toy-datalog's indexing with felix-db's feature set and test coverage would be ideal.

## Changelog

* **Mar 4, 2026** -- The first version was created.