## felix-db: Project Internals

This document describes the internals of the felix-db Datalog implementation: its modules, data structures, evaluation algorithm, and tests.

### Overview

felix-db is an experimental Datalog implementation in Haskell. Datalog is a declarative logic programming language (syntactically a subset of Prolog) commonly used for database queries, program analysis, and reasoning systems.

**Project Status:** The implementation is fully functional for positive Datalog (rules without negation). Negated literals are parsed but not evaluated.

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         User Input                              │
│                    (Datalog text/files)                         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DatalogParser.hs                             │
│              (Megaparsec-based parser)                          │
│         Text → Statement (Fact | Rule | Query)                  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DatalogDB.hs                               │
│            (Type class + core data structures)                  │
│         Relation, RelationRule, RuleElement                     │
└─────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────────────┐
│    InMemoryDB.hs        │     │         Rules.hs                │
│  (Concrete DB impl)     │     │    (Rule digestion logic)       │
│  relations + constants  │     │   digestHead, digestBody        │
└─────────────────────────┘     └─────────────────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       NaiveQE.hs                                │
│            (Naive Query Engine - bottom-up)                     │
│         Fixed-point computation of Herbrand model               │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    DigestedQuery.hs                             │
│              (Query preprocessing)                              │
│         Query → DigestedQuery with indexed variables            │
└─────────────────────────────────────────────────────────────────┘
```

### Module Details

#### 1. DatalogParser.hs

**Purpose:** Parses Datalog syntax into an Abstract Syntax Tree (AST).

**Parser Type:** Uses Megaparsec (`Parsec Void Text`).

##### Core Types

```haskell
data Term
  = Var Text   -- Variables: X, Person, Y1 (uppercase start)
  | Sym Text   -- Symbols: alice, "london", uk (lowercase/quoted)
  | Num Integer

data Literal = Literal
  { positive  :: Bool    -- True = positive, False = negated
  , predName  :: Text    -- Predicate name
  , arguments :: [Term]  -- Argument list
  }

data Statement
  = Fact Literal                -- p(a,b).
  | Rule Head [Literal]         -- p(X,Y) :- q(X,Z), r(Z,Y).
  | Query [Text] [Literal]      -- ?- p(X,Y).  or  ?- p(X,Y) → X,Y.
```
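
The lexical convention in the comments above (uppercase start for variables, lowercase or quoted for symbols, digits for numbers) can be illustrated with a minimal sketch. This is not the Megaparsec parser from `DatalogParser.hs`: `classify` is a hypothetical helper that works on an already-isolated token, uses `String` instead of `Text`, and assumes that quotes are stripped from quoted symbols.

```haskell
import Data.Char (isDigit, isUpper)

-- Simplified stand-in for the parser's Term (String instead of Text).
data Term = Var String | Sym String | Num Integer
  deriving (Eq, Show)

-- Classify one already-tokenized argument (hypothetical helper).
classify :: String -> Term
classify tok = case tok of
  (c : _)
    | isUpper c -> Var tok                      -- X, Person, Y1
  ('"' : rest)
    | not (null rest), last rest == '"' ->
        Sym (init rest)                         -- "london" (quotes dropped, by assumption)
  _
    | not (null tok), all isDigit tok ->
        Num (read tok)                          -- 42 (unsigned only in this sketch)
    | otherwise -> Sym tok                      -- alice, uk
```

Negative numbers and escape sequences inside quoted symbols are left out; how the real parser treats them is not covered here.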

##### Syntax Support

| Feature           | Example                             | Notes                    |
|-------------------|-------------------------------------|--------------------------|
| Facts             | `parent("alice", "bob").`           | Ground atoms             |
| Rules             | `ancestor(X,Y) :- parent(X,Y).`     | Head :- body             |
| Queries           | `?- parent(X,Y).`                   | Returns all variables    |
| Projected queries | `?- edge(A,B), edge(B,C) → A,C.`    | Explicit output vars     |
| Negation          | `not member(X,L)` or `!member(X,L)` | Parsed but not evaluated |
| Comments          | `-- line` or `/* block */`          | Haskell line, C block    |
| Arrow variants    | `:-`, `→`, `->`                     | All accepted             |

##### Entry Points

```haskell
parseDatalog     :: Text -> Either (ParseErrorBundle Text Void) Statement
parseDatalogFile :: Text -> Either (ParseErrorBundle Text Void) [Statement]
```

---

#### 2. DatalogDB.hs

**Purpose:** Defines the core data structures and the `DatalogDB` type class.

##### Core Types

```haskell
data Relation = Relation
  { name   :: RelationId       -- Relation name (Text)
  , arity  :: Int              -- Number of arguments
  , tuples :: Set [Constant]   -- Known facts (extensional)
  , rules  :: [RelationRule]   -- Derivation rules (intensional)
  }

data RelationRule = RelationRule
  { headVariables  :: [Text]           -- Variables in scope
  , bodyElements   :: [RuleBodyElement] -- Body constraints
  , outputElements :: [RuleElement]     -- How to construct head tuple
  }

data RuleBodyElement = RuleBodyElement
  { subRelationId :: RelationId    -- Referenced relation
  , ruleElements  :: [RuleElement] -- How to map variables
  }

data RuleElement
  = RuleElementConstant Constant   -- Fixed value
  | RuleElementVariable Int        -- Index into headVariables
```

##### Type Class

```haskell
class DatalogDB db where
  emptyDB        :: db
  relationNames  :: db -> [Text]
  lookupRelation :: db -> Text -> Maybe Relation
  insertRelation :: db -> Relation -> db
  addConstants   :: db -> Set Constant -> db
  allConstants   :: db -> Set Constant  -- Herbrand universe
```

---

#### 3. InMemoryDB.hs

**Purpose:** Concrete implementation of `DatalogDB` using in-memory data structures.

```haskell
data InMemoryDB = InMemoryDB
  { relations :: Map RelationId Relation
  , constants :: Set Constant  -- The Herbrand universe
  }
```

##### Builder Functions

```haskell
-- Build DB from facts only
withFacts :: [Text] -> InMemoryDB

-- Build DB from facts and rules
withFactsAndRules :: [Text] -> [Text] -> InMemoryDB

-- Build DB with explicit extra constants
withFactsRulesAndConstants :: [Text] -> [Text] -> [Constant] -> InMemoryDB
```

**Example:**

```haskell
let db = withFactsAndRules
  [ "parent(\"alice\", \"bob\")."
  , "parent(\"bob\", \"carol\")."
  ]
  [ "ancestor(X,Y) :- parent(X,Y)."
  , "ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."
  ]
```
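
To show how facts might end up in the two fields of `InMemoryDB`, here is a minimal sketch under stated assumptions. `ToyDB`, `emptyToyDB`, and `insertFact` are hypothetical names, `Relation` is reduced (no `rules` field), `String` replaces `Text`, and facts arrive already parsed. It is not the code behind `withFacts`.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Constant   = String
type RelationId = String

-- Reduced Relation: the rules field is omitted to keep the sketch short.
data Relation = Relation
  { name   :: RelationId
  , arity  :: Int
  , tuples :: Set.Set [Constant]
  }

data ToyDB = ToyDB
  { relations :: Map.Map RelationId Relation
  , constants :: Set.Set Constant   -- grows with every inserted fact
  }

emptyToyDB :: ToyDB
emptyToyDB = ToyDB Map.empty Set.empty

-- Insert one ground fact: create the relation on first use, otherwise
-- add the tuple, and record the arguments in the constant set.
insertFact :: RelationId -> [Constant] -> ToyDB -> ToyDB
insertFact rel args db = db
  { relations = Map.insertWith merge rel fresh (relations db)
  , constants = Set.union (Set.fromList args) (constants db)
  }
  where
    fresh          = Relation rel (length args) (Set.singleton args)
    merge _new old = old { tuples = Set.insert args (tuples old) }

-- The two parent facts from the example above.
exampleDB :: ToyDB
exampleDB =
  insertFact "parent" ["bob", "carol"]
    (insertFact "parent" ["alice", "bob"] emptyToyDB)
```

The sketch does not check that a new tuple matches the relation's existing arity; a real builder would likely need to.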

---

#### 4. Rules.hs

**Purpose:** Processes rules and builds the internal representation.

##### Key Functions

**`digestHead`** - Processes the rule head:

1. Extracts variables using `nub` (preserves first-occurrence order)
2. Creates `RuleElement` list mapping terms to indices/constants
3. Collects constants for the Herbrand universe

**`digestBody`** - Processes each body literal:

1. Extends variable list with new variables from body
2. Creates a `RuleBodyElement` that references the sub-relation
3. Maps terms to `RuleElement` (index or constant)

**Variable Indexing Example:**

```
Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

After digestHead:      variableNames = ["X", "Y"]
After digestBody[0]:   variableNames = ["X", "Y", "Z"]  (Z added)
After digestBody[1]:   variableNames = ["X", "Y", "Z"]  (no change)

Indices: X=0, Y=1, Z=2
```
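
The indexing above can be reproduced with a small sketch. The helper names `varsOf`, `extendVars`, and `indexTerm` are hypothetical and are not functions from `Rules.hs`; `Element` stands in for `RuleElement`, and `String` replaces `Text` so the block depends only on `base`.

```haskell
import Data.List (elemIndex, nub)

-- Simplified stand-in for the parser's Term.
data Term = Var String | Sym String | Num Integer
  deriving (Eq, Show)

-- Either a fixed constant or an index into the variable list.
data Element = ElemConstant Term | ElemVariable Int
  deriving (Eq, Show)

-- Variables of an argument list, in first-occurrence order.
varsOf :: [Term] -> [String]
varsOf ts = nub [v | Var v <- ts]

-- Append variables not seen so far; existing indices stay stable.
extendVars :: [String] -> [Term] -> [String]
extendVars known ts = nub (known ++ varsOf ts)

-- Map a term to an index (known variable) or a constant.
-- Returns Nothing for a variable that is not in the list.
indexTerm :: [String] -> Term -> Maybe Element
indexTerm known (Var v) = ElemVariable <$> elemIndex v known
indexTerm _     t       = Just (ElemConstant t)

-- ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
headArgs, body0, body1 :: [Term]
headArgs = [Var "X", Var "Y"]
body0    = [Var "X", Var "Z"]
body1    = [Var "Z", Var "Y"]

-- Variable list after the head and both body literals: X, Y, Z.
afterBody :: [String]
afterBody = foldl extendVars (varsOf headArgs) [body0, body1]
```

With this list, `mapM (indexTerm afterBody) body1` maps `ancestor(Z,Y)` to the indices 2 and 1, matching `Z=2, Y=1` in the example.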

**`addRule`** - Main entry point:

```haskell
addRule :: (DatalogDB db) => (Literal, [Literal]) -> db -> db
```

---

#### 5. NaiveQE.hs (Naive Query Engine)

**Purpose:** Implements naive bottom-up evaluation with fixed-point iteration.

```haskell
data NaiveQE db = NaiveQE
  { db       :: db
  , herbrand :: Map Text Relation  -- Computed model
  }
```

##### Algorithm: `computeHerbrand`

The naive evaluation algorithm:

```
1. Initialize Facts := all known facts from DB
2. Repeat:
   a. For each rule R in each relation:
      - Enumerate all variable assignments from Herbrand universe
      - For each assignment that satisfies all body constraints:
        - Compute the derived tuple
        - If tuple is new, add to Facts and set changed=True
   b. If no changes, terminate
3. Return final Facts (the minimal Herbrand model)
```

**Complexity:** O(n^k) candidate assignments per rule per iteration, where n = |constants| and k = number of distinct variables in the rule
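
The loop above can be sketched in a few lines. This is a simplified illustration, not the code of `computeHerbrand`: the types `Elem` and `Rule` and the functions `ground`, `step`, and `fixpoint` are hypothetical, facts are kept in one flat set rather than per relation, and new facts only become visible in the next round.

```haskell
import qualified Data.Set as Set

type Constant = String
type Fact     = (String, [Constant])   -- (relation name, tuple)

-- A rule argument: a constant or an index into the variable assignment.
data Elem = C Constant | V Int

data Rule = Rule
  { ruleHead :: (String, [Elem])
  , ruleBody :: [(String, [Elem])]
  , numVars  :: Int                    -- number of distinct variables
  }

-- Instantiate an atom under an assignment (a list indexed by variable).
ground :: [Constant] -> (String, [Elem]) -> Fact
ground env (rel, es) = (rel, map value es)
  where
    value (C c) = c
    value (V i) = env !! i

-- One round: try all n^k assignments of constants to each rule's variables
-- and keep the head instances whose body atoms are all known facts.
step :: [Constant] -> [Rule] -> Set.Set Fact -> Set.Set Fact
step universe rules facts = Set.union facts (Set.fromList derived)
  where
    derived =
      [ ground env (ruleHead r)
      | r   <- rules
      , env <- mapM (const universe) [1 .. numVars r]
      , all (\atom -> ground env atom `Set.member` facts) (ruleBody r)
      ]

-- Repeat until a round adds nothing new.
fixpoint :: [Constant] -> [Rule] -> Set.Set Fact -> Set.Set Fact
fixpoint universe rules facts
  | next == facts = facts
  | otherwise     = fixpoint universe rules next
  where
    next = step universe rules facts

-- ancestor(X,Y) :- parent(X,Y).
-- ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).   (X=0, Z=1, Y=2)
exampleRules :: [Rule]
exampleRules =
  [ Rule ("ancestor", [V 0, V 1]) [("parent", [V 0, V 1])] 2
  , Rule ("ancestor", [V 0, V 1])
         [("parent", [V 0, V 2]), ("ancestor", [V 2, V 1])] 3
  ]

exampleModel :: Set.Set Fact
exampleModel =
  fixpoint ["alice", "bob", "carol"] exampleRules
    (Set.fromList [("parent", ["alice", "bob"]), ("parent", ["bob", "carol"])])
```

Termination follows because the set of possible facts over a finite universe is finite and each round only adds facts.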

##### Query Execution

```haskell
executeQuery :: (DatalogDB db) => NaiveQE db -> DigestedQuery -> QueryResults
```

1. Enumerate all assignments for sought variables
2. For each, check if any assignment to unsought variables satisfies all conditions
3. Return matching sought-variable tuples
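
The three steps can be sketched as follows. The names `Entry`, `Query`, `holds`, and `runQuery` are hypothetical and do not come from `NaiveQE.hs`; the sketch assumes the computed model is a flat set of facts and that sought variables occupy the first indices of the assignment.

```haskell
import qualified Data.Set as Set

type Constant = String
type Fact     = (String, [Constant])

-- A query argument: a constant or an index into the full assignment.
data Entry = EConst Constant | EVar Int

-- Sought variables have indices 0 .. numSought-1 (an assumption);
-- the remaining variables are existential.
data Query = Query
  { numSought  :: Int
  , numVars    :: Int
  , conditions :: [(String, [Entry])]
  }

-- Does one condition hold in the model under the given assignment?
holds :: Set.Set Fact -> [Constant] -> (String, [Entry]) -> Bool
holds model env (rel, es) = (rel, map value es) `Set.member` model
  where
    value (EConst c) = c
    value (EVar i)   = env !! i

-- Keep each sought assignment for which some assignment of the
-- unsought variables satisfies every condition.
runQuery :: [Constant] -> Set.Set Fact -> Query -> [[Constant]]
runQuery universe model q =
  [ sought
  | sought <- assignments (numSought q)
  , any (\rest -> all (holds model (sought ++ rest)) (conditions q))
        (assignments (numVars q - numSought q))
  ]
  where
    assignments n = mapM (const universe) [1 .. n]

-- ?- edge(A,B), edge(B,C) → A,C.   with A=0, C=1 sought and B=2 unsought
exampleQuery :: Query
exampleQuery = Query 2 3
  [ ("edge", [EVar 0, EVar 2])
  , ("edge", [EVar 2, EVar 1])
  ]

exampleResult :: [[Constant]]
exampleResult =
  runQuery ["a", "b", "c"]
    (Set.fromList [("edge", ["a", "b"]), ("edge", ["b", "c"])])
    exampleQuery
```

For the two edges `a→b` and `b→c`, the only sought pair with a connecting `B` is `(a, c)`.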

---

#### 6. DigestedQuery.hs

**Purpose:** Preprocesses queries for efficient execution.

```haskell
data DigestedQuery = DigestedQuery
  { allBoundVariables   :: [Text]  -- All variables in query
  , numSoughtVariables  :: Int     -- Variables in output
  , conditions          :: [DigestedQueryCondition]
  }

data DigestedQueryCondition = DigestedQueryCondition
  { relationName :: Text
  , entries      :: [DigestedQueryEntry]
  }

data DigestedQueryEntry
  = DigestedQueryEntryConstant Constant
  | DigestedQueryEntryVariable Int  -- Index into allBoundVariables
```

**Query Forms:**

- `?- p(X,Y).` - Output all variables (X, Y)
- `?- edge(A,B), edge(B,C) → A,C.` - Output only A, C (B is existential)
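
One way to arrive at `allBoundVariables` and `numSoughtVariables` for both query forms is sketched below. `orderVariables` is a hypothetical helper, and two points are assumptions rather than facts about `DigestedQuery.hs`: that sought variables are placed first, and that an empty projection list stands for "output all variables".

```haskell
import Data.List (nub, (\\))

-- Order a query's variables so that sought ones come first.
--   projected: explicit output variables (empty = all, by assumption)
--   bodyVars:  variables as they occur in the query body, with repeats
-- Returns the ordered variable list and the number of sought variables.
orderVariables :: [String] -> [String] -> ([String], Int)
orderVariables projected bodyVars
  | null projected = (allVars, length allVars)
  | otherwise      = (sought ++ (allVars \\ sought), length sought)
  where
    allVars = nub bodyVars
    sought  = nub projected
```

For `?- edge(A,B), edge(B,C) → A,C.` this gives `(["A","C","B"], 2)`, and for `?- p(X,Y).` it gives `(["X","Y"], 2)`. Projected variables that never occur in the body are not rejected by this sketch.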

---

#### 7. QueryEngine.hs

**Purpose:** Type class for query engines.

```haskell
class QueryEngine qe where
  queryEngine :: (DatalogDB db) => db -> qe db
  query       :: (DatalogDB db) => qe db -> Text -> Text
```

---

#### 8. Utility.hs

**Purpose:** Helper functions.

```haskell
-- Generate all possible maps from domain to codomain
allMaps :: (Ord a) => [a] -> [b] -> [Map.Map a b]
```

Used by `NaiveQE` to enumerate variable assignments.
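
A function with this signature could be written as below. This is a sketch under the assumption that every domain element is mapped to some codomain element; the actual `allMaps` in `Utility.hs` may be implemented differently, so the sketch uses the distinct name `allMapsSketch`.

```haskell
import qualified Data.Map as Map

-- All total maps from `domain` to `codomain`.
-- For n distinct keys and m values this yields m^n maps.
allMapsSketch :: Ord a => [a] -> [b] -> [Map.Map a b]
allMapsSketch domain codomain =
  [ Map.fromList (zip domain values)
  | values <- mapM (const codomain) domain   -- every way to pick one value per key
  ]
```

An empty domain yields exactly one (empty) map, which is what lets a rule or query with no variables still be checked once.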

---

### Data Flow Example

**Input:**

```datalog
parent("alice", "bob").
parent("bob", "carol").
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
?- ancestor(alice, X).
```

**Step 1: Parse** → `[Fact, Fact, Rule, Rule, Query]`

**Step 2: Build DB**

```
relations = {
  "parent" -> Relation {
    tuples = {["alice","bob"], ["bob","carol"]},
    rules = []
  },
  "ancestor" -> Relation {
    tuples = {},
    rules = [Rule1, Rule2]
  }
}
constants = {"alice", "bob", "carol"}
```

**Step 3: Compute Herbrand Model**

```
Iteration 1: Apply rules
  ancestor(alice,bob)  ← parent(alice,bob)
  ancestor(bob,carol)  ← parent(bob,carol)

Iteration 2: Apply rules again
  ancestor(alice,carol) ← parent(alice,bob), ancestor(bob,carol)

Iteration 3: No new facts → terminate
```

**Step 4: Execute Query**

```
?- ancestor(alice, X)
Result: bob, carol
```

---

### Correctness Assessment

| Component       | Status  | Notes                                |
|-----------------|---------|--------------------------------------|
| Parser          | Correct | Full Datalog syntax support          |
| Data Model      | Correct | Proper Relation/Rule representation  |
| Rule Processing | Correct | Correct variable indexing with `nub` |
| Query Engine    | Correct | Fixed-point terminates correctly     |
| Tests           | Pass    | Comprehensive coverage               |

#### Limitations

1. **No stratified negation** - Negation is parsed but ignored in evaluation
2. **Naive complexity** - Exponential in number of variables per rule
3. **No aggregation** - Standard Datalog without extensions
4. **No built-in predicates** - No arithmetic comparison, etc.

---

### Test Coverage

Tests are in `test/Test/Datalog/`:

- **DatalogParserSpec.hs** - Parser correctness
- **InMemoryDBSpec.hs** - Database operations
- **RulesSpec.hs** - Rule digestion
- **NaiveQESpec.hs** - Query evaluation
- **DigestedQuerySpec.hs** - Query preprocessing

**Notable Test Cases:**

- Reflexive relations: `equiv(X, X) :- .`
- Symmetric relations: `equiv(Y, X) :- equiv(X, Y).`
- Transitive closure: `equiv(X, Z) :- equiv(X, Y), equiv(Y, Z).`
- Full equivalence relations (combining all three)
- Genealogical queries (niece, sibling relationships)
- Poset ordering with constants

---

### Build & Run

```bash
cabal build          # Build
cabal test           # Run all tests
cabal repl           # Interactive REPL

# Run specific tests
cabal test --test-option=--match="/NaiveQE/"
```

---

### Dependencies

- **megaparsec** - Parser combinators
- **containers** - Map, Set
- **text** - Text processing
- **hspec** - Testing framework

---

### Future Work

Potential enhancements:

1. Implement stratified negation
2. Add semi-naive evaluation for better performance
3. Built-in predicates (comparison, arithmetic)
4. Magic sets optimization
5. Persistent storage backend

## Changelog

- **Mar 4, 2026** - The first version was created.