useful-notes/hassan/009-logic-primer.md

285 lines
8.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Logic Primer
A reference for the core concepts of first-order logic and database theory.
---
## Building Blocks
### Constants
Specific known values — real things in your database.
```
Alice, Bob, Engineering, 42
```
### Variables
Placeholders for unknown values, usually written in uppercase or with `?`.
```
X, Y, ?person, ?dept
```
### Terms
A term is either a constant or a variable. Anything that can fill an argument slot in a predicate.
---
## Predicates and Atoms
A **predicate** is a named relation applied to a list of terms:
```
Employee(X, Y) -- predicate "Employee", arity 2
Likes(X, Y, Z) -- predicate "Likes", arity 3
Person(Alice) -- predicate "Person", arity 1
```
The number of arguments a predicate takes is its **arity**.
An **atom** is a predicate applied to specific terms. When all terms are constants it is called a **ground atom** — a concrete fact in the database:
```
Employee(Alice, Engineering) -- ground atom
```
---
## Rules
A **rule** is an if-then statement built from atoms, split into two parts:
```
body → head
```
| Part | Also Called | Meaning |
|:-----|:------------|:--------|
| Body | Antecedent, LHS | Conditions that must hold |
| Head | Consequent, RHS | What must be true if body holds |
Example:
```
Employee(X, Y), Department(Y) → ∃Z. ManagedBy(Y, Z)
```
Variables in the head that do not appear in the body are **existential variables** — they represent unknown values that must exist, but whose identity is not known.
---
## Quantifiers
| Symbol | Name | Meaning | Example |
|:-------|:-----|:--------|:--------|
| `∀` | Universal | "For all..." | `∀X. Person(X) → Mortal(X)` |
| `∃` | Existential | "There exists some..." | `∀X. Person(X) → ∃Y. MotherOf(X, Y)` |
---
## Connectives
| Symbol | Name | Meaning |
|:-------|:-----|:--------|
| `∧` | Conjunction | AND |
| `` | Disjunction | OR |
| `¬` | Negation | NOT |
| `→` | Implication | IF...THEN |
| `↔` | Biconditional | IF AND ONLY IF |
---
## Key Concepts
### Substitution
A mapping from variables to terms. Written as `σ`:
```
σ = { X → Alice, Y → Engineering }
Employee(X, Y) under σ = Employee(Alice, Engineering)
```
### Homomorphism
A substitution that maps one set of atoms into another, preserving structure. Used in the chase to check whether a rule head is already satisfied in the database.
### Ground Instance
An atom or rule with all variables replaced by constants — no variables remaining.
### Herbrand Universe
The set of all ground terms constructable from the constants and function symbols in a formula. Represents the "world" of all possible values.
### Skolem Term / Skolem Function
A named placeholder for an existentially quantified variable, constructed deterministically from the values that triggered it:
```
∀X. Person(X) → ∃Y. MotherOf(X, Y)
Skolemized: MotherOf(X, mother_of(X))
```
The Skolem term `mother_of(X)` means "the mother of X, whoever that is." Same input always produces the same term.
---
## Types of Dependencies (Rules)
| Type | Abbreviation | Form | Meaning |
|:-----|:-------------|:-----|:--------|
| Tuple-generating dependency | TGD | `body → ∃z. head` | If body holds, some new tuple must exist |
| Equality-generating dependency | EGD | `body → x = y` | If body holds, two values must be equal |
| Functional dependency | FD | Special EGD | A set of attributes uniquely determines another |
| Full dependency | — | TGD with no existentials | Head variables all appear in body |
---
## Rule Classes and Termination
| Rule Class | Existentials | Termination Guarantee | Notes |
|:-----------|:-------------|:----------------------|:------|
| Datalog | No | Always | Core of logic programming |
| Weakly acyclic | Yes | Yes | No cyclic value propagation through existentials |
| Guarded | Yes | No (but decidable) | Existential vars "guarded" by a body atom |
| Frontier-one | Yes | No (but decidable) | At most one frontier variable per rule |
| General TGDs | Yes | Undecidable | No restrictions |
---
## Datalog
Datalog is a logic programming language and the most important rule class for the chase. It is a restriction of first-order logic with no function symbols and no existential variables in rule heads.
### Syntax
A Datalog program consists of:
- **Facts** — ground atoms representing known data
- **Rules** — if-then statements deriving new facts from existing ones
```
% Facts
Employee(alice, engineering).
Employee(bob, marketing).
Department(engineering).
% Rules
WorksIn(X, D) :- Employee(X, D), Department(D).
Colleague(X, Y) :- WorksIn(X, D), WorksIn(Y, D), X ≠ Y.
```
### Key Properties
| Property | Value |
|:---------|:------|
| Existentials in head | No |
| Function symbols | No |
| Negation | Stratified only |
| Termination | Always |
| Data complexity | PTIME |
| Evaluation strategy | Bottom-up (forward chaining) or top-down (backward chaining) |
### Evaluation
Datalog is evaluated by computing the **least fixed point** — repeatedly applying all rules until no new facts are derived. This is exactly the chase with no existentials:
```
Iteration 0: { Employee(alice, engineering), Employee(bob, marketing), Department(engineering) }
Iteration 1: + WorksIn(alice, engineering)
Iteration 2: + Colleague(alice, bob), Colleague(bob, alice)
Iteration 3: nothing new → stop
```
### Datalog vs SQL
Datalog can express **recursive queries** naturally, which SQL cannot without special extensions (like recursive CTEs). For example, computing transitive closure (all ancestors of a person) is trivial in Datalog:
```
Ancestor(X, Y) :- Parent(X, Y).
Ancestor(X, Y) :- Parent(X, Z), Ancestor(Z, Y).
```
### Datalog and the Chase
Pure Datalog evaluation is a restricted chase with no existential variables — all chase variants behave identically. The fixed point is always reached in at most O(n^k) steps.
---
## Geometric Logic
Geometric logic is a fragment of first-order logic that is particularly well-suited to the chase. It is the theoretical foundation for most modern chase-based reasoning engines.
### Definition
A formula is **geometric** if it uses only:
| Allowed | Not Allowed |
|:--------|:-----------|
| Conjunction `∧` | Universal quantifier `∀` in head |
| Disjunction `` | Negation `¬` |
| Existential quantifier `∃` | Implication `→` in head |
| Equality `=` | Infinite conjunctions in head |
A **geometric sequent** (rule) has the form:
```
φ ⊢ ψ
where φ and ψ are geometric formulas
```
Which is shorthand for: `∀x. φ(x) → ψ(x)`
### Examples
```
-- Every employee works in some department
Employee(X) ⊢ ∃Y. WorksIn(X, Y)
-- Every department has a manager or is a sub-department
Department(X) ⊢ (∃Y. ManagedBy(X, Y)) (∃Z. SubDeptOf(X, Z))
-- If two things are equal and one is an employee, so is the other
Employee(X) ∧ X = Y ⊢ Employee(Y)
```
### Why Geometric Logic Matters
Geometric logic has a special property: if a geometric formula is true in a model, it remains true in any **extension** of that model (adding more facts never makes it false). This is called **monotonicity** and is exactly what makes the chase work — the chase only adds facts, never removes them.
This property means:
- The chase directly computes models of geometric theories
- Every geometric theory has a **canonical model** — the one the chase builds
- Query answers over the chase result are guaranteed correct for any model of the theory
### Geometric Logic vs Other Fragments
```
First-Order Logic
└── Geometric Logic (∧, , ∃ only — monotone)
└── Coherent Logic (same but finitary — finite disjunctions only)
└── Datalog (no existentials in head, no disjunction)
└── Horn Clauses (at most one atom in head)
```
### Connection to the Chase
Geometric sequents map directly onto TGDs (tuple-generating dependencies):
```
Geometric sequent: Employee(X) ⊢ ∃Y. WorksIn(X, Y)
TGD: Employee(X) → ∃Y. WorksIn(X, Y)
```
The chase is the standard procedure for building models of geometric theories. Given a set of geometric axioms and an initial database, the chase constructs the minimal model satisfying all axioms — which is exactly the universal solution used for query answering.
---
## Logic Hierarchy
```
Propositional Logic -- no variables, just true/false statements
↓ (add variables and quantifiers)
First-Order Logic (FOL) -- variables, predicates, ∀, ∃
↓ (restrict to certain rule forms)
Geometric Logic -- only ∧, , ∃ — no ∀ in head, no ¬
↓ (remove existentials from head)
Datalog -- function-free, no existentials in head
↓ (remove disjunction)
Horn Clauses -- at most one atom in head
```
---
## Changelog
* **Mar 6, 2026** -- First version created.