From 6922c3ef3b37fc966da566ff23cf632b1bdfa974 Mon Sep 17 00:00:00 2001
From: Hassan Abedi
Date: Tue, 3 Mar 2026 11:10:44 +0000
Subject: [PATCH] Add comparison of chase and naive fixpoint algorithms
---
 hassan/003-chase-vs-naive-fixpoint.md | 240 ++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)
 create mode 100644 hassan/003-chase-vs-naive-fixpoint.md

diff --git a/hassan/003-chase-vs-naive-fixpoint.md b/hassan/003-chase-vs-naive-fixpoint.md
new file mode 100644
index 0000000..c366971
--- /dev/null
+++ b/hassan/003-chase-vs-naive-fixpoint.md
@@ -0,0 +1,240 @@

## The Chase Algorithm vs Naive Fixpoint Evaluation

### Introduction

When working with logical rules over data, a fundamental question arises: given some initial facts and a set of rules, what are *all* the facts that
logically follow? Two approaches to this problem — **naive fixpoint evaluation** and **the chase algorithm** — share a common skeleton but differ
significantly in what they can handle.

This article explains both, shows where they coincide, and clarifies where the chase goes beyond what naive fixpoint can do.

---

### The Common Skeleton

Both algorithms follow the same loop:

1. Start with some initial facts.
2. Check which rules are not yet satisfied.
3. Add whatever is needed to satisfy them.
4. Repeat until nothing changes (a *fixpoint*).

Both are **monotonic** — they only add information, never retract it. Both terminate when no rule is left unsatisfied. In the simplest cases, they
produce identical results.

---

### Naive Fixpoint Evaluation

Naive fixpoint evaluation is the standard engine behind Datalog and simple forward-chaining inference.
It works like this: + +``` +facts = initial_facts +loop: + new_facts = {} + for each rule "if P then Q": + for each way P matches against facts: + if Q is not already in facts: + add Q to new_facts + if new_facts is empty: + stop — we've reached the fixpoint + else: + facts = facts ∪ new_facts + continue loop +``` + +#### Example: Transitive Closure + +Given edges A→B, B→C, C→D and the rules: + +- Every edge(x,y) implies path(x,y) +- path(x,y) and path(y,z) implies path(x,z) + +**Round 1:** Derive path(A,B), path(B,C), path(C,D) from edges. +**Round 2:** Derive path(A,C) from path(A,B) + path(B,C), and path(B,D) from path(B,C) + path(C,D). +**Round 3:** Derive path(A,D) from path(A,C) + path(C,D). +**Round 4:** No new facts. Stop. + +Result: 6 path tuples from 3 edges. Simple and effective. + +#### What naive fixpoint can handle + +Naive fixpoint works with **Horn clauses** — rules of the form "if (conjunction of facts) then (a single fact)." The domain of elements is fixed from +the start. No new elements are ever created. No branching, no choice. One run, one result. + +#### Limitations + +Consider these three situations where naive fixpoint gets stuck: + +**Situation 1 — Existentials:** A rule says "every node has a parent." Node `a` has no parent. What should naive fixpoint do? There is no specific +element it can derive as the parent. It simply cannot proceed. + +**Situation 2 — Disjunction:** A rule says "every paper is either accepted or rejected." Paper `p` is neither. Naive fixpoint has no mechanism for +choosing — it can't branch into "maybe accepted, maybe rejected." + +**Situation 3 — Equations:** Two rules derive that element `x` and element `y` must be the same thing. Naive fixpoint has no concept of merging +elements. + +--- + +### The Chase Algorithm + +The chase generalizes naive fixpoint to handle all three situations above. It operates on **geometric sequents** — rules of the form: + +``` +∀x. P(x) ⊢ ∃y. Q₁(x,y) ∨ Q₂(x,y) ∨ ... 
```

Read: "for all x satisfying P, there exist witnesses y satisfying Q₁ or Q₂ or ..."

The chase loop:

```
models = { initial_facts }
loop:
    for each model in models:
        for each rule "∀x. P ⊢ ∃y. Q₁ ∨ ... ∨ Qₙ":
            for each way P matches in model:
                if no Qᵢ is satisfied:
                    if n = 0:
                        discard model (contradiction)
                    if n = 1:
                        add fresh witnesses and facts for Q₁
                    if n > 1:
                        branch: create n copies of model,
                                each satisfying one Qᵢ
    if nothing changed in any model:
        stop — all models are at fixpoint
```

#### Feature 1: Existential Witnesses

When a rule says "there exists a parent," the chase **invents a fresh element** — a Skolem constant — to act as the parent. The domain grows.

```
Rule: ∀x:Person. ⊢ ∃y:Person. parent(x, y)
Facts: { alice:Person }
Chase: invent #1:Person, add parent(alice, #1)
```

Naive fixpoint cannot do this. It only derives facts about elements that already exist.

#### Feature 2: Disjunctive Branching

When a rule has "or" in the conclusion, the chase splits the computation.

```
Rule: ∀p:Paper. submitted(p) ⊢ accepted(p) ∨ rejected(p)
Facts: { paper1:Paper, submitted(paper1) }
Chase: branch into:
    Model A: accepted(paper1)
    Model B: rejected(paper1)
```

The chase then continues independently in each branch. The result is a *tree* of models — a set of *jointly universal* models: every model of the
theory admits a homomorphism from at least one of them.

Naive fixpoint produces a single model. It has no branching mechanism.

#### Feature 3: Equality and Merging

When axioms include equations, the chase can discover that two elements must be identical and **merge** them — shrinking the domain.

```
Theory: Category with identity morphisms and composition
Rule: id_A ∘ f = f (left unit law)
Chase: derives id_A ∘ id_A, recognizes it equals id_A, merges them
```

This is called **equality saturation** and prevents infinite growth in theories with equational axioms.
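One way an implementation might track such merges is a union-find structure over element names. The sketch below is illustrative Python, not code from any particular chase engine; the class name and the way elements are named are assumptions.

```python
# Illustrative union-find over element names, sketching how a chase
# engine might track equality classes when elements are merged.
# Names here are hypothetical, not taken from any real implementation.
class EqualityClasses:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Walk parent links to the class representative, compressing the path.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def merge(self, x, y):
        # Record that x and y denote the same element.
        self.parent[self.find(x)] = self.find(y)

eq = EqualityClasses()
eq.merge("id_A ∘ id_A", "id_A")  # the unit law forces these to coincide
print(eq.find("id_A ∘ id_A") == eq.find("id_A"))  # True
```

After a merge, every later lookup of either name resolves to the same representative, so facts about the two names collapse into facts about a single element.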
Without it, composing identity morphisms would +generate an unbounded chain: id, id∘id, id∘id∘id, ... + +Naive fixpoint never merges elements. It can only grow. + +--- + +### Side-by-Side Comparison + +| Aspect | Naive Fixpoint | Chase | +|------------------|------------------------------|-----------------------------------------------------------------------------------------------------| +| **Domain** | Fixed from the start | Grows (fresh witnesses) and shrinks (merging) | +| **Rules** | Horn clauses: if P then Q | Geometric sequents: if P then ∃y. Q₁ ∨ Q₂ ∨ ... | +| **Disjunction** | Not supported | Branching into multiple models | +| **Existentials** | Not supported | Fresh element invention | +| **Equations** | Not supported | Equality saturation / element merging | +| **Output** | Single minimal model | Set of jointly universal models (a tree) | +| **Monotonicity** | Yes — facts only accumulate | Yes for facts, but domain can also shrink via merging | +| **Termination** | Always, on finite domains | Not guaranteed for satisfiable theories with infinite models; guaranteed for unsatisfiable theories | +| **Typical use** | Datalog, deductive databases | Geometric logic, formal specification, knowledge bases | + +--- + +### When They Coincide + +If your rules have **no existential quantifiers**, **no disjunctions**, and **no equations** in their conclusions — just "if these facts, then this +fact" — then the chase reduces exactly to naive fixpoint evaluation. The domain stays fixed, there's no branching, no merging. The two algorithms +produce identical results. + +This covers a wide range of practical cases: transitive closure, Datalog queries, simple inference rules, computing reflexive or symmetric closures of +relations. + +--- + +### A Worked Example Showing the Difference + +Consider a theory of supervisors: + +``` +Rule 1: ∀e:Employee. ⊢ ∃s:Employee. supervises(s, e) +Rule 2: ∀x,y,z. 
supervises(x,y) ∧ supervises(y,z) ⊢ senior_to(x,z)
```

Rule 1 says everyone has a supervisor. Rule 2 says supervision is transitive in terms of seniority.

Starting with: `{ alice:Employee }`

**Naive fixpoint:** Sees rule 1 is violated (alice has no supervisor) but cannot act. It does not know how to invent a supervisor. It stops
immediately, with an incomplete model.

**Chase, round 1:** Invents `#1:Employee` and adds `supervises(#1, alice)`. Now rule 1 is violated for `#1` — they need a supervisor too.

**Chase, round 2:** Invents `#2:Employee` and adds `supervises(#2, #1)`. Also derives `senior_to(#2, alice)` from rule 2. But now `#2` needs a
supervisor...

**Chase, round 3, 4, ...:** This continues forever, creating #3, #4, #5, ... The theory has no finite model (every person needs a supervisor, who
needs a supervisor, ...), so the chase does not terminate. A practical implementation would impose a bound and stop after a configurable number of
steps.

This example illustrates both the power and the risk of the chase: it can explore model spaces that naive fixpoint cannot even enter, but it may not
come back.

---

### Performance Considerations

**Naive fixpoint** is simple but can be wasteful — each round re-evaluates all rules against all facts, including combinations already tried.
**Semi-naive evaluation** (standard in Datalog engines) optimizes this by only considering facts that are genuinely new in each round, dramatically
reducing redundant work.

**The chase** faces the same redundancy problem plus additional challenges: managing branching (which can cause exponential blowup), efficiently
detecting which rules are violated (often done via relational algebra — joins for matching, set difference for finding violations), and handling
equality classes when merging elements.
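To make the semi-naive idea concrete, here is a minimal sketch for the transitive-closure example from earlier. It is illustrative Python, not code from any Datalog engine; facts are plain tuples in a set, and each round joins only the newly derived delta against the full relation.

```python
# A minimal semi-naive sketch for transitive closure. Instead of
# re-matching the whole relation against itself each round, we only
# join the facts that were new in the previous round (the "delta").
def transitive_closure(edges):
    path = set(edges)   # Round 1: every edge is a path
    delta = set(edges)  # Facts new this round
    while delta:
        new = set()
        for (x, y) in delta:
            for (a, b) in path:
                if y == a:          # delta fact feeds the left premise
                    new.add((x, b))
                if b == x:          # delta fact feeds the right premise
                    new.add((a, y))
        delta = new - path          # keep only genuinely new facts
        path |= delta
    return path

edges = {("A", "B"), ("B", "C"), ("C", "D")}
print(len(transitive_closure(edges)))  # 6 path tuples, as in the example above
```

The `delta = new - path` step is the whole optimization: combinations whose premises are all old facts were already tried in an earlier round, so they are never re-derived.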
+ +Real implementations like Rusty Razor use relational algebra internally, treating the model as a monotonically growing database and evaluating +geometric sequents as database queries. Geolog similarly uses tensor algebra for axiom checking. + +--- + +### Summary + +Naive fixpoint evaluation and the chase share the same iterate-until-stable structure. For simple rules (Horn clauses over a fixed domain), they +behave identically. The chase extends this to the full expressiveness of geometric logic by adding three capabilities: inventing fresh elements for +existential quantifiers, branching for disjunctions, and merging elements for equations. These additions make the chase dramatically more powerful — +it can explore model spaces that naive fixpoint cannot reach — but also introduce new challenges around termination and computational cost. + +The relationship is analogous to how regular expressions relate to context-free grammars, or how propositional logic relates to first-order logic: the +simpler tool handles the common cases efficiently, while the more general tool handles the hard cases at the cost of greater complexity. + +## Changelog + +* **Mar 3, 2026** -- The first version was created.