From cf4c522ff35a02a87aea2574ec6d909b940f66b4 Mon Sep 17 00:00:00 2001
From: Hassan Abedi <cogitator.tech@gmail.com>
Date: Thu, 21 May 2026 12:18:06 +0200
Subject: [PATCH] Add a summary note file for the notes taken so far

---
 flowlog/006-flowlog-synthesis.md | 426 +++++++++++++++++++++++++++++++
 1 file changed, 426 insertions(+)
 create mode 100644 flowlog/006-flowlog-synthesis.md

diff --git a/flowlog/006-flowlog-synthesis.md b/flowlog/006-flowlog-synthesis.md
new file mode 100644
index 0000000..42f4b28
--- /dev/null
+++ b/flowlog/006-flowlog-synthesis.md
@@ -0,0 +1,426 @@
+# FlowLog Synthesis
+
+A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.
+
+---
+
+## Short Answer
+
+The five FlowLog notes make one argument:
+
+```text
+FlowLog is most useful here as a model for the Datalog planning layer that should sit before an incremental backend such as DBSP.
+```
+
+The core architecture is:
+
+```text
+Datalog or Geolog-shaped rules
+-> dependency analysis and strata
+-> rule catalog
+-> join graph and relational plan
+-> FlowLog-style optimization
+-> DBSP or Differential Dataflow backend
+-> maintained outputs
+```
+
+FlowLog is not only an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separated.
+
+```mermaid
+flowchart LR
+    Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
+    Strata --> Catalog["Rule Catalog"]
+    Catalog --> Plan["Relational Plan"]
+    Plan --> Optimize["FlowLog-Style Optimization"]
+    Optimize --> IR["Backend-Neutral IR"]
+    IR --> DBSP["DBSP Backend"]
+    IR --> DD["Differential Dataflow Backend"]
+    DBSP --> Outputs["Maintained Outputs"]
+    DD --> Outputs
+```
+
+---
+
+## How the Notes Fit Together
+
+The first note, `001-flowlog-primer.md`, explains the core idea: FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend,
+while keeping Datalog-specific planning visible before lowering to dataflow operators.
+
+The second note, `002-flowlog-implementation.md`, explains the artifact structure. The useful implementation shape is:
+
+```text
+parsing
+-> strata
+-> catalog
+-> planning
+-> optimizing
+-> executing
+```
+
+The third note, `003-flowlog-and-dbsp-synergy.md`, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing
+inputs. FlowLog helps answer what relational plan should be maintained.
+
+The fourth note, `004-flowlog-technical-planning-notes.md`, zooms into the planning details: rule catalogs, collection shapes, transformation flows,
+join graphs, antijoin timing, sideways information passing (SIP), recursive strata, subplan sharing, and physical key choice.
+
+The fifth note, `005-using-flowlog-ideas.md`, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype,
+backend comparison, test corpus, data model decisions, and evaluation plan.
+
+Together, the notes move from:
+
+```text
+what FlowLog is
+-> how it is built
+-> why it matters for DBSP
+-> which technical pieces transfer
+-> how to use those pieces
+```
+
+---
+
+## Unified Mental Model
+
+The shared mental model is that Datalog execution has three separate layers.
+
+The source layer owns user-facing or system-facing rules:
+
+```text
+Datalog programs
+Geolog laws
+CRDT definitions
+```
+
+The planning layer owns the logical and physical shape of evaluation:
+
+```text
+strata
+rule catalogs
+join graphs
+antijoin placement
+SIP filters
+physical keys
+shared subplans
+```
+
+The backend layer owns maintained computation:
+
+```text
+DBSP circuits
+Differential Dataflow dataflows
+batch evaluators
+```
+
+The main design rule is:
+
+```text
+Backend execution should not rediscover rule semantics.
+```
+
+The backend should receive a checked, stratified, and optimized relational plan.
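+
+The plan handed to a backend can be pictured as a small tree of relational operators. Below is a minimal Python sketch under stated assumptions: the
+node names (`Scan`, `Join`, `Antijoin`, `Project`) and the `evaluate` snapshot interpreter are hypothetical, not FlowLog's or DBSP's API; relations
+are sets of tuples, and columns are referenced by position.
+
+```python
+from dataclasses import dataclass
+from typing import Union
+
+Row = tuple
+
+
+@dataclass(frozen=True)
+class Scan:
+    relation: str
+
+
+@dataclass(frozen=True)
+class Join:
+    left: "Plan"
+    right: "Plan"
+    left_key: tuple[int, ...]   # key column positions in the left input
+    right_key: tuple[int, ...]  # matching key column positions in the right input
+
+
+@dataclass(frozen=True)
+class Antijoin:
+    left: "Plan"
+    right: "Plan"
+    left_key: tuple[int, ...]
+    right_key: tuple[int, ...]
+
+
+@dataclass(frozen=True)
+class Project:
+    input: "Plan"
+    columns: tuple[int, ...]
+
+
+Plan = Union[Scan, Join, Antijoin, Project]
+
+
+def key_of(row: Row, cols: tuple[int, ...]) -> tuple:
+    return tuple(row[c] for c in cols)
+
+
+def evaluate(plan: Plan, db: dict[str, set[Row]]) -> set[Row]:
+    """Snapshot interpreter with set semantics (outputs are always distinct)."""
+    if isinstance(plan, Scan):
+        return set(db.get(plan.relation, set()))
+    if isinstance(plan, Project):
+        return {key_of(r, plan.columns) for r in evaluate(plan.input, db)}
+    left, right = evaluate(plan.left, db), evaluate(plan.right, db)
+    if isinstance(plan, Join):
+        index: dict[tuple, list[Row]] = {}
+        for rrow in right:
+            index.setdefault(key_of(rrow, plan.right_key), []).append(rrow)
+        return {lrow + rrow for lrow in left for rrow in index.get(key_of(lrow, plan.left_key), [])}
+    if isinstance(plan, Antijoin):
+        blocked = {key_of(rrow, plan.right_key) for rrow in right}
+        return {lrow for lrow in left if key_of(lrow, plan.left_key) not in blocked}
+    raise TypeError(f"unknown plan node: {plan!r}")
+
+
+# allowed(x, z) :- edge(x, y), edge(y, z), not blocked(x, z).
+db = {"edge": {(1, 2), (2, 3), (3, 4)}, "blocked": {(1, 3)}}
+two_hop = Project(Join(Scan("edge"), Scan("edge"), (1,), (0,)), (0, 3))
+allowed = Antijoin(two_hop, Scan("blocked"), (0, 1), (0, 1))
+print(sorted(evaluate(allowed, db)))  # [(2, 4)]
+```
+
+The same tree could in principle be lowered to a DBSP circuit instead, which is why a plain interpreter like this one can serve as the snapshot
+evaluator that maintained outputs are checked against.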
+
+---
+
+## FlowLog's Transferable Pieces
+
+The most transferable pieces are not tied to Differential Dataflow.
+
+**Rule Catalog**: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.
+
+**Stratification**: A dependency order for non-recursive and recursive rule groups, with negation restrictions kept explicit.
+
+**Join Graph**: A graph or hypergraph of atoms connected by shared variables.
+
+**Structural Planning**: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.
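+
+One way to read structural planning is a greedy order driven only by shared variables, with no cardinality estimates. This hypothetical Python sketch
+illustrates that idea and is not FlowLog's actual algorithm: atoms are `(relation, variables)` pairs, and the scoring rule (most bound variables
+first, then fewest new variables) is an assumption.
+
+```python
+Atom = tuple[str, tuple[str, ...]]  # (relation name, variable names); hypothetical encoding
+
+
+def greedy_join_order(atoms: list[Atom]) -> list[Atom]:
+    """Order a rule body using only variable structure (no statistics)."""
+    def connectivity(i: int) -> int:
+        # How many variables atom i shares with the rest of the body.
+        mine = set(atoms[i][1])
+        return sum(len(mine & set(a[1])) for j, a in enumerate(atoms) if j != i)
+
+    # Start from the most connected atom; prefer narrower atoms on ties.
+    start = max(range(len(atoms)), key=lambda i: (connectivity(i), -len(set(atoms[i][1]))))
+    order, bound = [atoms[start]], set(atoms[start][1])
+    remaining = [a for j, a in enumerate(atoms) if j != start]
+
+    while remaining:
+        # Prefer atoms joining on the most bound variables, then atoms adding the
+        # fewest new variables (narrower intermediates). A zero-overlap pick means
+        # the body is disconnected and this step is a cross product.
+        nxt = max(remaining, key=lambda a: (len(set(a[1]) & bound), -len(set(a[1]) - bound)))
+        order.append(nxt)
+        bound |= set(nxt[1])
+        remaining.remove(nxt)
+    return order
+
+
+# Positive body of: violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z).
+body = [("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))]
+print([name for name, _ in greedy_join_order(body)])  # ['B', 'C', 'A']
+```
+
+In this example `C(z)` is placed right after `B(y, z)`, so it filters on `z` before `A` introduces `x`.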
+
+**Sideways Information Passing**: Semijoin-style filtering that uses known bindings to reduce later joins.
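+
+A minimal Python sketch of the semijoin idea behind SIP, assuming relations are sets of tuples and keys are column positions (all names are
+hypothetical): the join keys already bound on the left are passed sideways to prune the right input before the join.
+
+```python
+def key_of(row: tuple, cols: tuple[int, ...]) -> tuple:
+    return tuple(row[c] for c in cols)
+
+
+def semijoin(rows: set[tuple], cols: tuple[int, ...], keys: set[tuple]) -> set[tuple]:
+    """Keep only rows whose key already appears among the passed-in bindings."""
+    return {r for r in rows if key_of(r, cols) in keys}
+
+
+def join_with_sip(left, right, left_key, right_key):
+    """Join left with right after pruning right by the keys that left binds.
+    Returns the join result and the pruned right input."""
+    bindings = {key_of(lrow, left_key) for lrow in left}
+    pruned = semijoin(right, right_key, bindings)
+    index: dict[tuple, list[tuple]] = {}
+    for rrow in pruned:
+        index.setdefault(key_of(rrow, right_key), []).append(rrow)
+    joined = {lrow + rrow for lrow in left for rrow in index.get(key_of(lrow, left_key), [])}
+    return joined, pruned
+
+
+# A small frontier passes its node ids sideways into a larger edge relation.
+frontier = {("root", "a")}
+edge = {("a", "b"), ("c", "d"), ("e", "f")}
+joined, pruned = join_with_sip(frontier, edge, (1,), (0,))
+print(joined, pruned)  # {('root', 'a', 'a', 'b')} {('a', 'b')}
+```
+
+In a one-shot hash join this prefilter changes little; the expected payoff is when the pruned relation feeds further joins or a recursive stratum,
+so fewer intermediate rows are produced and kept.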
+
+**Antijoin Scheduling**: Placement of negated atoms as soon as their variables are bound.
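+
+The placement rule can be sketched in a few lines of Python. This hypothetical illustration assumes the positive atoms are already ordered (for
+example, by a structural planner) and that atoms are `(relation, variables)` pairs.
+
+```python
+def schedule_antijoins(positive_order, negated):
+    """Interleave negated atoms into an ordered positive body. Each negated atom
+    is placed right after the shortest prefix that binds all of its variables."""
+    steps, bound, pending = [], set(), list(negated)
+    for atom in positive_order:
+        steps.append(("join", atom))
+        bound |= set(atom[1])
+        for neg in [n for n in pending if set(n[1]) <= bound]:
+            steps.append(("antijoin", neg))
+            pending.remove(neg)
+    if pending:
+        # Safe Datalog requires every variable of a negated atom to be bound positively.
+        raise ValueError(f"unsafe negation, unbound variables in {pending}")
+    return steps
+
+
+positive = [("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))]
+negated = [("D", ("x", "z"))]
+print([(kind, atom[0]) for kind, atom in schedule_antijoins(positive, negated)])
+# [('join', 'A'), ('join', 'B'), ('antijoin', 'D'), ('join', 'C')]
+```
+
+Here `not D(x, z)` runs before `C(z)`, as soon as `x` and `z` are both bound, instead of at the end of the rule as written.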
+
+**Physical Key Choice**: Deliberate selection of keys and payload fields for maintained joins and arrangements.
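+
+A hypothetical Python sketch of key and payload selection for a left-deep plan: the key of each join step is the set of variables shared with the
+current intermediate, and the payload keeps only carried variables that are still needed later. The function name and the `needed` parameter
+(head variables plus the variables of any still-pending negated atoms) are assumptions for illustration.
+
+```python
+def physical_layouts(order, needed):
+    """For each join step in a left-deep order, return (relation, key, payload)."""
+    layouts = []
+    carried = list(order[0][1])  # variables carried by the current intermediate
+    for i, (name, vars_) in enumerate(order[1:], start=1):
+        key = [v for v in carried if v in vars_]
+        # Variables still needed by later atoms or by the output.
+        later = set(needed).union(*(set(vs) for _, vs in order[i + 1:]))
+        payload = [v for v in carried if v not in key and v in later]
+        layouts.append((name, tuple(key), tuple(payload)))
+        # The next intermediate carries key, payload, and this atom's new variables.
+        carried = key + payload + [v for v in vars_ if v not in key]
+    return layouts
+
+
+order = [("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))]
+for step in physical_layouts(order, needed=("x", "z")):
+    print(step)
+# ('B', ('y',), ('x',))
+# ('C', ('z',), ('x',))
+```
+
+After the join with `B`, `y` is no longer needed, so it is not part of the intermediate maintained for the `C` join.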
+
+**Subplan Sharing**: Reuse of common antecedents or intermediate relations across rules.
+
+```mermaid
+flowchart TB
+    Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
+    Catalog --> Negation["Negation and Filters"]
+    JoinGraph --> Structural["Structural Planning"]
+    Negation --> Antijoin["Antijoin Scheduling"]
+    Catalog --> SIP["Sideways Information Passing"]
+    Structural --> Keys["Physical Key Choice"]
+    SIP --> Keys
+    Antijoin --> Keys
+    Keys --> IR["Optimized Relational IR"]
+```
+
+---
+
+## DBSP Connection
+
+The DBSP notes focus on incremental maintenance:
+
+```text
+input deltas
+-> maintained operator state
+-> output deltas
+```
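+
+The delta rule for a maintained join shows why operator state matters. The sketch below is a toy Python model of Z-sets (rows mapped to integer
+weights, with negative weights for deletions) for binary relations joined on a shared middle column. It illustrates the incremental join identity
+that DBSP-style maintenance relies on; the function names are hypothetical and this is not DBSP's API.
+
+```python
+from collections import Counter
+
+
+def zadd(a: dict, b: dict) -> dict:
+    """Z-set addition: sum weights, dropping rows whose weight cancels to zero."""
+    out = Counter(a)
+    for row, w in b.items():
+        out[row] += w
+    return {row: w for row, w in out.items() if w != 0}
+
+
+def zjoin(left: dict, right: dict) -> dict:
+    """Join (x, y) with (y, z) into (x, z); weights multiply."""
+    out = Counter()
+    for (x, y), w1 in left.items():
+        for (y2, z), w2 in right.items():
+            if y == y2:
+                out[(x, z)] += w1 * w2
+    return {row: w for row, w in out.items() if w != 0}
+
+
+def delta_join(a: dict, da: dict, b: dict, db: dict) -> dict:
+    """Output delta of A ⋈ B from input deltas, using the *old* states a and b:
+    Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB."""
+    return zadd(zadd(zjoin(da, b), zjoin(a, db)), zjoin(da, db))
+
+
+a, b = {(1, 2): 1}, {(2, 3): 1}
+da, db = {(1, 2): -1, (5, 2): 1}, {(2, 4): 1}
+out_delta = delta_join(a, da, b, db)
+# Applying the output delta matches recomputing the join on the new inputs.
+assert zadd(zjoin(a, b), out_delta) == zjoin(zadd(a, da), zadd(b, db))
+print(sorted(out_delta.items()))  # [((1, 3), -1), ((5, 3), 1), ((5, 4), 1)]
+```
+
+Here `a` and `b` play the role of maintained operator state: the join keeps both inputs available for future deltas, so any width or rows a plan
+carries into a join are paid for on every later update.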
+
+DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation,
+Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.
+
+FlowLog's planning layer fits before DBSP:
+
+```text
+rules
+-> FlowLog-like planner
+-> DBSP circuit
+```
+
+This division matters because, in an incremental system, a poor plan is not a one-time cost: its intermediate relations become persistent maintained
+state. Bad join order, unnecessary intermediate fields, and late antijoins can increase memory use and per-update cost for the life of the circuit.
+
+---
+
+## CRDT Connection
+
+The CRDT notes use Datalog to define visible state over immutable operation facts.
+
+Simple register queries are already a good fit:
+
+```text
+set + pred
+-> overwritten
+-> visible values
+```
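+
+The register pipeline fits in two rules. Here is a snapshot sketch in Python; the fact shapes `set(op, value)` and `pred(earlier_op, later_op)`
+(meaning `later_op` causally follows, and so overwrites, `earlier_op`) are assumptions for illustration rather than the CRDT notes' exact schemas.
+
+```python
+def visible_values(sets: set[tuple], preds: set[tuple]) -> set:
+    """Snapshot evaluation of a multi-value register:
+        overwritten(Op) :- set(Op, _), pred(Op, _Later).
+        visible(V)      :- set(Op, V), not overwritten(Op).
+    `sets` holds (op, value) facts; `preds` holds (earlier_op, later_op) facts."""
+    has_successor = {earlier for earlier, _later in preds}
+    overwritten = {op for op, _value in sets if op in has_successor}
+    return {value for op, value in sets if op not in overwritten}
+
+
+sets = {("a", "red"), ("b", "blue"), ("c", "green")}
+preds = {("a", "b")}  # b was written after seeing a; c is concurrent with both
+print(sorted(visible_values(sets, preds)))  # ['blue', 'green']
+```
+
+Both rules are non-recursive, and the negated atom is keyed on `op`, which is bound as soon as `set` is scanned, so this case needs little planning.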
+
+The harder cases are recursive and structural:
+
+```text
+causal readiness
+list traversal
+tombstone skipping
+move-like tree operations
+```
+
+FlowLog helps by making the expensive parts explicit:
+
+- causal-readiness recursion should be planned around frontiers when possible
+- list traversal should avoid carrying unnecessary fields through every intermediate relation
+- antijoins for tombstones should run as soon as their keys are available
+- repeated list subqueries should share intermediate relations where possible
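+
+To make the frontier point concrete, here is an operational Python sketch of causal readiness, where an operation is ready once all of its
+dependencies are ready. It is a hypothetical illustration rather than a Datalog plan: each round touches only the dependents of the previous
+round's newly ready operations instead of rescanning the whole history.
+
+```python
+def causally_ready(ops: set, deps: set[tuple]) -> set:
+    """`deps` holds (op, dependency) pairs. Ops with a missing dependency never become ready."""
+    waiting_on = {op: set() for op in ops}
+    dependents: dict = {}
+    for op, dep in deps:
+        if op in waiting_on:
+            waiting_on[op].add(dep)
+            dependents.setdefault(dep, set()).add(op)
+    frontier = {op for op, missing in waiting_on.items() if not missing}
+    ready = set(frontier)
+    while frontier:
+        next_frontier = set()
+        for done in frontier:
+            for op in dependents.get(done, ()):
+                waiting_on[op].discard(done)
+                if not waiting_on[op] and op not in ready:
+                    next_frontier.add(op)
+        ready |= next_frontier
+        frontier = next_frontier  # only newly ready ops drive the next round
+    return ready
+
+
+ops = {"a", "b", "c", "d"}
+deps = {("b", "a"), ("c", "b"), ("d", "x")}  # "x" has not been delivered yet
+print(sorted(causally_ready(ops, deps)))  # ['a', 'b', 'c']
+```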
+
+The practical CRDT target is:
+
+```text
+same CRDT rules
+-> naive plan
+-> FlowLog-style plan
+-> DBSP-maintained result
+-> hydration and warm-update comparison
+```
+
+---
+
+## Geomerge Connection
+
+The Geomerge notes propose compiling supported laws into maintained violation relations.
+
+The simplest useful form is:
+
+```text
+required_consequent(x) :- antecedent(x).
+violation(x) :- required_consequent(x), not consequent(x).
+```
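+
+Because both rules are keyed on `x`, an update to either input can only change `violation(x)` for that same `x`. A minimal Python sketch of
+maintaining the violation relation under single-fact updates, assuming set semantics; the class and method names are hypothetical.
+
+```python
+class ViolationView:
+    """Maintains:
+        required_consequent(x) :- antecedent(x).
+        violation(x) :- required_consequent(x), not consequent(x).
+    and returns output deltas as (x, +1) or (x, -1)."""
+
+    def __init__(self) -> None:
+        self.relations = {"antecedent": set(), "consequent": set()}
+        self.violations: set = set()
+
+    def update(self, relation: str, x, insert: bool = True) -> list[tuple]:
+        facts = self.relations[relation]  # KeyError for an unknown relation
+        if insert:
+            facts.add(x)
+        else:
+            facts.discard(x)
+        # Only violation(x) can change, so re-derive it for this key alone.
+        should_hold = x in self.relations["antecedent"] and x not in self.relations["consequent"]
+        if should_hold and x not in self.violations:
+            self.violations.add(x)
+            return [(x, +1)]
+        if not should_hold and x in self.violations:
+            self.violations.discard(x)
+            return [(x, -1)]
+        return []
+
+
+view = ViolationView()
+print(view.update("antecedent", "order-1"))                # [('order-1', 1)]
+print(view.update("consequent", "order-1"))                # [('order-1', -1)]
+print(view.update("consequent", "order-1", insert=False))  # [('order-1', 1)]
+```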
+
+FlowLog helps once antecedents become multi-atom joins:
+
+```text
+violation(x, y, z) :-
+    A(x, y),
+    B(y, z),
+    C(z),
+    not D(x, z).
+```
+
+At that point, a compiler needs the same machinery FlowLog demonstrates:
+
+- variable occurrence maps
+- join graph extraction
+- antijoin scheduling
+- projection minimization
+- shared antecedent detection
+- violation-row construction
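+
+The first two items can be sketched directly for the rule above. In this hypothetical Python encoding, each body literal is
+`(relation, arguments, negated)` and every argument is assumed to be a variable; a real compiler would also separate out constants.
+
+```python
+from collections import defaultdict
+from itertools import combinations
+
+
+def occurrence_map(body):
+    """Map each variable to the (literal index, argument position) pairs where it occurs."""
+    occ = defaultdict(list)
+    for i, (_rel, args, _negated) in enumerate(body):
+        for pos, var in enumerate(args):
+            occ[var].append((i, pos))
+    return dict(occ)
+
+
+def join_graph(body):
+    """Edges between positive literals that share variables, labelled with those variables."""
+    positive = [(i, set(args)) for i, (_rel, args, negated) in enumerate(body) if not negated]
+    edges = {}
+    for (i, vars_i), (j, vars_j) in combinations(positive, 2):
+        shared = vars_i & vars_j
+        if shared:
+            edges[(i, j)] = sorted(shared)
+    return edges
+
+
+# violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z).
+body = [("A", ("x", "y"), False), ("B", ("y", "z"), False),
+        ("C", ("z",), False), ("D", ("x", "z"), True)]
+print(occurrence_map(body))
+# {'x': [(0, 0), (3, 0)], 'y': [(0, 1), (1, 0)], 'z': [(1, 1), (2, 0), (3, 1)]}
+print(join_graph(body))
+# {(0, 1): ['y'], (1, 2): ['z']}
+```
+
+The negated literal stays out of the join graph; its entries in the occurrence map are what an antijoin scheduler checks to decide when `x` and `z`
+are both bound.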
+
+The practical Geomerge target is:
+
+```text
+FlatTheory laws
+-> supported relational subset
+-> rule catalog
+-> planned violation query
+-> DBSP-maintained violations relation
+```
+
+---
+
+## Recommended Architecture
+
+The recommended architecture has a backend-neutral middle layer.
+
+```mermaid
+flowchart TB
+    subgraph Sources["Source Layers"]
+        Datalog["Datalog CRDT Rules"]
+        Geolog["Compiled Geolog Laws"]
+    end
+
+    subgraph Planner["FlowLog-Inspired Planner"]
+        Parse["Parse or Translate"]
+        Strata["Stratify"]
+        Catalog["Catalog Rules"]
+        Graph["Join Graph Construction"]
+        Optimize["Plan Joins, Antijoins, and SIP"]
+        IR["Relational IR with Physical Keys"]
+    end
+
+    subgraph Backends["Execution Backends"]
+        DBSP["DBSP"]
+        DD["Differential Dataflow"]
+        Batch["Snapshot Evaluator"]
+    end
+
+    Datalog --> Parse
+    Geolog --> Parse
+    Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
+    IR --> DBSP
+    IR --> DD
+    IR --> Batch
+```
+
+This architecture keeps the core questions separate:
+
+- source language
+- rule semantics
+- relational planning
+- physical execution backend
+- application integration
+
+That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the
+backend, or the storage boundary.
+
+---
+
+## Practical Path
+
+The practical path should be staged.
+
+**Stage 1: Planning-Only Prototype**
+
+```text
+Datalog-like rules
+-> dependency graph
+-> strata
+-> rule catalog
+-> join graph
+-> textual plan
+```
+
+This tests whether the compiler understands rule shape before any execution backend is involved.
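+
+The dependency-graph and strata steps can start from the classic stratum-assignment fixpoint. A minimal Python sketch, assuming rules are encoded as
+`(head, [(body_predicate, negated), ...])` (a hypothetical format): a head sits at least at the stratum of each positive body predicate and strictly
+above each negated one.
+
+```python
+def stratify(rules):
+    """Return a stratum number per predicate, or raise if negation is recursive."""
+    preds = {head for head, _ in rules} | {p for _, body in rules for p, _ in body}
+    stratum = {p: 0 for p in preds}
+    changed = True
+    while changed:
+        changed = False
+        for head, body in rules:
+            for pred, negated in body:
+                need = stratum[pred] + (1 if negated else 0)
+                if stratum[head] < need:
+                    if need > len(preds):
+                        # Strata can only keep climbing if negation sits inside a cycle.
+                        raise ValueError(f"not stratifiable: recursion through negation at {head!r}")
+                    stratum[head] = need
+                    changed = True
+    return stratum
+
+
+rules = [
+    ("path", [("edge", False)]),                   # path(x, y) :- edge(x, y).
+    ("path", [("path", False), ("edge", False)]),  # path(x, z) :- path(x, y), edge(y, z).
+    # unreachable(x, y) :- node(x), node(y), not path(x, y).
+    ("unreachable", [("node", False), ("node", False), ("path", True)]),
+]
+print(sorted(stratify(rules).items()))
+# [('edge', 0), ('node', 0), ('path', 0), ('unreachable', 1)]
+```
+
+Predicates in the same stratum that depend on each other, like `path` here, form the recursive groups; a fuller planner would also compute strongly
+connected components so that each recursive group can be lowered as one fixpoint.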
+
+**Stage 2: Narrow DBSP Lowering**
+
+```text
+planned rules
+-> projection, selection, join, antijoin, union, distinct, recursion
+-> DBSP circuit
+```
+
+This validates maintained outputs against a snapshot evaluator.
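+
+A minimal Python sketch of that check, using insert-only transitive closure as the workload: a semi-naive maintained version is compared against a
+from-scratch snapshot after every batch. Both are hypothetical stand-ins for a DBSP circuit and the snapshot evaluator, and deletions are left out of
+scope.
+
+```python
+def snapshot_closure(edges: set) -> set:
+    """Reference evaluator: recompute the transitive closure from scratch."""
+    closure = set(edges)
+    while True:
+        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c} - closure
+        if not new:
+            return closure
+        closure |= new
+
+
+class MaintainedClosure:
+    """Insert-only maintenance of path(x, y) :- edge(x, y). and
+    path(x, z) :- path(x, y), edge(y, z). Only new facts are joined again."""
+
+    def __init__(self) -> None:
+        self.edges: set = set()
+        self.closure: set = set()
+
+    def insert(self, batch) -> set:
+        new_edges = set(batch) - self.edges
+        self.edges |= new_edges
+        # Seeds: the new edges, plus existing paths extended by one new edge.
+        delta = (new_edges | {(a, d) for (a, b) in self.closure for (c, d) in new_edges if b == c}) - self.closure
+        added = set()
+        while delta:
+            self.closure |= delta
+            added |= delta
+            delta = {(a, d) for (a, b) in delta for (c, d) in self.edges if b == c} - self.closure
+        return added  # the output delta for this batch
+
+
+maintained = MaintainedClosure()
+for batch in [[(1, 2)], [(2, 3), (3, 1)], [(4, 1)]]:
+    maintained.insert(batch)
+    assert maintained.closure == snapshot_closure(maintained.edges)
+print(len(maintained.closure))  # 12
+```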
+
+**Stage 3: Workload Comparison**
+
+```text
+same rules
+same facts
+same outputs
+-> DBSP backend
+-> Differential Dataflow backend
+-> snapshot backend
+```
+
+This identifies whether bottlenecks come from planning or backend behavior.
+
+**Stage 4: Geomerge Integration**
+
+```text
+supported FlatTheory laws
+-> planned violation queries
+-> maintained violations relation
+-> agreement with current validator
+```
+
+Requiring agreement with the current validator keeps the DBSP checker a performance optimization first, not a semantic change.
+
+---
+
+## Test Workloads
+
+The shared test corpus should include:
+
+- transitive closure
+- reachability
+- connected components
+- antijoin checks
+- multi-value register
+- causal readiness
+- list next-element traversal
+- tombstone skipping
+- missing foreign-key violations
+- multi-atom Geomerge antecedents
+
+Each workload should have:
+
+- input schemas
+- base facts
+- update facts
+- expected snapshot output
+- expected output deltas
+- recursion and negation classification
+- accepted or rejected status
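+
+One way to encode that checklist is a small record type. A hypothetical Python sketch, assuming relations are sets of tuples, output deltas map rows
+to `+1`/`-1` weights, and `expected_snapshot` describes the state after the update facts are applied; the field names mirror the list above and are
+not an existing format.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Workload:
+    name: str
+    input_schemas: dict[str, tuple[str, ...]]     # relation -> column names
+    base_facts: dict[str, set[tuple]]
+    update_facts: dict[str, set[tuple]]
+    expected_snapshot: dict[str, set[tuple]]      # after base + update facts
+    expected_deltas: dict[str, dict[tuple, int]]  # caused by the update facts
+    recursive: bool
+    uses_negation: bool
+    accepted: bool = True
+
+    def schema_problems(self) -> list[str]:
+        """Report facts for unknown relations or with the wrong arity."""
+        problems = []
+        for label, facts in (("base", self.base_facts), ("update", self.update_facts)):
+            for rel, rows in facts.items():
+                if rel not in self.input_schemas:
+                    problems.append(f"{label}: unknown relation {rel!r}")
+                    continue
+                arity = len(self.input_schemas[rel])
+                problems += [f"{label}: {rel}{row} has arity {len(row)}, expected {arity}"
+                             for row in sorted(rows) if len(row) != arity]
+        return problems
+
+
+closure = Workload(
+    name="transitive closure",
+    input_schemas={"edge": ("src", "dst")},
+    base_facts={"edge": {(1, 2), (2, 3)}},
+    update_facts={"edge": {(3, 4)}},
+    expected_snapshot={"path": {(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 4)}},
+    expected_deltas={"path": {(3, 4): 1, (2, 4): 1, (1, 4): 1}},
+    recursive=True,
+    uses_negation=False,
+)
+print(closure.schema_problems())  # []
+```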
+
+---
+
+## Evaluation Questions
+
+The main evaluation questions are:
+
+- Does planning reduce hydration time?
+- Does planning reduce warm-update time?
+- Does causal-readiness update cost still grow with history depth?
+- Does antijoin scheduling reduce intermediate relation size?
+- Does SIP help frontier-shaped recursive queries?
+- Does physical key choice reduce maintained state?
+- Does the DBSP result match a snapshot evaluator?
+- Does Geomerge validation agree with the existing validator?
+- Is backend state rollback or preview execution tractable?
+
+---
+
+## Bottom Line
+
+The unified conclusion is:
+
+```text
+FlowLog is the planning blueprint.
+DBSP is the target incremental backend.
+CRDTs and Geomerge laws are the motivating rule sources.
+```
+
+The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement,
+and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.
+
+---
+
+## Changelog
+
+* **May 21, 2026** -- First version created to unify the first five FlowLog notes.