useful-notes/flowlog/006-flowlog-synthesis.md

# FlowLog Synthesis

A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.

---

## Short Answer

The five FlowLog notes make one argument:

```text
FlowLog is most useful here as a model for the Datalog planning layer that should sit before an incremental backend such as DBSP.
```

The core architecture is:

```text
Datalog or Geolog-shaped rules
-> dependency analysis and strata
-> rule catalog
-> join graph and relational plan
-> FlowLog-style optimization
-> DBSP or Differential Dataflow backend
-> maintained outputs
```

FlowLog is not only an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separated.

```mermaid
flowchart LR
    Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
    Strata --> Catalog["Rule Catalog"]
    Catalog --> Plan["Relational Plan"]
    Plan --> Optimize["FlowLog-Style Optimization"]
    Optimize --> IR["Backend-Neutral IR"]
    IR --> DBSP["DBSP Backend"]
    IR --> DD["Differential Dataflow Backend"]
    DBSP --> Outputs["Maintained Outputs"]
    DD --> Outputs
```

---

## How the Notes Fit Together

The first note, `001-flowlog-primer.md`, explains the concept. FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend,
while keeping Datalog-specific planning visible before lowering to dataflow operators.

The second note, `002-flowlog-implementation.md`, explains the artifact structure. The useful implementation shape is:

```text
parsing
-> strata
-> catalog
-> planning
-> optimizing
-> executing
```

The third note, `003-flowlog-and-dbsp-synergy.md`, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing
inputs. FlowLog helps answer what relational plan should be maintained.

The fourth note, `004-flowlog-technical-planning-notes.md`, zooms into the planning details: rule catalogs, collection shapes, transformation flows,
join graphs, antijoin timing, SIP, recursive strata, subplan sharing, and physical key choice.

The fifth note, `005-using-flowlog-ideas.md`, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype,
backend comparison, test corpus, data model decisions, and evaluation plan.

Together, the notes move from:

```text
what FlowLog is
-> how it is built
-> why it matters for DBSP
-> which technical pieces transfer
-> how to use those pieces
```

---

## Unified Mental Model

The shared mental model is that Datalog execution has three separate layers.

The source layer owns user-facing or system-facing rules:

```text
Datalog programs
Geolog laws
CRDT definitions
```

The planning layer owns the logical and physical shape of evaluation:

```text
strata
rule catalogs
join graphs
antijoin placement
SIP filters
physical keys
shared subplans
```

The backend layer owns maintained computation:

```text
DBSP circuits
Differential Dataflow dataflows
batch evaluators
```

The main design rule is:

```text
Backend execution should not rediscover rule semantics.
```

The backend should receive a checked, stratified, and optimized relational plan.

---

## FlowLog's Transferable Pieces

The most transferable pieces are not tied to Differential Dataflow.

**Rule Catalog**: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.

**Stratification**: A dependency order for non-recursive and recursive rule groups, with negation restrictions kept explicit.

**Join Graph**: A graph or hypergraph of atoms connected by shared variables.

**Structural Planning**: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.

**Sideways Information Passing**: Semijoin-style filtering that uses known bindings to reduce later joins.

**Antijoin Scheduling**: Placement of negated atoms as soon as their variables are bound.

**Physical Key Choice**: Deliberate selection of keys and payload fields for maintained joins and arrangements.

**Subplan Sharing**: Reuse of common antecedents or intermediate relations across rules.
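
As a rough illustration of the first and third pieces, the sketch below records a rule's atoms and derives a join graph from shared variables. The `Atom` and `RuleCatalog` names, their fields, and the restriction to variable-only arguments are assumptions made for this note, not FlowLog's actual data structures.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    relation: str
    args: tuple            # variable names only, to keep the sketch short
    negated: bool = False


@dataclass
class RuleCatalog:
    head: Atom
    body: list

    def variable_occurrences(self):
        """Map each variable to the indices of the body atoms that mention it."""
        occurrences = {}
        for i, atom in enumerate(self.body):
            for var in atom.args:
                occurrences.setdefault(var, []).append(i)
        return occurrences

    def join_graph(self):
        """Edges between positive atoms that share at least one variable."""
        positive = [(i, a) for i, a in enumerate(self.body) if not a.negated]
        edges = {}
        for (i, a), (j, b) in combinations(positive, 2):
            shared = set(a.args) & set(b.args)
            if shared:
                edges[(i, j)] = shared
        return edges


# violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z).
rule = RuleCatalog(
    head=Atom("violation", ("x", "y", "z")),
    body=[
        Atom("A", ("x", "y")),
        Atom("B", ("y", "z")),
        Atom("C", ("z",)),
        Atom("D", ("x", "z"), negated=True),
    ],
)

print(rule.join_graph())            # {(0, 1): {'y'}, (1, 2): {'z'}}
print(rule.variable_occurrences())  # {'x': [0, 3], 'y': [0, 1], 'z': [1, 2, 3]}
```

Keeping negated atoms out of the join graph but inside the occurrence map is one possible split: the graph drives join ordering, while the occurrence map tells the antijoin scheduler when a negated atom's variables are bound.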

```mermaid
flowchart TB
    Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
    Catalog --> Negation["Negation and Filters"]
    JoinGraph --> Structural["Structural Planning"]
    Negation --> Antijoin["Antijoin Scheduling"]
    Catalog --> SIP["Sideways Information Passing"]
    Structural --> Keys["Physical Key Choice"]
    SIP --> Keys
    Antijoin --> Keys
    Keys --> IR["Optimized Relational IR"]
```
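
Sideways information passing can be pictured as a semijoin that shrinks one input before the real join runs. The sketch below is a hypothetical snapshot-style example for a two-atom rule; the function name and the tuple layouts are assumptions, and an actual planner would emit this as a plan step rather than run it directly.

```python
def join_with_sip(a_rows, b_rows):
    """result(x, z) :- A(x, y), B(y, z), with B reduced by the y values bound in A."""
    # Bindings for y that A can actually produce.
    bound_y = {y for (_x, y) in a_rows}
    # Semijoin: keep only the B rows that can possibly match.
    b_reduced = {(y, z) for (y, z) in b_rows if y in bound_y}
    # Index the reduced relation and finish the join.
    index = {}
    for y, z in b_reduced:
        index.setdefault(y, set()).add(z)
    result = {(x, z) for (x, y) in a_rows for z in index.get(y, ())}
    return result, b_reduced


a_rows = {(1, 10)}
b_rows = {(10, 100), (20, 200), (30, 300)}
result, b_reduced = join_with_sip(a_rows, b_rows)
print(result)     # {(1, 100)}
print(b_reduced)  # {(10, 100)}
```

In a one-shot evaluator the filter mostly saves probe work. In a maintained setting the more interesting effect is that only `b_reduced` needs to be indexed as join state, which is the kind of saving the physical key choice step is meant to make visible.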

---

## DBSP Connection

The DBSP notes focus on incremental maintenance:

```text
input deltas
-> maintained operator state
-> output deltas
```

DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation,
Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.

FlowLog's planning layer fits before DBSP:

```text
rules
-> FlowLog-like planner
-> DBSP circuit
```

This division matters because in an incremental system a plan is not evaluated once and discarded: its intermediate results become persistent
maintained state. Bad join order, unnecessary intermediate fields, and late antijoins can increase memory and update cost for the life of the circuit.
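
To make the "persistent state" point concrete, here is a minimal plain-Python sketch of an incrementally maintained join over weighted rows (Z-sets). It illustrates the standard delta rule for a bilinear join and is not the DBSP library API; `zset_add`, `join`, and `IncrementalJoin` are names invented for this note.

```python
from collections import defaultdict


def zset_add(a, b):
    """Add two Z-sets (dicts from row to integer weight), dropping zero weights."""
    out = defaultdict(int)
    for zset in (a, b):
        for row, weight in zset.items():
            out[row] += weight
    return {row: w for row, w in out.items() if w != 0}


def join(left, right):
    """Join left rows (k, a) with right rows (k, b) on k; weights multiply."""
    index = defaultdict(list)
    for (k, b), w in right.items():
        index[k].append((b, w))
    out = defaultdict(int)
    for (k, a), wl in left.items():
        for b, wr in index[k]:
            out[(k, a, b)] += wl * wr
    return {row: w for row, w in out.items() if w != 0}


class IncrementalJoin:
    """Keeps both inputs as state and emits one output delta per update."""

    def __init__(self):
        self.left = {}
        self.right = {}

    def step(self, d_left, d_right):
        # delta(L join R) = dL join R + L join dR + dL join dR
        delta = zset_add(join(d_left, self.right), join(self.left, d_right))
        delta = zset_add(delta, join(d_left, d_right))
        # The integrated inputs stay alive for as long as the operator does.
        self.left = zset_add(self.left, d_left)
        self.right = zset_add(self.right, d_right)
        return delta


op = IncrementalJoin()
out1 = op.step({("k1", "a"): 1}, {("k1", "b"): 1})
out2 = op.step({}, {("k1", "b"): -1, ("k1", "c"): 1})
print(out1)  # {('k1', 'a', 'b'): 1}
print(out2)  # {('k1', 'a', 'b'): -1, ('k1', 'a', 'c'): 1}
```

Every field carried in `self.left` and `self.right` is held until the operator is torn down, which is why projection and key choices made by the planner show up directly as memory cost.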

---

## CRDT Connection

The CRDT notes use Datalog to define visible state over immutable operation facts.

Simple register queries are already a good fit:

```text
set + pred
-> overwritten
-> visible values
```
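
A snapshot reading of that pipeline might look like the sketch below. The relation layouts `set(op, key, value)` and `pred(new_op, old_op)` are assumptions made for illustration; the CRDT notes may use different columns.

```python
def visible_values(set_ops, preds):
    """
    overwritten(old)    :- pred(_, old).
    visible(key, value) :- set(op, key, value), not overwritten(op).
    """
    overwritten = {old for (_new, old) in preds}
    return {(key, value) for (op, key, value) in set_ops if op not in overwritten}


# op2 and op3 both overwrite op1 concurrently, so both values stay visible.
set_ops = {
    ("op1", "title", "draft"),
    ("op2", "title", "final"),
    ("op3", "title", "alt"),
}
preds = {("op2", "op1"), ("op3", "op1")}

print(sorted(visible_values(set_ops, preds)))
# [('title', 'alt'), ('title', 'final')]
```

The rule has one projection and one antijoin on a single key, which is why it needs little planning help compared with the recursive cases.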

The harder cases are recursive and structural:

```text
causal readiness
list traversal
tombstone skipping
move-like tree operations
```

FlowLog helps by making the expensive parts explicit:

- causal-readiness recursion should be planned around frontiers when possible
- list traversal should avoid carrying unnecessary fields through every intermediate
- antijoins for tombstones should run as soon as their keys are available
- repeated list subqueries should share intermediate relations where possible

The practical CRDT target is:

```text
same CRDT rules
-> naive plan
-> FlowLog-style plan
-> DBSP-maintained result
-> hydration and warm-update comparison
```

---

## Geomerge Connection

The Geomerge notes propose compiling supported laws into maintained violation relations.

The simplest useful form is:

```text
required_consequent(x) :- antecedent(x).
violation(x) :- required_consequent(x), not consequent(x).
```

FlowLog helps once antecedents become multi-atom joins:

```text
violation(x, y, z) :-
    A(x, y),
    B(y, z),
    C(z),
    not D(x, z).
```

At that point, a compiler needs the same machinery FlowLog demonstrates:

- variable occurrence maps
- join graph extraction
- antijoin scheduling
- projection minimization
- shared antecedent detection
- violation-row construction
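
Antijoin scheduling for the rule above can be sketched as a pass over a chosen join order that emits each negated atom at the first point where all of its variables are bound. This is a simplified illustration under the assumption that the positive join order has already been picked; `schedule_antijoins` and the step labels are hypothetical names.

```python
def schedule_antijoins(positive_order, negated):
    """
    positive_order: list of (relation, variables) in the chosen join order
    negated:        list of (relation, variables) for the negated atoms
    Returns a list of ("join" | "antijoin", relation) steps.
    The first "join" step is effectively a scan of that relation.
    """
    plan = []
    bound = set()
    pending = list(negated)
    for relation, variables in positive_order:
        plan.append(("join", relation))
        bound |= set(variables)
        # Emit every negated atom whose variables are now all bound.
        ready = [n for n in pending if set(n[1]) <= bound]
        for atom in ready:
            plan.append(("antijoin", atom[0]))
            pending.remove(atom)
    if pending:
        raise ValueError(f"unsafe negation, unbound variables in: {pending}")
    return plan


plan = schedule_antijoins(
    positive_order=[("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))],
    negated=[("D", ("x", "z"))],
)
print(plan)
# [('join', 'A'), ('join', 'B'), ('antijoin', 'D'), ('join', 'C')]
```

Here `D` runs before `C` because `x` and `z` are both bound after `B`. Whether that beats filtering by `C` first depends on selectivity, so a fuller planner would treat this placement as the earliest legal position rather than a fixed choice.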

The practical Geomerge target is:

```text
FlatTheory laws
-> supported relational subset
-> rule catalog
-> planned violation query
-> DBSP-maintained violations relation
```

---

## Recommended Architecture

The recommended architecture has a backend-neutral middle layer.

```mermaid
flowchart TB
    subgraph Sources["Source Layers"]
        Datalog["Datalog CRDT Rules"]
        Geolog["Compiled Geolog Laws"]
    end

    subgraph Planner["FlowLog-Inspired Planner"]
        Parse["Parse or Translate"]
        Strata["Stratify"]
        Catalog["Catalog Rules"]
        Graph["Join Graph Construction"]
        Optimize["Plan Joins, Antijoins, and SIP"]
        IR["Relational IR with Physical Keys"]
    end

    subgraph Backends["Execution Backends"]
        DBSP["DBSP"]
        DD["Differential Dataflow"]
        Batch["Snapshot Evaluator"]
    end

    Datalog --> Parse
    Geolog --> Parse
    Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
    IR --> DBSP
    IR --> DD
    IR --> Batch
```

This architecture keeps the core questions separate:

- source language
- rule semantics
- relational planning
- physical execution backend
- application integration

That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the
backend, or the storage boundary.

---

## Practical Path

The practical path should be staged.

**Stage 1: Planning-Only Prototype**

```text
Datalog-like rules
-> dependency graph
-> strata
-> rule catalog
-> join graph
-> textual plan
```

This validates whether the compiler understands rule shape.
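
The dependency-graph and strata steps of that pipeline could start as small as the sketch below. It assigns a stratum number per predicate by fixpoint iteration and rejects negation through recursion. The rule encoding as `(head, [(body_predicate, is_negated), ...])` is an assumption for this note, and grouping predicates into recursive components is left out.

```python
def stratify(rules):
    """
    rules: list of (head, [(body_predicate, is_negated), ...])
    Returns {predicate: stratum}. Raises ValueError when a predicate
    depends negatively on itself through recursion.
    """
    preds = {head for head, _ in rules}
    preds |= {pred for _, body in rules for pred, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                # A negated dependency must live in a strictly lower stratum.
                need = stratum[pred] + (1 if negated else 0)
                if need > stratum[head]:
                    # With n predicates, a legal stratum is at most n - 1.
                    if need >= len(preds):
                        raise ValueError(f"negation inside recursion through {head!r}")
                    stratum[head] = need
                    changed = True
    return stratum


rules = [
    ("reach", [("edge", False)]),
    ("reach", [("reach", False), ("edge", False)]),
    ("unreached", [("node", False), ("reach", True)]),
]
strata = stratify(rules)
print(sorted(strata.items()))
# [('edge', 0), ('node', 0), ('reach', 0), ('unreached', 1)]
```

A textual plan printer could then group rules by the stratum of their head and mark a rule as recursive when a body predicate shares that stratum and depends back on the head.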

**Stage 2: Narrow DBSP Lowering**

```text
planned rules
-> projection, selection, join, antijoin, union, distinct, recursion
-> DBSP circuit
```

This validates maintained outputs against a snapshot evaluator.
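
One way to set up that validation is a small reference evaluator plus a check that a backend's output delta turns the old snapshot into the new one. The sketch below uses transitive closure as the workload; `transitive_closure` and `check_maintained` are hypothetical helper names, and the delta in the example is written by hand as a stand-in for real backend output.

```python
def transitive_closure(edges):
    """
    reach(x, y) :- edge(x, y).
    reach(x, z) :- reach(x, y), edge(y, z).
    Evaluated semi-naively: only newly derived facts are extended each round.
    """
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)
    reach = set(edges)
    frontier = set(edges)
    while frontier:
        new = {(x, z) for (x, y) in frontier for z in successors.get(y, ())} - reach
        reach |= new
        frontier = new
    return reach


def check_maintained(snapshot_before, output_delta, snapshot_after):
    """True when applying the output delta to the old snapshot gives the new one."""
    inserted = {row for row, weight in output_delta.items() if weight > 0}
    deleted = {row for row, weight in output_delta.items() if weight < 0}
    return ((snapshot_before - deleted) | inserted) == snapshot_after


before = transitive_closure({(1, 2), (2, 3)})
after = transitive_closure({(1, 2), (2, 3), (3, 4)})
# Stand-in for the delta a maintained circuit would emit after inserting edge (3, 4).
delta = {(3, 4): 1, (2, 4): 1, (1, 4): 1}
print(check_maintained(before, delta, after))  # True
```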

**Stage 3: Workload Comparison**

```text
same rules
same facts
same outputs
-> DBSP backend
-> Differential Dataflow backend
-> snapshot backend
```

This identifies whether bottlenecks come from planning or backend behavior.

**Stage 4: Geomerge Integration**

```text
supported FlatTheory laws
-> planned violation queries
-> maintained violations relation
-> agreement with current validator
```

This makes the DBSP checker a performance optimization first, not a semantic change.

---

## Test Workloads

The shared test corpus should include:

- transitive closure
- reachability
- connected components
- antijoin checks
- multi-value register
- causal readiness
- list next-element traversal
- tombstone skipping
- missing foreign-key violations
- multi-atom Geomerge antecedents

Each workload should have:

- input schemas
- base facts
- update facts
- expected snapshot output
- expected output deltas
- recursion and negation classification
- accepted or rejected status
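
A workload record with those fields might be sketched as follows. The `Workload` class, its field names, and the `validate` checks are assumptions for illustration; the real corpus format is still a data model decision.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    schemas: dict            # relation name -> tuple of column names
    base_facts: dict         # relation name -> set of rows
    update_facts: list       # one {relation: {row: weight}} batch per update
    expected_snapshot: dict  # relation name -> set of rows after all updates
    expected_deltas: list    # one {relation: {row: weight}} per update batch
    recursive: bool = False
    uses_negation: bool = False
    accepted: bool = True    # False for programs the planner must reject

    def validate(self):
        """Return a list of consistency problems; empty means well formed."""
        problems = []
        if len(self.update_facts) != len(self.expected_deltas):
            problems.append("one expected delta is needed per update batch")
        for relation, rows in self.base_facts.items():
            if relation not in self.schemas:
                problems.append(f"no schema for {relation}")
            elif any(len(row) != len(self.schemas[relation]) for row in rows):
                problems.append(f"arity mismatch in {relation}")
        return problems


tc = Workload(
    name="transitive closure",
    schemas={"edge": ("src", "dst"), "reach": ("src", "dst")},
    base_facts={"edge": {(1, 2), (2, 3)}},
    update_facts=[{"edge": {(3, 4): 1}}],
    expected_snapshot={
        "reach": {(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 4)},
    },
    expected_deltas=[{"reach": {(3, 4): 1, (2, 4): 1, (1, 4): 1}}],
    recursive=True,
)
print(tc.validate())  # []
```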

---

## Evaluation Questions

The main evaluation questions are:

- Does planning reduce hydration time?
- Does planning reduce warm-update time?
- Does causal-readiness update cost still grow with history depth?
- Does antijoin scheduling reduce intermediate relation size?
- Does SIP help frontier-shaped recursive queries?
- Does physical key choice reduce maintained state?
- Does the DBSP result match a snapshot evaluator?
- Does Geomerge validation agree with the existing validator?
- Is backend state rollback or preview execution tractable?

---

## Bottom Line

The unified conclusion is:

```text
FlowLog is the planning blueprint.
DBSP is the target incremental backend.
CRDTs and Geomerge laws are the motivating rule sources.
```

The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement,
and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.

---

## Changelog

* **May 21, 2026** -- First version created to unify the first five FlowLog notes.