# FlowLog Synthesis

A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.

---

## Short Answer

The five FlowLog notes make one argument:

```text
FlowLog is most useful here as a model for the Datalog planning layer that should sit before an incremental backend such as DBSP.
```

The core architecture is:

```text
Datalog or Geolog-shaped rules
-> dependency analysis and strata
-> rule catalog
-> join graph and relational plan
-> FlowLog-style optimization
-> backend-neutral IR
-> DBSP or Differential Dataflow backend
-> maintained outputs
```

FlowLog is more than an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separate.

```mermaid
flowchart LR
    Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
    Strata --> Catalog["Rule Catalog"]
    Catalog --> Plan["Relational Plan"]
    Plan --> Optimize["FlowLog-Style Optimization"]
    Optimize --> IR["Backend-Neutral IR"]
    IR --> DBSP["DBSP Backend"]
    IR --> DD["Differential Dataflow Backend"]
    DBSP --> Outputs["Maintained Outputs"]
    DD --> Outputs
```

---

## How the Notes Fit Together

The first note, `001-flowlog-primer.md`, explains the concept. FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend,
while keeping Datalog-specific planning visible before lowering to dataflow operators.

The second note, `002-flowlog-implementation.md`, explains the artifact structure. The useful implementation shape is:

```text
parsing
-> strata
-> catalog
-> planning
-> optimizing
-> executing
```

The third note, `003-flowlog-and-dbsp-synergy.md`, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing
inputs. FlowLog helps answer what relational plan should be maintained.

The fourth note, `004-flowlog-technical-planning-notes.md`, zooms into the planning details: rule catalogs, collection shapes, transformation flows,
join graphs, antijoin timing, SIP, recursive strata, subplan sharing, and physical key choice.

The fifth note, `005-using-flowlog-ideas.md`, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype,
backend comparison, test corpus, data model decisions, and evaluation plan.

Together, the notes move from:

```text
what FlowLog is
-> how it is built
-> why it matters for DBSP
-> which technical pieces transfer
-> how to use those pieces
```

---

## Unified Mental Model

The shared mental model is that Datalog execution has three separate layers.

The source layer owns user-facing or system-facing rules:

```text
Datalog programs
Geolog laws
CRDT definitions
```

The planning layer owns the logical and physical shape of evaluation:

```text
strata
rule catalogs
join graphs
antijoin placement
SIP filters
physical keys
shared subplans
```

The backend layer owns maintained computation:

```text
DBSP circuits
Differential Dataflow dataflows
batch evaluators
```

The main design rule is:

```text
Backend execution should not rediscover rule semantics.
```

The backend should receive a checked, stratified, and optimized relational plan.

---

## FlowLog's Transferable Pieces

The most transferable pieces are not tied to Differential Dataflow.

**Rule Catalog**: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.
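
A catalog entry can be as small as one record per rule. The following is a minimal Python sketch, not FlowLog's actual data structure: the names `Var`, `Atom`, `CatalogEntry`, and `catalog_rule` are hypothetical, comparisons are left out for brevity, and the output projection is simply the head atom's terms. The example rule is the multi-atom violation rule from the Geomerge section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Atom:
    relation: str
    terms: tuple            # mix of Var and constant values
    negated: bool = False


@dataclass
class CatalogEntry:
    head: Atom              # head terms double as the output projection
    positive: list          # positive body atoms, in source order
    negated: list           # negated body atoms
    occurrences: dict       # variable name -> [(body atom index, term position)]
    constants: list         # [(body atom index, term position, value)]


def catalog_rule(head, body):
    """Summarize one rule (hypothetical helper, assumed shape)."""
    occurrences, constants = {}, []
    for i, atom in enumerate(body):
        for pos, term in enumerate(atom.terms):
            if isinstance(term, Var):
                occurrences.setdefault(term.name, []).append((i, pos))
            else:
                constants.append((i, pos, term))
    return CatalogEntry(
        head=head,
        positive=[a for a in body if not a.negated],
        negated=[a for a in body if a.negated],
        occurrences=occurrences,
        constants=constants,
    )


x, y, z = Var("x"), Var("y"), Var("z")
entry = catalog_rule(
    Atom("violation", (x, y, z)),
    [Atom("A", (x, y)), Atom("B", (y, z)), Atom("C", (z,)),
     Atom("D", (x, z), negated=True)],
)
```

The `occurrences` map is what later stages would read: it shows which atoms share a variable and when a negated atom's variables are all bound.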

**Stratification**: A dependency order for non-recursive and recursive rule groups, with negation restrictions kept explicit.
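
One way to make the negation restriction explicit is to compute stratum numbers directly from predicate dependencies. This sketch uses the textbook fixpoint (a head sits at least as high as each positive dependency and strictly higher than each negated one). The rule encoding and the function name `stratify` are assumptions for illustration, not FlowLog's implementation.

```python
def stratify(rules):
    """rules: list of (head_predicate, [(body_predicate, negated), ...]).

    Returns {predicate: stratum}. Raises ValueError when a predicate
    depends negatively on itself through recursion.
    """
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                need = stratum[pred] + (1 if negated else 0)
                if stratum[head] < need:
                    # A stratifiable program never needs more strata than predicates.
                    if need >= len(preds):
                        raise ValueError(f"negation through recursion at {head!r}")
                    stratum[head] = need
                    changed = True
    return stratum


rules = [
    ("reach", [("edge", False)]),
    ("reach", [("reach", False), ("edge", False)]),
    ("unreachable", [("node", False), ("reach", True)]),
]
strata = stratify(rules)
```

Here `reach` stays in stratum 0 with its inputs, and `unreachable` lands in stratum 1 because it negates `reach`.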

**Join Graph**: A graph or hypergraph of atoms connected by shared variables.

**Structural Planning**: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.
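
To make the join graph and a structure-based ordering concrete, here is a small Python sketch. The greedy heuristic (start from the narrowest atom, then prefer the atom sharing the most bound variables) is an illustrative stand-in chosen for this note, not FlowLog's actual planner; `join_graph` and `greedy_order` are hypothetical names.

```python
from itertools import combinations


def join_graph(atoms):
    """atoms: list of (relation, variables). Returns {(i, j): shared variables}."""
    edges = {}
    for (i, (_, vi)), (j, (_, vj)) in combinations(enumerate(atoms), 2):
        shared = set(vi) & set(vj)
        if shared:
            edges[(i, j)] = shared
    return edges


def greedy_order(atoms):
    """Order atoms so each next atom overlaps as much as possible with bound variables."""
    remaining = list(range(len(atoms)))
    first = min(remaining, key=lambda i: (len(atoms[i][1]), i))
    order, bound = [first], set(atoms[first][1])
    remaining.remove(first)
    while remaining:
        nxt = max(
            remaining,
            # Most shared variables first, then narrower atoms, then source order.
            key=lambda i: (len(bound & set(atoms[i][1])), -len(atoms[i][1]), -i),
        )
        order.append(nxt)
        bound |= set(atoms[nxt][1])
        remaining.remove(nxt)
    return order


atoms = [("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))]
graph = join_graph(atoms)
order = greedy_order(atoms)
```

For this chain the sketch starts at `C`, then joins `B` on `z`, then `A` on `y`, so no step is a cross product.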

**Sideways Information Passing**: Semijoin-style filtering that uses known bindings to reduce later joins.
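
The semijoin idea can be shown with plain Python sets. This is only an illustration of the filtering step under an assumed two-relation schema, `A(x, y)` and `B(y, z)`; the helper names are hypothetical.

```python
def semijoin(rows, key_positions, allowed_keys):
    """Keep only rows whose key columns appear in allowed_keys."""
    return {r for r in rows if tuple(r[p] for p in key_positions) in allowed_keys}


def join_on_y(a_rows, b_rows):
    """A(x, y) joined with B(y, z) on y."""
    return {(x, y, z) for (x, y) in a_rows for (y2, z) in b_rows if y == y2}


A = {(1, 2)}
B = {(2, 3), (5, 6), (7, 8)}

# Bindings for y already known from A are passed sideways to B.
known_y = {(y,) for (_x, y) in A}
B_filtered = semijoin(B, [0], known_y)
```

The join result is unchanged, but the later join only sees the rows of `B` that can contribute.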

**Antijoin Scheduling**: Placement of negated atoms as soon as their variables are bound.
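
A minimal sketch of that placement rule, assuming the positive atoms are already ordered and each atom is a `(relation, variables)` pair. The function name and plan encoding are hypothetical.

```python
def schedule_antijoins(positive, negated):
    """Interleave negated atoms into an ordered list of positive atoms.

    Each negated atom is emitted right after the first step at which all
    of its variables are bound.
    """
    plan, bound, pending = [], set(), list(negated)
    for relation, variables in positive:
        plan.append(("join", relation))   # the first step is effectively a scan
        bound |= set(variables)
        for atom in [n for n in pending if set(n[1]) <= bound]:
            plan.append(("antijoin", atom[0]))
            pending.remove(atom)
    if pending:
        raise ValueError(f"unsafe negation, unbound variables in: {pending}")
    return plan


plan = schedule_antijoins(
    positive=[("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))],
    negated=[("D", ("x", "z"))],
)
```

With this ordering, `not D(x, z)` runs before the join with `C`, because `x` and `z` are both bound once `A` and `B` have been joined.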

**Physical Key Choice**: Deliberate selection of keys and payload fields for maintained joins and arrangements.

**Subplan Sharing**: Reuse of common antecedents or intermediate relations across rules.

```mermaid
flowchart TB
    Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
    Catalog --> Negation["Negation and Filters"]
    JoinGraph --> Structural["Structural Planning"]
    Negation --> Antijoin["Antijoin Scheduling"]
    Catalog --> SIP["Sideways Information Passing"]
    Structural --> Keys["Physical Key Choice"]
    SIP --> Keys
    Antijoin --> Keys
    Keys --> IR["Optimized Relational IR"]
```

---

## DBSP Connection

The DBSP notes focus on incremental maintenance:

```text
input deltas
-> maintained operator state
-> output deltas
```

DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation,
Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.
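
The delta view can be illustrated with weighted rows (Z-sets, in DBSP terminology) and the bilinear rule for joins: the change to `A ⋈ B` is `ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB`. The code below is plain Python written for this note, not the DBSP API; all function names are hypothetical.

```python
from collections import defaultdict


def zjoin(left, right):
    """Join two Z-sets of (key, value) rows on key; weights multiply."""
    out = defaultdict(int)
    for (k1, a), w1 in left.items():
        for (k2, b), w2 in right.items():
            if k1 == k2:
                out[(k1, a, b)] += w1 * w2
    return {row: w for row, w in out.items() if w != 0}


def zadd(*zsets):
    """Add Z-sets pointwise, dropping rows whose weight cancels to zero."""
    out = defaultdict(int)
    for zset in zsets:
        for row, w in zset.items():
            out[row] += w
    return {row: w for row, w in out.items() if w != 0}


def join_delta(a, da, b, db):
    """Output delta of a join, given old inputs and input deltas."""
    return zadd(zjoin(da, b), zjoin(a, db), zjoin(da, db))


a = {("k1", "a1"): 1}
b = {("k1", "b1"): 1}
da = {("k1", "a1"): -1, ("k2", "a2"): 1}   # delete one row, insert another
db = {("k2", "b2"): 1}

delta = join_delta(a, da, b, db)
```

Applying `delta` to the old join output gives the same result as recomputing the join over the updated inputs, which is the property a maintained operator has to preserve.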

FlowLog's planning layer fits before DBSP:

```text
rules
-> FlowLog-like planner
-> DBSP circuit
```

This division matters because a poor plan costs more in an incremental system than in a one-shot query: it becomes persistent maintained state.
Bad join order, unnecessary intermediate fields, and late antijoins can increase memory and update cost for the life of the circuit.

---

## CRDT Connection

The CRDT notes use Datalog to define visible state over immutable operation facts.

Simple register queries are already a good fit:

```text
set + pred
-> overwritten
-> visible values
```
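
As a snapshot-level illustration of that pipeline, the sketch below assumes a schema of `set(op, key, value)` and `pred(new_op, old_op)` facts. The CRDT notes are not reproduced here, so the column layout and the function name are assumptions.

```python
def visible_values(set_ops, pred):
    """Values of set operations that no later operation names as its predecessor."""
    overwritten = {old for (_new, old) in pred}
    return {(key, value) for (op, key, value) in set_ops if op not in overwritten}


set_ops = {
    ("a", "title", "draft"),
    ("b", "title", "final"),   # overwrites a
    ("c", "title", "alt"),     # concurrent with b, overwrites nothing
}
pred = {("b", "a")}

visible = visible_values(set_ops, pred)
```

Both `final` and `alt` stay visible, which is the multi-value register behaviour: the rule is a single antijoin of `set` against `overwritten`.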

The harder cases are recursive and structural:

```text
causal readiness
list traversal
tombstone skipping
move-like tree operations
```

FlowLog helps by making the expensive parts explicit:

- causal-readiness recursion should be planned around frontiers when possible
- list traversal should avoid carrying unnecessary fields through every intermediate
- antijoins for tombstones should run as soon as their keys are available
- repeated list subqueries should share intermediate relations where possible

The practical CRDT target is:

```text
same CRDT rules
-> naive plan
-> FlowLog-style plan
-> DBSP-maintained result
-> hydration and warm-update comparison
```

---

## Geomerge Connection

The Geomerge notes propose compiling supported laws into maintained violation relations.

The simplest useful form is:

```text
required_consequent(x) :- antecedent(x).
violation(x) :- required_consequent(x), not consequent(x).
```

FlowLog helps once antecedents become multi-atom joins:

```text
violation(x, y, z) :-
    A(x, y),
    B(y, z),
    C(z),
    not D(x, z).
```

At that point, a compiler needs the same machinery FlowLog demonstrates:

- variable occurrence maps
- join graph extraction
- antijoin scheduling
- projection minimization
- shared antecedent detection
- violation-row construction
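
The multi-atom rule above can be written as a naive snapshot evaluator, which is useful as a reference when checking a planned or maintained version. This is a sketch over plain Python sets; the relation contents are made up for illustration.

```python
def violations(A, B, C, D):
    """violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z)."""
    return {
        (x, y, z)
        for (x, y) in A
        for (y2, z) in B
        if y2 == y            # join A and B on y
        and (z,) in C         # join with C on z
        and (x, z) not in D   # antijoin against D
    }


A = {(1, 2), (4, 2)}
B = {(2, 3)}
C = {(3,)}
D = {(1, 3)}

result = violations(A, B, C, D)
```

Only `(4, 2, 3)` survives, because `D(1, 3)` supplies the required consequent for the other candidate row.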

The practical Geomerge target is:

```text
FlatTheory laws
-> supported relational subset
-> rule catalog
-> planned violation query
-> DBSP-maintained violations relation
```

---

## Recommended Architecture

The recommended architecture has a backend-neutral middle layer.

```mermaid
flowchart TB
    subgraph Sources["Source Layers"]
        Datalog["Datalog CRDT Rules"]
        Geolog["Compiled Geolog Laws"]
    end

    subgraph Planner["FlowLog-Inspired Planner"]
        Parse["Parse or Translate"]
        Strata["Stratify"]
        Catalog["Catalog Rules"]
        Graph["Join Graph Construction"]
        Optimize["Plan Joins, Antijoins, and SIP"]
        IR["Relational IR with Physical Keys"]
    end

    subgraph Backends["Execution Backends"]
        DBSP["DBSP"]
        DD["Differential Dataflow"]
        Batch["Snapshot Evaluator"]
    end

    Datalog --> Parse
    Geolog --> Parse
    Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
    IR --> DBSP
    IR --> DD
    IR --> Batch
```

This architecture keeps the core questions separate:

- source language
- rule semantics
- relational planning
- physical execution backend
- application integration

That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the
backend, or the storage boundary.

---

## Practical Path

The practical path should be staged.

**Stage 1: Planning-Only Prototype**

```text
Datalog-like rules
-> dependency graph
-> strata
-> rule catalog
-> join graph
-> textual plan
```

This checks that the compiler understands rule shape before any backend is involved.

**Stage 2: Narrow DBSP Lowering**

```text
planned rules
-> projection, selection, join, antijoin, union, distinct, recursion
-> DBSP circuit
```

This validates maintained outputs against a snapshot evaluator.
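
A check of this kind can be sketched as follows, using transitive closure as the workload. The snapshot evaluator recomputes from scratch, and `reported_delta` is a hand-written stand-in for what a maintained circuit would emit. No real DBSP call is made here, and all names are hypothetical.

```python
def transitive_closure(edges):
    """Naive snapshot evaluator: recompute reachability from scratch."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in edges if b == c} - reach
        if not new:
            return reach
        reach |= new


def apply_delta(rows, delta):
    """Apply weighted output deltas (+1 insert, -1 delete) to a set of rows."""
    out = set(rows)
    for row, weight in delta.items():
        if weight > 0:
            out.add(row)
        elif weight < 0:
            out.discard(row)
    return out


old_edges = {(1, 2)}
new_edges = old_edges | {(2, 3)}

# Stand-in for the output deltas reported after inserting edge (2, 3).
reported_delta = {(2, 3): +1, (1, 3): +1}

maintained = apply_delta(transitive_closure(old_edges), reported_delta)
agrees = maintained == transitive_closure(new_edges)
```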

**Stage 3: Workload Comparison**

```text
same rules
same facts
same outputs
-> DBSP backend
-> Differential Dataflow backend
-> snapshot backend
```

This identifies whether bottlenecks come from planning or backend behavior.

**Stage 4: Geomerge Integration**

```text
supported FlatTheory laws
-> planned violation queries
-> maintained violations relation
-> agreement with current validator
```

Requiring agreement with the current validator makes the DBSP checker a performance optimization first, not a semantic change.

---

## Test Workloads

The shared test corpus should include:

- transitive closure
- reachability
- connected components
- antijoin checks
- multi-value register
- causal readiness
- list next-element traversal
- tombstone skipping
- missing foreign-key violations
- multi-atom Geomerge antecedents

Each workload should have:

- input schemas
- base facts
- update facts
- expected snapshot output
- expected output deltas
- recursion and negation classification
- accepted or rejected status
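
One possible shape for a corpus entry is sketched below. The field names mirror the list above but are otherwise assumptions, as is the `deltas_consistent` helper, which only checks that a workload's expected deltas and expected snapshot agree with each other.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    schemas: dict             # relation name -> tuple of column names
    base_facts: dict          # relation name -> set of rows
    update_facts: list        # [(relation name, row, weight)], weight +1 or -1
    expected_snapshot: dict   # output relation -> set of rows after the updates
    expected_deltas: dict     # output relation -> {row: weight}
    recursive: bool
    uses_negation: bool
    accepted: bool            # False if the planner is expected to reject it


def deltas_consistent(workload, snapshot_before):
    """Check that the prior snapshot plus expected deltas equals the expected snapshot."""
    for relation, expected in workload.expected_snapshot.items():
        rows = set(snapshot_before.get(relation, set()))
        for row, weight in workload.expected_deltas.get(relation, {}).items():
            if weight > 0:
                rows.add(row)
            else:
                rows.discard(row)
        if rows != expected:
            return False
    return True


tc = Workload(
    name="transitive closure",
    schemas={"edge": ("src", "dst"), "reach": ("src", "dst")},
    base_facts={"edge": {(1, 2)}},
    update_facts=[("edge", (2, 3), +1)],
    expected_snapshot={"reach": {(1, 2), (2, 3), (1, 3)}},
    expected_deltas={"reach": {(2, 3): +1, (1, 3): +1}},
    recursive=True,
    uses_negation=False,
    accepted=True,
)
```

Keeping the recursion, negation, and accepted flags on the record lets a test runner confirm that the planner classifies each workload as expected before any backend runs.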

---

## Evaluation Questions

The main evaluation questions are:

- Does planning reduce hydration time?
- Does planning reduce warm-update time?
- Does causal-readiness update cost still grow with history depth?
- Does antijoin scheduling reduce intermediate relation size?
- Does SIP help frontier-shaped recursive queries?
- Does physical key choice reduce maintained state?
- Does the DBSP result match a snapshot evaluator?
- Does Geomerge validation agree with the existing validator?
- Is backend state rollback or preview execution tractable?

---

## Bottom Line

The unified conclusion is:

```text
FlowLog is the planning blueprint.
DBSP is the target incremental backend.
CRDTs and Geomerge laws are the motivating rule sources.
```

The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement,
and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.

---

## Changelog

* **May 21, 2026** -- First version created to unify the first five FlowLog notes.