
# FlowLog and DBSP Synergy

A note on how FlowLog's Datalog planning ideas could support the DBSP, CRDT, and Geomerge work.

---

## Short Answer

FlowLog and the DBSP notes meet at the boundary between Datalog rules and incremental execution.

DBSP answers:

```text
How can a relational query be maintained over changing inputs?
```

FlowLog helps answer:

```text
What relational plan should the incremental backend maintain?
```

The point is not for FlowLog to replace DBSP. The useful split is:

```text
FlowLog-like frontend and optimizer
-> backend-neutral relational IR
-> DBSP-maintained circuit
```

FlowLog is a useful blueprint for the compiler and optimizer that should sit in front of DBSP.

---

## Existing DBSP Direction

The DBSP notes are organized around three related goals.

First, CRDTs can be described as deterministic Datalog queries over immutable operation facts:

```text
operation facts
-> Datalog rules
-> visible CRDT state
```

Second, DBSP can maintain those query results incrementally:

```text
input relation deltas
-> DBSP circuit step
-> output relation deltas
```

Third, Geomerge laws can be compiled into maintained violation relations:

```text
compiled relational laws
-> violation queries
-> maintained violation deltas
```

The missing layer is query planning. A Datalog rule can be semantically correct but still produce a poor physical plan.

---

## FlowLog's Useful Layer

FlowLog has a pipeline that is useful independently of its Differential Dataflow backend:

```text
Datalog program
-> parser
-> strata
-> rule catalog
-> per-rule relational plan
-> optimizer
-> incremental dataflow backend
```

The reusable parts are:

- dependency analysis
- stratification
- rule catalog construction
- join graph extraction
- structural join planning
- sideways information passing
- physical key and value shape selection

These are exactly the parts a DBSP-backed Datalog or Geolog compiler needs before lowering rules to DBSP.

---

## CRDT Synergy

The CRDT notes identify three representative query shapes.

The multi-value register is mostly projection plus antijoin:

```text
overwritten(RepId, Ctr) :-
    pred(RepId, Ctr, _, _).

mvrStore(Key, Value) :-
    set(RepId, Ctr, Key, Value),
    not overwritten(RepId, Ctr).
```

This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation history.
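
As a rough illustration of why this shape is cheap to maintain, here is a minimal Python sketch of the two rules above under set semantics. It is not DBSP code: `MvrView`, `add_set`, `add_pred`, and `rows` are hypothetical names, and the sketch assumes operation facts are only ever added, never retracted. Each new fact touches only the index entries for its own `(RepId, Ctr)`.

```python
from collections import defaultdict


class MvrView:
    """Toy maintained view for mvrStore (hypothetical names, not a DBSP API)."""

    def __init__(self):
        self.sets_by_id = defaultdict(set)  # (rep, ctr) -> {(key, value)}
        self.overwritten = set()            # {(rep, ctr)}
        self.support = defaultdict(int)     # (key, value) -> live derivations

    def add_set(self, rep, ctr, key, value):
        """Insert a set(...) fact and return output deltas as (row, weight)."""
        row = (key, value)
        if row in self.sets_by_id[(rep, ctr)]:
            return []  # duplicate fact, nothing changes
        self.sets_by_id[(rep, ctr)].add(row)
        if (rep, ctr) in self.overwritten:
            return []  # the antijoin already filters this operation
        return self._bump(row, +1)

    def add_pred(self, from_rep, from_ctr, to_rep, to_ctr):
        """Insert a pred(...) fact; the source operation becomes overwritten."""
        op_id = (from_rep, from_ctr)
        if op_id in self.overwritten:
            return []
        self.overwritten.add(op_id)
        deltas = []
        for row in sorted(self.sets_by_id.get(op_id, ())):
            deltas += self._bump(row, -1)
        return deltas

    def _bump(self, row, change):
        """Adjust the derivation count; emit a delta only when visibility flips."""
        before = self.support[row]
        after = before + change
        self.support[row] = after
        if before == 0 and after > 0:
            return [(row, +1)]
        if before > 0 and after == 0:
            return [(row, -1)]
        return []

    def rows(self):
        return {row for row, count in self.support.items() if count > 0}
```

The derivation count is needed because `mvrStore` projects away the operation id, so two different `set` operations can support the same `(Key, Value)` row.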

The causal-readiness query is harder:

```text
isCausallyReady(RepId, Ctr) :-
    isRoot(RepId, Ctr).

isCausallyReady(RepId, Ctr) :-
    isCausallyReady(FromRepId, FromCtr),
    pred(FromRepId, FromCtr, RepId, Ctr).
```

This is recursive graph traversal. The DBSP CRDT notes report that the cost of maintaining this query can still depend on causal-history depth, even during warm updates.

FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings. For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots through the whole causal graph.
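
To make the frontier idea concrete, here is a minimal hand-written Python sketch. It is neither FlowLog's SIP implementation nor a DBSP circuit: `ReadinessFrontier` and its methods are hypothetical, it follows the rule exactly as written above (one ready predecessor is enough), and it assumes facts are only inserted. Whether a planner can steer DBSP toward this behavior is the open question.

```python
from collections import defaultdict, deque


class ReadinessFrontier:
    """Insert-only maintenance of isCausallyReady (hypothetical sketch)."""

    def __init__(self):
        self.succ = defaultdict(set)  # op -> operations that list it in pred
        self.ready = set()            # current isCausallyReady facts

    def add_root(self, op):
        """Insert isRoot(op); return the set of newly ready operations."""
        return self._propagate([op])

    def add_pred(self, src, dst):
        """Insert pred(src, dst); return the set of newly ready operations."""
        self.succ[src].add(dst)
        if src in self.ready:
            return self._propagate([dst])
        return set()  # src is not ready yet, so nothing can change

    def _propagate(self, seeds):
        """Walk forward from the new frontier only, never from the roots."""
        newly_ready = set()
        queue = deque(op for op in seeds if op not in self.ready)
        while queue:
            op = queue.popleft()
            if op in self.ready:
                continue
            self.ready.add(op)
            newly_ready.add(op)
            for nxt in self.succ.get(op, ()):
                if nxt not in self.ready:
                    queue.append(nxt)
        return newly_ready
```

In this sketch the work per update is bounded by the newly ready operations and their outgoing edges, independent of how deep the existing causal history is.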

The list CRDT query is also planning-sensitive. Relations such as `firstChild`, `nextSibling`, `nextSiblingAnc`, `nextElem`, and `nextVisible` create several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP builds maintained operator state.

---

## Geomerge Synergy

The Geomerge integration note proposes maintaining one combined violation relation.

Simple foreign-key laws are straightforward:

```text
required_src(graph, src) :- G.E(graph, src, dst).
missing_src(graph, src) :- required_src(graph, src), not G.V(graph, src).
```

For this subset, DBSP can maintain projections and antijoins directly.

The need for FlowLog-style planning grows when laws contain several atoms:

```text
violation(x, y, z) :-
    A(x, y),
    B(y, z),
    C(z),
    not D(x, z).
```

At that point, the compiler must decide:

- which atoms join first
- which variables form keys
- where filters and antijoins should be applied
- which intermediate fields must be retained
- whether several laws share subplans

FlowLog's catalog and structural planning model is a good guide for this compiler layer.
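
The first three decisions can be sketched with a small greedy planner. This is an illustrative Python sketch, not FlowLog's algorithm: `plan_rule` is a hypothetical helper that orders positive atoms by variable overlap only (a real planner would also weigh intermediate width) and places each negated atom as soon as all of its variables are bound.

```python
def plan_rule(positive, negated):
    """Greedy plan for one rule; atoms are (name, variables) pairs."""
    remaining = list(positive)
    pending = list(negated)
    bound = set()
    steps = []
    while remaining:
        # Prefer the atom sharing the most variables with what is bound so far.
        # max() returns the first best candidate, so ties keep source order.
        best = max(remaining, key=lambda atom: len(bound & set(atom[1])))
        remaining.remove(best)
        key = sorted(bound & set(best[1]))
        steps.append(("join" if bound else "scan", best[0], key))
        bound |= set(best[1])
        # Schedule every negated atom whose variables are now all bound.
        for atom in list(pending):
            if set(atom[1]) <= bound:
                steps.append(("antijoin", atom[0], sorted(atom[1])))
                pending.remove(atom)
    if pending:
        raise ValueError("unsafe rule: a negated atom has unbound variables")
    return steps


# The violation rule from above.
steps = plan_rule(
    positive=[("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))],
    negated=[("D", ("x", "z"))],
)
```

For this rule the sketch scans `A`, joins `B` on `y`, applies the antijoin on `D` as soon as `x` and `z` are bound, and only then joins `C` on `z`.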

The resulting Geomerge architecture could be:

```text
FlatTheory laws
-> supported Datalog-shaped rules
-> FlowLog-like catalog and optimizer
-> relational violation plan
-> DBSP circuit
-> maintained violations relation
```

This keeps DBSP as a performance backend while giving Geomerge a real planning layer.

---

## IR Boundary

The strongest architectural lesson is to avoid binding the system too tightly to either source syntax or backend syntax.

The durable boundary should be a relational intermediate representation:

```text
source rules
-> rule catalog
-> relational IR
-> backend-specific lowering
```

For CRDTs, the source may be a Datalog dialect.

For Geomerge, the source may be compiled Geolog laws.

For execution, the backend may be DBSP, Differential Dataflow, or a non-incremental batch engine.

This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been checked, stratified, and optimized.
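
One possible shape for such an IR, sketched in Python. All node names here (`Scan`, `Project`, `Join`, `AntiJoin`, `output_columns`) are assumptions made for illustration; neither FlowLog nor the DBSP notes prescribe them. The point is that the plan is plain data that a DBSP lowering, a Differential Dataflow lowering, or a batch evaluator could each walk.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    relation: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    source: "Plan"
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    key: Tuple[str, ...]


@dataclass(frozen=True)
class AntiJoin:
    left: "Plan"
    right: "Plan"
    key: Tuple[str, ...]


Plan = Union[Scan, Project, Join, AntiJoin]


def output_columns(plan: Plan) -> Tuple[str, ...]:
    """Backend-neutral schema inference over the plan tree."""
    if isinstance(plan, (Scan, Project)):
        return plan.columns
    if isinstance(plan, AntiJoin):
        return output_columns(plan.left)  # an antijoin only filters the left side
    left = output_columns(plan.left)
    right = output_columns(plan.right)
    return left + tuple(c for c in right if c not in left)


# The missing_src law from the Geomerge section, expressed as a plan.
missing_src = AntiJoin(
    left=Project(Scan("G.E", ("graph", "src", "dst")), ("graph", "src")),
    right=Scan("G.V", ("graph", "src")),
    key=("graph", "src"),
)
```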

---

## Optimization Transfer

FlowLog suggests several optimizations that transfer well to DBSP-backed work.

**Structural Planning**: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates are weak.

**Sideways Information Passing**: Add semijoin-style filters so later joins and recursive steps see fewer irrelevant tuples.

**Antijoin Scheduling**: Apply each negated atom as soon as all of its variables are bound. This matches the DBSP CRDT note's antijoin-pushdown agenda.

**Subplan Sharing**: Reuse common derived relations across laws or CRDT views when multiple outputs need the same intermediate facts.

**Physical Key Choice**: Pick key fields deliberately before lowering to the backend. DBSP joins also need maintained state, so bad key and arrangement choices will become runtime costs.
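
As a small illustration of the sideways-information-passing item, the Python sketch below filters `B(y, z)` down to the `y` bindings that `A(x, y)` can actually supply before joining. The function names and sample rows are made up for the example, and this is a batch sketch rather than an incremental operator.

```python
def join_on_y(a_rows, b_rows):
    """Hash join of A(x, y) with B(y, z) on y."""
    by_y = {}
    for y, z in b_rows:
        by_y.setdefault(y, []).append(z)
    return [(x, y, z) for x, y in a_rows for z in by_y.get(y, [])]


def sip_filter(a_rows, b_rows):
    """Semijoin-style prefilter: keep only B rows whose y is bound by A."""
    bound_y = {y for _x, y in a_rows}
    return [(y, z) for y, z in b_rows if y in bound_y]


a = [(1, "k1"), (2, "k2")]
b = [("k1", "p"), ("k3", "q"), ("k4", "r"), ("k2", "s")]

reduced = sip_filter(a, b)  # drops the k3 and k4 rows
assert join_on_y(a, reduced) == join_on_y(a, b)
```

In a maintained query the payoff is that operators downstream of the filter index fewer rows, at the price of maintaining the filter itself.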

---

## Backend Comparison

FlowLog targets Differential Dataflow. The work described in the DBSP notes targets DBSP.

The models differ:

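- Differential Dataflow models collections as timestamped updates, each carrying a logical time and a signed difference.
- DBSP models computation as circuits over streams of Z-sets, with integration and differentiation operators converting between full values and deltas.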

The shared lesson is more important than the difference:

```text
incremental backends maintain operator state
```

That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase memory use and update cost for the lifetime of the maintained query.
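
The following Python sketch shows where that state comes from, using the standard incremental rule for a bilinear operator such as join: the change to `A join B` is `dA join B + A join dB + dA join dB`. It models Z-sets as dictionaries from row to integer weight. It is a toy model of the math, not the DBSP library's API, and `zadd`, `zjoin`, and `IncrementalJoin` are hypothetical names.

```python
from collections import defaultdict


def zadd(a, b):
    """Add two Z-sets (row -> integer weight), dropping zero weights."""
    out = defaultdict(int)
    for zset in (a, b):
        for row, weight in zset.items():
            out[row] += weight
    return {row: w for row, w in out.items() if w != 0}


def zjoin(left, right):
    """Join Z-sets of (key, value) rows on key; weights multiply."""
    index = defaultdict(list)
    for (k, v), w in right.items():
        index[k].append((v, w))
    out = defaultdict(int)
    for (k, lv), lw in left.items():
        for rv, rw in index.get(k, ()):
            out[(k, lv, rv)] += lw * rw
    return {row: w for row, w in out.items() if w != 0}


class IncrementalJoin:
    """Maintains A join B; the integrated inputs are the persistent state."""

    def __init__(self):
        self.a = {}  # integral of all deltas seen on the left input
        self.b = {}  # integral of all deltas seen on the right input

    def step(self, da, db):
        """Consume one delta per input and return the output delta."""
        out = zadd(zadd(zjoin(da, self.b), zjoin(self.a, db)), zjoin(da, db))
        self.a = zadd(self.a, da)
        self.b = zadd(self.b, db)
        return out
```

Both inputs stay resident in `self.a` and `self.b` for as long as the query is maintained, so every extra join or wide intermediate relation in the plan has a standing memory cost.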

FlowLog is useful because it treats planning as a first-class layer before execution.

---

## Proposed Synergy Path

The practical path is to proceed in small steps.

First, use FlowLog as a reading reference for a DBSP frontend:

```text
parser
-> dependency graph
-> strata
-> rule catalog
-> relational plan
```
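
The dependency-graph and strata steps can be sketched with the textbook iterative stratification algorithm. This Python version is a minimal illustration, not FlowLog's code; the `stratify` function and its rule encoding are assumptions made for the example.

```python
def stratify(rules):
    """Assign a stratum to every predicate.

    rules: list of (head, [(body_predicate, is_negated), ...]).
    A head must sit at least as high as each positive body predicate and
    strictly above each negated one.
    """
    preds = {head for head, _ in rules}
    preds |= {pred for _, body in rules for pred, _ in body}
    stratum = {pred: 0 for pred in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                need = stratum[pred] + (1 if negated else 0)
                if need > stratum[head]:
                    # With n predicates, valid strata are 0..n-1. Reaching n
                    # means some cycle passes through a negation.
                    if need >= len(preds):
                        raise ValueError(f"negation cycle through {head}")
                    stratum[head] = need
                    changed = True
    return stratum


# The multi-value register rules from the CRDT section.
mvr_strata = stratify([
    ("overwritten", [("pred", False)]),
    ("mvrStore", [("set", False), ("overwritten", True)]),
])
```

Here `mvrStore` lands one stratum above `overwritten`, which is what lets a backend finish computing `overwritten` before evaluating the antijoin.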

Second, add a small optimizer:

```text
join ordering
antijoin pushdown
simple SIP for bound variables
```

Third, lower the optimized plan to DBSP operators:

```text
projection
selection
join
antijoin
union
distinct
fixed point
```

Fourth, test against the current direct implementations:

```text
snapshot result == DBSP-maintained result
```
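
A randomized differential test is one way to check that equation. The Python sketch below uses the `missing_src` law as the query; `MaintainedMissingSrc` is a hypothetical stand-in for the DBSP-maintained view, and `snapshot` is the from-scratch reference. A real harness would drive the actual circuit instead of this toy class.

```python
import random
from collections import Counter


def snapshot(edges, vertices):
    """Reference result: recompute missing_src from scratch."""
    return {(g, s) for (g, s, _d) in edges if (g, s) not in vertices}


class MaintainedMissingSrc:
    """Toy maintained view that applies one weighted change at a time."""

    def __init__(self):
        self.required = Counter()  # (graph, src) -> number of supporting edges
        self.vertices = set()
        self.violations = set()

    def apply(self, relation, row, weight):
        if relation == "E":
            g, s, _d = row
            key = (g, s)
            self.required[key] += weight
        else:
            key = row
            if weight > 0:
                self.vertices.add(row)
            else:
                self.vertices.discard(row)
        # Only the touched key can change its violation status.
        if self.required[key] > 0 and key not in self.vertices:
            self.violations.add(key)
        else:
            self.violations.discard(key)


def check(seed, steps=500):
    """Toggle random facts and compare both results after every change."""
    rng = random.Random(seed)
    edges, vertices = set(), set()
    view = MaintainedMissingSrc()
    for _ in range(steps):
        if rng.random() < 0.5:
            row, target, relation = ("g", rng.randrange(4), rng.randrange(4)), edges, "E"
        else:
            row, target, relation = ("g", rng.randrange(4)), vertices, "V"
        weight = -1 if row in target else 1  # delete if present, else insert
        if weight > 0:
            target.add(row)
        else:
            target.discard(row)
        view.apply(relation, row, weight)
        assert view.violations == snapshot(edges, vertices)
    return True
```

Keeping the value domain small, as here, makes inserts and deletes of the same fact collide often, which is where maintained state tends to go wrong.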

For Geomerge, the first target should stay the same as in the DBSP integration note: supported laws compiled into one maintained `violations` relation.

For CRDTs, the first target should be causal readiness and list traversal, since those are where the existing DBSP notes identify performance risk.

---

## Open Questions

- Can FlowLog-style SIP be adapted to causal-readiness frontiers?
- Can DBSP expose enough physical planning control for key choice and subplan sharing?
- Should the Datalog frontend target a FlowLog-like IR before DBSP lowering?
- Can Geomerge laws use the same catalog structure as Datalog rules?
- Which recursive CRDT queries benefit from structural planning?
- Is hydration better handled by DBSP, a batch engine, or persisted operator state?
- Can one optimizer target both DBSP and Differential Dataflow backends?

---

## Bottom Line

FlowLog should be treated as an optimizer and compiler blueprint for the DBSP work.

The DBSP notes already have the right execution target:

```text
maintained relational deltas
```

FlowLog adds the missing planning discipline:

```text
rule catalog
+ join graph
+ recursive strata
+ robust plan choice
+ SIP-style prefiltering
```

Together, the systems suggest a stronger architecture:

```text
Datalog or Geolog rules
-> FlowLog-like planning layer
-> DBSP incremental backend
-> maintained CRDT views or violation relations
```

---

## Changelog

* **May 20, 2026** -- First version created from the DBSP and FlowLog notes.