FlowLog and DBSP Synergy
A note on how FlowLog's Datalog planning ideas could support the DBSP, CRDT, and Geomerge work.
Short Answer
FlowLog and the DBSP notes meet at the boundary between Datalog rules and incremental execution.
DBSP answers:
How can a relational query be maintained over changing inputs?
FlowLog helps answer:
What relational plan should the incremental backend maintain?
The synergy is not that FlowLog should replace DBSP. The useful split is:
FlowLog-like frontend and optimizer
-> backend-neutral relational IR
-> DBSP-maintained circuit
FlowLog is a useful blueprint for the compiler and optimizer that should sit in front of DBSP.
Existing DBSP Direction
The DBSP notes are organized around three related goals.
First, CRDTs can be described as deterministic Datalog queries over immutable operation facts:
operation facts
-> Datalog rules
-> visible CRDT state
Second, DBSP can maintain those query results incrementally:
input relation deltas
-> DBSP circuit step
-> output relation deltas
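The sketch below makes the delta pipeline concrete. It models Z-sets (rows with integer weights, as DBSP uses) and the integrate/differentiate pair in Python. The helper names (`zset_add`, `integrate`, `differentiate`, `select`) are hypothetical and are not the API of any DBSP crate. The sketch only shows one property: for a linear operator such as selection, applying the operator to each delta and integrating gives the same result as re-running it on every snapshot.

```python
from typing import Callable, Dict, Hashable, List

ZSet = Dict[Hashable, int]  # row -> nonzero weight (+1 insert, -1 delete)


def zset_add(a: ZSet, b: ZSet) -> ZSet:
    """Sum two Z-sets, dropping rows whose weights cancel to zero."""
    out = dict(a)
    for row, w in b.items():
        out[row] = out.get(row, 0) + w
        if out[row] == 0:
            del out[row]
    return out


def integrate(deltas: List[ZSet]) -> List[ZSet]:
    """I operator: running sum of a stream of deltas (one snapshot per step)."""
    acc: ZSet = {}
    snapshots = []
    for d in deltas:
        acc = zset_add(acc, d)
        snapshots.append(acc)
    return snapshots


def differentiate(snapshots: List[ZSet]) -> List[ZSet]:
    """D operator: difference between consecutive snapshots."""
    prev: ZSet = {}
    deltas = []
    for s in snapshots:
        deltas.append(zset_add(s, {row: -w for row, w in prev.items()}))
        prev = s
    return deltas


def select(pred: Callable[[Hashable], bool], z: ZSet) -> ZSet:
    """Selection is linear, so it can run on deltas instead of snapshots."""
    return {row: w for row, w in z.items() if pred(row)}


def not_from_b(row: Hashable) -> bool:
    return row[0] != "b"


# Hypothetical stream of edge deltas over (src, dst) rows.
stream = [
    {("a", "b"): 1, ("b", "c"): 1},
    {("a", "b"): -1, ("c", "d"): 1},
]
incremental = integrate([select(not_from_b, d) for d in stream])
recomputed = [select(not_from_b, s) for s in integrate(stream)]
assert incremental == recomputed
```

Non-linear operators such as join and distinct do not commute with integration this simply. That is why the backend has to keep state for them.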
Third, Geomerge laws can be compiled into maintained violation relations:
compiled relational laws
-> violation queries
-> maintained violation deltas
The missing layer is query planning. A Datalog rule can be semantically correct and still compile to a poor physical plan.
FlowLog's Useful Layer
FlowLog has a pipeline that is useful independently of its Differential Dataflow backend:
Datalog program
-> parser
-> strata
-> rule catalog
-> per-rule relational plan
-> optimizer
-> incremental dataflow backend
The reusable parts are:
- dependency analysis
- stratification
- rule catalog construction
- join graph extraction
- structural join planning
- sideways information passing
- physical key and value shape selection
These are the parts a DBSP-backed Datalog or Geolog compiler needs before it lowers rules to DBSP.
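The dependency-analysis and stratification steps can be sketched in a few lines. This is a simplified textbook stratifier, not FlowLog's code. Each rule is reduced to a hypothetical `(head, [(body_relation, negated)])` pair. A relation's stratum is raised until two conditions hold: every positive dependency sits in the same or a lower stratum, and every negated dependency sits in a strictly lower one.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, List[Tuple[str, bool]]]  # (head, [(body relation, is_negated)])


def stratify(rules: List[Rule]) -> Dict[str, int]:
    """Assign each relation a stratum; reject negation through recursion."""
    relations = {h for h, _ in rules} | {b for _, body in rules for b, _ in body}
    stratum = {r: 0 for r in relations}
    limit = len(relations)  # no valid program needs more strata than relations
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, negated in body:
                need = stratum[rel] + 1 if negated else stratum[rel]
                if need > stratum[head]:
                    if need > limit:
                        raise ValueError(f"{head} depends negatively on itself via recursion")
                    stratum[head] = need
                    changed = True
    return stratum


# The multi-value register and causal-readiness rules from the CRDT section.
mvr_rules = [
    ("overwritten", [("pred", False)]),
    ("mvrStore", [("set", False), ("overwritten", True)]),
    ("isCausallyReady", [("isRoot", False)]),
    ("isCausallyReady", [("isCausallyReady", False), ("pred", False)]),
]
strata = stratify(mvr_rules)
```

Here `mvrStore` lands one stratum above `overwritten` because of the negation. The recursive `isCausallyReady` stays in stratum 0 because its recursion is purely positive.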
CRDT Synergy
The CRDT notes identify three representative query shapes.
The multi-value register is mostly projection plus antijoin:
overwritten(RepId, Ctr) :-
pred(RepId, Ctr, _, _).
mvrStore(Key, Value) :-
set(RepId, Ctr, Key, Value),
not overwritten(RepId, Ctr).
This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation history.
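The following sketch shows why this workload is favorable. It assumes insert-only operation facts delivered exactly once: operations are immutable, so neither `set` nor `pred` facts are retracted. The class `MvrView` and its methods are hypothetical. The view keeps two pieces of maintained state, the `set` facts indexed by `(RepId, Ctr)` and the `overwritten` keys. Each batch touches only the keys it mentions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

OpId = Tuple[str, int]  # (RepId, Ctr)


class MvrView:
    """Hypothetical incremental mvrStore view over insert-only facts."""

    def __init__(self) -> None:
        self.sets: Dict[OpId, List[Tuple[str, str]]] = defaultdict(list)  # set facts by op id
        self.overwritten: Set[OpId] = set()  # antijoin state
        self.support: Dict[Tuple[str, str], int] = defaultdict(int)  # visible ops per (Key, Value)

    def _bump(self, kv: Tuple[str, str], w: int, out: Dict[Tuple[str, str], int]) -> None:
        # Emit +1/-1 only when a (Key, Value) pair appears or disappears.
        before = self.support[kv]
        self.support[kv] += w
        after = self.support[kv]
        if before == 0 and after > 0:
            out[kv] = out.get(kv, 0) + 1
        elif before > 0 and after == 0:
            out[kv] = out.get(kv, 0) - 1

    def apply(self, new_sets, new_preds) -> Dict[Tuple[str, str], int]:
        """Process one batch and return the mvrStore delta as a Z-set."""
        out: Dict[Tuple[str, str], int] = {}
        # overwritten(RepId, Ctr) :- pred(RepId, Ctr, _, _).
        for rep, ctr, _to_rep, _to_ctr in new_preds:
            op = (rep, ctr)
            if op not in self.overwritten:
                self.overwritten.add(op)
                for kv in self.sets.get(op, []):  # retract what this op made visible
                    self._bump(kv, -1, out)
        # mvrStore(Key, Value) :- set(RepId, Ctr, Key, Value), not overwritten(RepId, Ctr).
        for rep, ctr, key, value in new_sets:
            op = (rep, ctr)
            self.sets[op].append((key, value))
            if op not in self.overwritten:
                self._bump((key, value), +1, out)
        return {kv: w for kv, w in out.items() if w != 0}


view = MvrView()
d1 = view.apply([("r1", 1, "k", "a"), ("r2", 1, "k", "b")], [])
d2 = view.apply([("r1", 2, "k", "c")], [("r1", 1, "r1", 2), ("r2", 1, "r1", 2)])
```

The first batch makes both concurrent writes visible. The second batch retracts them and adds the overwriting value. Neither batch rescans earlier history.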
The causal-readiness query is harder:
isCausallyReady(RepId, Ctr) :-
isRoot(RepId, Ctr).
isCausallyReady(RepId, Ctr) :-
isCausallyReady(FromRepId, FromCtr),
pred(FromRepId, FromCtr, RepId, Ctr).
This is recursive graph traversal. The DBSP CRDT notes report that the cost of this query can remain dependent on causal-history depth, even during warm updates.
FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings. For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots through the whole causal graph.
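One way to picture the frontier idea is semi-naive propagation. When new `isRoot` or `pred` facts arrive, only newly ready operations are pushed onto a worklist, so the work follows the newly derived facts rather than the whole causal graph. This is an insert-only sketch of the two rules exactly as written above, which amount to reachability from roots. The class name and the choice to index `pred` by its source operation are assumptions. It illustrates restricting recursion to a frontier and is not FlowLog's SIP implementation.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Set, Tuple

OpId = Tuple[str, int]  # (RepId, Ctr)


class ReadinessView:
    """Hypothetical insert-only maintenance of isCausallyReady."""

    def __init__(self) -> None:
        self.succ: Dict[OpId, List[OpId]] = defaultdict(list)  # pred facts indexed by source
        self.ready: Set[OpId] = set()

    def apply(self, new_roots: Iterable[OpId],
              new_preds: Iterable[Tuple[str, int, str, int]]) -> Set[OpId]:
        """Return the operations that became ready in this batch."""
        frontier: Deque[OpId] = deque()
        newly_ready: Set[OpId] = set()

        def mark(op: OpId) -> None:
            if op not in self.ready:
                self.ready.add(op)
                newly_ready.add(op)
                frontier.append(op)

        # isCausallyReady(RepId, Ctr) :- isRoot(RepId, Ctr).
        for op in new_roots:
            mark(op)
        # A new pred edge whose source is already ready derives its target.
        for from_rep, from_ctr, rep, ctr in new_preds:
            src = (from_rep, from_ctr)
            self.succ[src].append((rep, ctr))
            if src in self.ready:
                mark((rep, ctr))
        # Recursive rule: expand only from newly ready operations (semi-naive delta).
        while frontier:
            for nxt in self.succ.get(frontier.popleft(), []):
                mark(nxt)
        return newly_ready


view = ReadinessView()
first = view.apply([("r1", 0)], [("r1", 0, "r1", 1), ("r1", 1, "r2", 1)])
second = view.apply([], [("r2", 1, "r2", 2)])  # visits only the new edge's target
```

The second batch visits one operation, not the three already-ready ones. Deletions and the "all predecessors ready" variant of readiness would need more state than this sketch keeps.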
The list CRDT query is also planning-sensitive. Relations such as firstChild, nextSibling, nextSiblingAnc, nextElem, and nextVisible create several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP builds maintained operator state.
Geomerge Synergy
The Geomerge integration note proposes maintaining one combined violation relation.
Simple foreign-key laws are straightforward:
required_src(graph, src) :- G.E(graph, src, dst).
missing_src(graph, src) :- required_src(graph, src), not G.V(graph, src).
For this subset, DBSP can maintain projections and antijoins directly.
The need for FlowLog-style planning grows when a law contains several atoms:
violation(x, y, z) :-
A(x, y),
B(y, z),
C(z),
not D(x, z).
At that point, the compiler must decide:
- which atoms join first
- which variables form keys
- where filters and antijoins should be applied
- which intermediate fields must be retained
- whether several laws share subplans
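A deliberately simple greedy planner shows how these decisions interact for the rule above. It is a hypothetical sketch, not FlowLog's structural planner. It starts from the first positive atom and repeatedly picks the positive atom that shares the most already-bound variables, breaking ties by list order. It schedules each negated atom as soon as all of its variables are bound, which is the antijoin-scheduling idea, and it reports the key variables of each join.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...], bool]  # (relation, variables, is_negated)


def plan(atoms: List[Atom]) -> List[str]:
    """Greedy left-deep plan with antijoins applied as early as possible."""
    positives = [a for a in atoms if not a[2]]
    negatives = [a for a in atoms if a[2]]
    first = positives.pop(0)
    bound: Set[str] = set(first[1])
    steps = [f"scan {first[0]}({', '.join(first[1])})"]

    def flush_antijoins() -> None:
        # Schedule every negated atom whose variables are all bound by now.
        for neg in list(negatives):
            if set(neg[1]) <= bound:
                steps.append(f"antijoin not {neg[0]} on {', '.join(neg[1])}")
                negatives.remove(neg)

    flush_antijoins()
    while positives:
        # Pick the atom sharing the most bound variables (first wins on ties).
        best = max(positives, key=lambda a: len(bound & set(a[1])))
        positives.remove(best)
        keys = [v for v in best[1] if v in bound]  # join key variables
        steps.append(f"join {best[0]} on {', '.join(keys)}" if keys else f"cross {best[0]}")
        bound |= set(best[1])
        flush_antijoins()
    if negatives:
        raise ValueError("unsafe rule: a negated atom has unbound variables")
    return steps


violation_rule = [
    ("A", ("x", "y"), False),
    ("B", ("y", "z"), False),
    ("C", ("z",), False),
    ("D", ("x", "z"), True),
]
steps = plan(violation_rule)
```

On the `violation` rule this yields a scan of A, a join with B on y, the antijoin with D on x and z, and then a join with C on z. A cost-aware planner might instead apply the unary C(z) earlier as a filter. The catalog is where that kind of choice should become explicit.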
FlowLog's catalog and structural planning model is a good guide for this compiler layer.
The resulting Geomerge architecture could be:
FlatTheory laws
-> supported Datalog-shaped rules
-> FlowLog-like catalog and optimizer
-> relational violation plan
-> DBSP circuit
-> maintained violations relation
This keeps DBSP as a performance backend while giving Geomerge a real planning layer.
IR Boundary
The strongest architectural lesson is to avoid binding the system too tightly to either source syntax or backend syntax.
The durable boundary should be a relational intermediate representation:
source rules
-> rule catalog
-> relational IR
-> backend-specific lowering
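One way to picture this boundary is a small tree of relational operators that carries neither source syntax nor backend types. The node names below are hypothetical and cover only the operators the Geomerge foreign-key law needs. The `lower` function stands in for a backend-specific lowering. Here it only renders the plan as text.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    relation: str


@dataclass(frozen=True)
class Project:
    # Assumed set semantics: projection followed by distinct.
    child: "Plan"
    columns: Tuple[int, ...]


@dataclass(frozen=True)
class AntiJoin:
    # Keep left rows whose key has no match in the right input.
    left: "Plan"
    right: "Plan"
    left_key: Tuple[int, ...]
    right_key: Tuple[int, ...]


Plan = Union[Scan, Project, AntiJoin]


def lower(plan: "Plan") -> str:
    """Toy backend lowering: render the plan as nested operator calls."""
    if isinstance(plan, Scan):
        return plan.relation
    if isinstance(plan, Project):
        return f"project({lower(plan.child)}, {list(plan.columns)})"
    if isinstance(plan, AntiJoin):
        return (f"antijoin({lower(plan.left)}, {lower(plan.right)}, "
                f"{list(plan.left_key)}, {list(plan.right_key)})")
    raise TypeError(f"unknown plan node: {plan!r}")


# required_src(graph, src) :- G.E(graph, src, dst).
required_src = Project(Scan("G.E"), (0, 1))
# missing_src(graph, src) :- required_src(graph, src), not G.V(graph, src).
missing_src = AntiJoin(required_src, Scan("G.V"), (0, 1), (0, 1))
```

A DBSP lowering and a Differential Dataflow lowering would each replace `lower` while sharing the same plan values.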
For CRDTs, the source may be a Datalog dialect.
For Geomerge, the source may be compiled Geolog laws.
For execution, the backend may be DBSP, Differential Dataflow, or a non-incremental batch engine.
This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been checked, stratified, and optimized.
Optimization Transfer
FlowLog suggests several optimizations that transfer well to DBSP-backed work.
Structural Planning: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates are weak.
Sideways Information Passing: Add semijoin-style filters so later joins and recursive steps see fewer irrelevant tuples.
Antijoin Scheduling: Apply negated atoms as soon as their variables are available. This matches the DBSP CRDT note's antijoin-pushdown agenda.
Subplan Sharing: Reuse common derived relations across laws or CRDT views when multiple outputs need the same intermediate facts.
Physical Key Choice: Pick key fields deliberately before lowering to the backend. DBSP joins also keep maintained state, so poor key and arrangement choices become ongoing runtime costs.
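A semijoin-style filter in the spirit of SIP can be shown on the `violation` rule from the Geomerge section. Before `B(y, z)` is joined with `A(x, y)`, it is filtered to the `z` values that `C(z)` accepts. The data and function names are hypothetical. The point is that the filtered plan returns the same rows while building a smaller intermediate relation.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def join_ab(a: List[Pair], b: List[Pair]) -> List[Triple]:
    """Hash join A(x, y) with B(y, z) on y."""
    index: Dict[str, List[str]] = {}
    for y, z in b:
        index.setdefault(y, []).append(z)
    return [(x, y, z) for x, y in a for z in index.get(y, [])]


def violations(a: List[Pair], b: List[Pair], c: Set[str], d: Set[Pair],
               sip: bool) -> Tuple[Set[Triple], int]:
    """violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z)."""
    if sip:
        # Semijoin B with C first, so z values that C rejects never reach the join.
        b = [(y, z) for y, z in b if z in c]
    ab = join_ab(a, b)  # intermediate relation a maintained plan would keep
    rows = {(x, y, z) for x, y, z in ab if z in c and (x, z) not in d}
    return rows, len(ab)


# Hypothetical toy data.
A = [("x1", "y1"), ("x2", "y1")]
B = [("y1", "z1"), ("y1", "z2"), ("y1", "z3")]
C = {"z1"}
D = {("x2", "z1")}
plain_rows, plain_size = violations(A, B, C, D, sip=False)
sip_rows, sip_size = violations(A, B, C, D, sip=True)
```

Both plans return the single violation `("x1", "y1", "z1")`. The unfiltered plan builds six intermediate tuples, and the filtered plan builds two.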
Backend Comparison
FlowLog uses Differential Dataflow. The DBSP notes use DBSP.
The models differ:
- Differential Dataflow uses collections with logical time and differences.
- DBSP uses streams, Z-sets, integration, differentiation, and circuits.
The shared lesson is more important than the difference:
incremental backends maintain operator state
That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase memory use and update cost for the lifetime of the maintained query.
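A back-of-the-envelope sketch of that cost follows. In an incremental binary join, both inputs are typically kept as integrated state so that each new delta can be joined against the other side. In a join tree, that state includes the intermediate relations. The toy below counts intermediate tuples for two orders of a chain join `R(a, b), S(b, c), T(c, d)` on hypothetical skewed data. Both orders give the same answer, but one would retain far more intermediate state.

```python
from typing import Dict, List, Tuple

Row = Tuple


def hash_join(left: List[Row], right: List[Row], lk: int, rk: int) -> List[Row]:
    """Join on left[lk] == right[rk]; emit the left row plus the right row minus its key column."""
    index: Dict[object, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[rk], []).append(rrow)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[lk], []):
            out.append(lrow + rrow[:rk] + rrow[rk + 1:])
    return out


# Hypothetical chain join with a skewed key b = 1.
R = [(a, 1) for a in range(10)]
S = [(1, c) for c in range(10)]
T = [(0, "d0")]

rs = hash_join(R, S, 1, 0)           # (a, b, c): 100 intermediate rows
left_deep = hash_join(rs, T, 2, 0)   # (a, b, c, d)
st = hash_join(S, T, 1, 0)           # (b, c, d): 1 intermediate row
right_deep = hash_join(R, st, 1, 0)  # (a, b, c, d)
```

Here `(R ⋈ S) ⋈ T` keeps 100 intermediate rows alive, while `R ⋈ (S ⋈ T)` keeps one. In a maintained query, that difference persists for as long as the view exists.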
FlowLog is useful because it treats planning as a first-class layer before execution.
Proposed Synergy Path
The practical path is incremental.
First, use FlowLog as a reading reference for a DBSP frontend:
parser
-> dependency graph
-> strata
-> rule catalog
-> relational plan
Second, add a small optimizer:
join ordering
antijoin pushdown
simple SIP for bound variables
Third, lower the optimized plan to DBSP operators:
projection
selection
join
antijoin
union
distinct
fixed point
Fourth, test against the current direct implementations:
snapshot result == DBSP-maintained result
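The differential check can be a small randomized harness. The sketch applies it to the foreign-key law from the Geomerge section. It compares a from-scratch `snapshot_missing_src` with a hypothetical `MissingSrcView` that re-evaluates the antijoin only for `(graph, src)` keys touched by a delta. Both names, the single graph id `"g"`, and the toggle-based input generator are assumptions for illustration. In the real setup, the maintained side would be the DBSP circuit.

```python
import random
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # G.E(graph, src, dst)
Key = Tuple[str, str]        # (graph, src), also the shape of G.V


def snapshot_missing_src(edges: Set[Edge], vertices: Set[Key]) -> Set[Key]:
    """From-scratch evaluation: required_src minus G.V."""
    required = {(g, s) for g, s, _dst in edges}
    return required - vertices


class MissingSrcView:
    """Hypothetical incremental missing_src view over Z-set deltas."""

    def __init__(self) -> None:
        self.edge_support: Counter = Counter()  # G.E rows per (graph, src)
        self.vertex_count: Counter = Counter()  # G.V rows per (graph, src)
        self.missing: Set[Key] = set()

    def apply(self, edge_delta: Dict[Edge, int], vertex_delta: Dict[Key, int]) -> Set[Key]:
        touched: Set[Key] = set()
        for (g, s, _dst), w in edge_delta.items():
            self.edge_support[(g, s)] += w
            touched.add((g, s))
        for key, w in vertex_delta.items():
            self.vertex_count[key] += w
            touched.add(key)
        # Re-evaluate the projection and antijoin only for touched keys.
        for key in touched:
            if self.edge_support[key] > 0 and self.vertex_count[key] == 0:
                self.missing.add(key)
            else:
                self.missing.discard(key)
        return set(self.missing)


def check(steps: int = 200, seed: int = 0) -> bool:
    """Random insert/delete toggles; compare maintained and snapshot results each step."""
    rng = random.Random(seed)
    names = ["a", "b", "c"]
    edges: Set[Edge] = set()
    vertices: Set[Key] = set()
    view = MissingSrcView()
    for _ in range(steps):
        edge = ("g", rng.choice(names), rng.choice(names))
        edge_delta = {edge: -1 if edge in edges else 1}
        edges ^= {edge}  # toggle membership
        vertex_delta: Dict[Key, int] = {}
        if rng.random() < 0.5:
            vertex = ("g", rng.choice(names))
            vertex_delta[vertex] = -1 if vertex in vertices else 1
            vertices ^= {vertex}
        assert view.apply(edge_delta, vertex_delta) == snapshot_missing_src(edges, vertices)
    return True
```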
For Geomerge, the first target should stay the same as the DBSP integration note: supported laws compiled into one maintained violations relation.
For CRDTs, the first target should be causal readiness and list traversal, since those are where the existing DBSP notes identify performance risk.
Open Questions
- Can FlowLog-style SIP be adapted to causal-readiness frontiers?
- Can DBSP expose enough physical planning control for key choice and subplan sharing?
- Should the Datalog frontend target a FlowLog-like IR before DBSP lowering?
- Can Geomerge laws use the same catalog structure as Datalog rules?
- Which recursive CRDT queries benefit from structural planning?
- Is hydration better handled by DBSP, a batch engine, or persisted operator state?
- Can one optimizer target both DBSP and Differential Dataflow backends?
Bottom Line
FlowLog should be treated as an optimizer and compiler blueprint for the DBSP work.
The DBSP notes already have the right execution target:
maintained relational deltas
FlowLog adds the missing planning discipline:
rule catalog
+ join graph
+ recursive strata
+ robust plan choice
+ SIP-style prefiltering
Together, the systems suggest a stronger architecture:
Datalog or Geolog rules
-> FlowLog-like planning layer
-> DBSP incremental backend
-> maintained CRDT views or violation relations
Changelog
- May 20, 2026 -- First version created from the DBSP and FlowLog notes.