Using FlowLog Ideas
A practical note on how FlowLog or FlowLog-style planning could be used for the local DBSP, CRDT, and Geomerge work.
Short Answer
The most useful way to use FlowLog is as a planning reference, not as a direct dependency.
There are three possible levels of use:
Level 1: run FlowLog examples to learn workload behavior
Level 2: borrow FlowLog planning ideas for a DBSP frontend
Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs
The practical near-term path is Level 2: use FlowLog's catalog, join planning, antijoin scheduling, and SIP (sideways information passing) ideas to design a better compiler layer in front of DBSP.
Initial Non-Goals
The first step should not be replacing the DBSP backend with FlowLog.
That would conflate two separate questions:
- Which incremental backend should maintain deltas?
- Which frontend planner should produce the backend plan?
The DBSP notes already treat DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer.
The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source concepts. Datalog should remain an intermediate or testing language until the user-facing language is decided explicitly.
Use Case 1: Better CRDT Query Planning
The CRDT queries in the DBSP notes include:
- multi-value register queries
- causal-readiness queries
- list traversal queries
- tombstone-skipping queries
The multi-value register query is already simple enough that planning adds little:
set + pred -> overwritten -> visible values
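Evaluated directly, that pipeline is one projection and one antijoin. This sketch assumes `set(op, value)` and `pred(op, prev)` fact shapes and `(replica_id, counter)` operation ids. These are illustrative choices, not the notes' actual schema.

```python
# Assumed fact shapes (hypothetical, chosen for illustration):
#   set_facts:  {(op_id, value)}    a register write
#   pred_facts: {(op_id, prev_id)}  op_id causally overwrote prev_id
# Operation ids are (replica_id, counter) pairs.

def visible_values(set_facts, pred_facts):
    # overwritten(p) :- pred(_, p).
    overwritten = {prev for (_op, prev) in pred_facts}
    # visible(op, v) :- set(op, v), not overwritten(op).
    return {(op, v) for (op, v) in set_facts if op not in overwritten}


writes = {(("a", 1), "x"), (("b", 1), "y"), (("a", 2), "z")}
preds = {(("a", 2), ("a", 1))}  # a:2 overwrote a:1; b:1 is concurrent with both
print(sorted(visible_values(writes, preds)))
```

Both concurrent survivors stay visible, which is the multi-value behavior.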
The planning value is higher for causal readiness:
pred graph
-> roots
-> recursive reachability
-> ready operations
and list traversal:
insert tree
-> first child
-> next sibling
-> ancestor sibling
-> next element
-> next visible element
These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and adding semijoin filters around the current frontier.
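The list-traversal pipeline can be read as a pre-order walk of the insert tree. The next element is one of the following, checked in order:

- the first child
- the node's own next sibling
- the next sibling of the nearest ancestor that has one

Tombstone skipping is then a filter on the walk. This sketch makes three assumptions:

- facts have the shape `insert(elem, parent)`
- `HEAD` is a hypothetical sentinel root
- siblings use RGA-style order, with larger ids first

The real sibling order depends on the CRDT.

```python
from collections import defaultdict

HEAD = "head"  # hypothetical sentinel root of the insert tree


def build_next(insert_facts):
    """insert_facts: {(elem, parent)}; every parent is HEAD or an inserted elem.
    Returns {node: next node in document order, or None at the end}."""
    children, parent_of = defaultdict(list), {}
    for elem, parent in insert_facts:
        children[parent].append(elem)
        parent_of[elem] = parent
    next_sibling = {}
    for kids in children.values():
        kids.sort(reverse=True)  # assumed RGA-style order: larger (later) ids first
        next_sibling.update(zip(kids, kids[1:]))

    def next_of(x):
        if children[x]:                  # first child
            return children[x][0]
        while x != HEAD:                 # own next sibling, else an ancestor's
            if x in next_sibling:
                return next_sibling[x]
            x = parent_of[x]
        return None                      # end of the list

    return {x: next_of(x) for x in [HEAD, *parent_of]}


def visible_sequence(insert_facts, tombstones):
    """Follow next pointers from HEAD, skipping tombstoned elements."""
    nxt, out = build_next(insert_facts), []
    x = nxt[HEAD]
    while x is not None:
        if x not in tombstones:
            out.append(x)
        x = nxt[x]
    return out
```

For inserts `(1, HEAD)`, `(2, 1)`, `(3, HEAD)`, the walk visits 3, 1, 2. Deleting 1 yields 3, 2.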
The concrete experiment:
causal-readiness Datalog rules
rule catalog
naive plan
FlowLog-style planned version
DBSP hydration and warm-update comparison
The success criterion is not only lower total runtime. It is also that the cost of a small warm update depends less on the depth of the causal history.
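The causal-readiness pipeline can be evaluated semi-naively to serve as a snapshot baseline for the naive-versus-planned comparison. This sketch makes two assumptions:

- `pred(succ, p)` means `p` precedes `succ`
- readiness is reachability from roots, following the pipeline's shape

It indexes `pred` by predecessor, which is the join key. Each round joins only the facts derived in the previous round. A stricter "every predecessor is ready" rule would need an extra antijoin and is not shown.

```python
from collections import defaultdict


def ready_ops(ops, pred_facts):
    """ops: {op_id}; pred_facts: {(succ, p)} meaning p precedes succ."""
    successors, has_pred = defaultdict(set), set()
    for succ, p in pred_facts:
        successors[p].add(succ)     # index pred on the join key (the predecessor)
        has_pred.add(succ)
    # root(op) :- op(op), not has_pred(op).   ready(op) :- root(op).
    ready = {op for op in ops if op not in has_pred}
    delta = set(ready)
    # ready(s) :- ready(p), pred(s, p). Semi-naive: join only last round's new facts.
    while delta:
        delta = {s for p in delta for s in successors[p]} - ready
        ready |= delta
    return ready


# Op 4 waits on op 99, which has not arrived, so it never becomes ready.
print(sorted(ready_ops({1, 2, 3, 4}, {(2, 1), (3, 2), (4, 99)})))
```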
Use Case 2: Geomerge Violation Planning
The Geomerge integration note proposes compiling supported laws into violation relations.
For simple laws, this is direct:
missing_src(g, s) :-
edge(g, s, d),
not vertex(g, s).
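Evaluated directly, that rule is a projection followed by an antijoin on `(g, s)`. This sketch works over Python sets, with fact shapes `edge(g, s, d)` and `vertex(g, v)` taken from the rule. `missing_dst` is the symmetric analogue and is included only for comparison.

```python
def missing_src(edges, vertices):
    """edges: {(g, s, d)}; vertices: {(g, v)}."""
    # project edge to (g, s), then antijoin vertex on (g, s)
    return {(g, s) for (g, s, _d) in edges} - set(vertices)


def missing_dst(edges, vertices):
    # symmetric analogue: project (g, d) instead of (g, s)
    return {(g, d) for (g, _s, d) in edges} - set(vertices)
```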
For larger laws, the compiler needs a planner:
violation(vars) :-
antecedent_atom_1(...),
antecedent_atom_2(...),
antecedent_atom_3(...),
not consequent_atom(...).
FlowLog-style catalogs would help the compiler answer:
- which variables are introduced by each atom
- which atoms join on which variables
- when each negated consequent can be checked
- which projected values are needed for the violation row
- whether two laws share a common antecedent
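This minimal catalog sketch answers the first four questions for one rule. It uses a hypothetical `Atom` record whose arguments are all variable names; constants are out of scope. It reads the positive atoms in written order. Detecting shared antecedents across laws would also need variable renaming and is not shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    """Hypothetical rule-body atom; args are variable names only."""
    relation: str
    args: tuple
    negated: bool = False


def build_catalog(head_vars, body):
    pos = [a for a in body if not a.negated]
    neg = [a for a in body if a.negated]
    bound, introduces, bound_after = set(), [], []
    for a in pos:
        # variables this atom introduces, in argument order, without repeats
        introduces.append([v for v in dict.fromkeys(a.args) if v not in bound])
        bound |= set(a.args)
        bound_after.append(set(bound))
    # join graph: pairs of positive atoms and the variables they share
    joins = {}
    for i, j in combinations(range(len(pos)), 2):
        shared = set(pos[i].args) & set(pos[j].args)
        if shared:
            joins[(i, j)] = shared
    # earliest positive-atom index after which each negated atom is fully bound
    # (None means the rule is unsafe: a negated variable is never bound)
    check_after = {
        n.relation: next((i for i, b in enumerate(bound_after) if set(n.args) <= b), None)
        for n in neg
    }
    return {"introduces": introduces, "joins": joins,
            "check_after": check_after, "projected": list(head_vars)}


# violation(G, A, C) :- r1(G, A, B), r2(G, B, C), r3(G, C, D), not q(G, A, C).
cat = build_catalog(
    ("G", "A", "C"),
    [Atom("r1", ("G", "A", "B")), Atom("r2", ("G", "B", "C")),
     Atom("r3", ("G", "C", "D")), Atom("q", ("G", "A", "C"), negated=True)],
)
```

Here the negated `q` can be checked right after `r2`, before `r3` is joined. That is the "push antijoins earlier" idea in catalog form.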
The concrete experiment:
one Geomerge fixture theory
Datalog-like rule per supported law
join graph per rule
planned relational tree
comparison with the current direct validator's binding order
This can be useful before any DBSP integration exists, because it tests whether the compiler can understand the law shape.
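One way to turn a rule into a planned relational tree is a greedy heuristic:

1. Start from the first positive atom.
2. Repeatedly join the atom that shares the most already-bound variables.
3. Emit each antijoin as soon as its variables are bound.

This heuristic is an assumption for illustration, not FlowLog's published planning algorithm. The text output loosely mirrors the plan format in the first prototype. This sketch defers projection to the end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical rule-body atom; args are variable names only."""
    relation: str
    args: tuple
    negated: bool = False


def plan_rule(head, head_vars, body):
    """Greedy planner sketch: returns plan steps as text lines."""
    remaining = [a for a in body if not a.negated]
    pending_neg = [a for a in body if a.negated]
    steps, bound = [], set()
    while remaining:
        # Prefer the atom sharing the most bound variables (ties keep written order).
        nxt = max(remaining, key=lambda a: len(bound & set(a.args)))
        remaining.remove(nxt)
        if not steps:
            steps.append(f"scan {nxt.relation}")
        else:
            shared = ", ".join(sorted(bound & set(nxt.args)))
            steps.append(f"join {nxt.relation} on ({shared})")  # "on ()" is a cross product
        bound |= set(nxt.args)
        # Place each antijoin as soon as all of its variables are bound.
        for n in [n for n in pending_neg if set(n.args) <= bound]:
            steps.append(f"antijoin {n.relation} on ({', '.join(n.args)})")
            pending_neg.remove(n)
    if pending_neg:
        raise ValueError("unsafe rule: a negated atom has unbound variables")
    steps.append(f"project ({', '.join(head_vars)})")
    steps.append(f"emit {head} row")
    return steps


# Written order r1, r3, r2 would start with a cross product of r1 and r3.
for line in plan_rule("v", ("G", "D"), [Atom("r1", ("G", "A")),
                                        Atom("r3", ("C", "D")),
                                        Atom("r2", ("A", "C"))]):
    print(line)
```

The greedy order picks r2 before r3 because r2 shares `A` with r1, which avoids the cross product. Diffing such a plan against the direct validator's binding order is the comparison the experiment calls for.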
Use Case 3: Backend Comparison
FlowLog can also be used as a comparison point for DBSP.
The fair comparison is not:
FlowLog product vs DBSP product
The useful comparison is:
same Datalog query
same input facts
same output relation
different backend lowering
Candidate workloads:
- transitive closure
- causal readiness
- list next-element traversal
- missing foreign-key violations
- multi-atom Geomerge antecedents
The comparison should measure:
- hydration time
- warm-update time
- memory use
- sensitivity to join order
- output delta size
- ease of rollback or preview execution
This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid batch-plus-incremental strategy is needed.
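The "same query, different lowering" shape can be sketched with two in-process evaluators for transitive closure. Naive and semi-naive evaluation stand in for the two backend lowerings; they are placeholders for illustration, not DBSP or Differential Dataflow. The harness only checks that the outputs agree. That agreement is the precondition for any timing comparison.

```python
from collections import defaultdict


def tc_naive(edges):
    """Naive fixpoint: re-join the whole closure with edge every round."""
    closure = set(edges)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in edges if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def tc_semi_naive(edges):
    """Semi-naive fixpoint: join only last round's new facts, via an index on edge."""
    out = defaultdict(set)
    for a, b in edges:
        out[a].add(b)
    closure, delta = set(edges), set(edges)
    while delta:
        delta = {(a, d) for (a, b) in delta for d in out[b]} - closure
        closure |= delta
    return closure


def compare_backends(backends, edges):
    """Run every backend on the same facts; return results if they all agree."""
    results = {name: fn(edges) for name, fn in backends.items()}
    first = next(iter(results.values()))
    mismatched = [name for name, r in results.items() if r != first]
    if mismatched:
        raise AssertionError(f"backends disagree: {mismatched}")
    return results
```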
Use Case 4: Test Corpus for Datalog Lowering
FlowLog's examples suggest a useful test corpus shape.
A local Datalog-to-DBSP frontend should include small programs for:
- reachability
- transitive closure
- connected components
- antijoin checks
- aggregation checks
- CRDT multi-value register
- CRDT causal readiness
- CRDT list traversal
- Geomerge-style violation detection
Each test should define:
- input schemas
- input facts
- expected output facts
- expected output deltas for at least one update
- whether recursion or negation is used
- whether the program should be accepted or rejected
This gives a better foundation than testing only one CRDT or one Geomerge law.
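One corpus entry can be sketched as a record, using Python dataclasses and a hypothetical `LoweringTestCase` shape. The fields follow the list of things each test should define. Deltas are written as fact-to-weight maps, with +1 for insert and -1 for delete.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LoweringTestCase:
    name: str
    program: str                                   # Datalog-like source text
    input_schemas: dict[str, tuple[str, ...]]      # relation -> column names
    input_facts: dict[str, set[tuple]]
    expected_output: dict[str, set[tuple]]
    update: dict[str, dict[tuple, int]]            # relation -> {fact: +1 or -1}
    expected_output_delta: dict[str, dict[tuple, int]]
    uses_recursion: bool = False
    uses_negation: bool = False
    should_accept: bool = True

    def check_arity(self):
        """Reject input or update facts whose width does not match the schema."""
        for rel, facts in [*self.input_facts.items(), *self.update.items()]:
            width = len(self.input_schemas[rel])
            for fact in facts:
                if len(fact) != width:
                    raise ValueError(f"{self.name}: {rel}{fact} has arity "
                                     f"{len(fact)}, expected {width}")


missing_src_case = LoweringTestCase(
    name="missing_src",
    program="missing_src(g, s) :- edge(g, s, d), not vertex(g, s).",
    input_schemas={"edge": ("graph", "src", "dst"), "vertex": ("graph", "id")},
    input_facts={"edge": {("g1", "a", "b")}, "vertex": {("g1", "b")}},
    expected_output={"missing_src": {("g1", "a")}},
    update={"vertex": {("g1", "a"): +1}},          # adding the vertex fixes the violation
    expected_output_delta={"missing_src": {("g1", "a"): -1}},
    uses_negation=True,
)
missing_src_case.check_arity()
```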
First Prototype
The first useful prototype should be small.
A planning-only tool:
Datalog-like rule text
-> parsed rules
-> dependency graph
-> strata
-> rule catalog
-> join graph
-> planned relational tree
It does not need to run DBSP at first.
The output can be textual:
rule: missing_src
positive atoms: edge
negative atoms: vertex
join graph: none
plan:
scan edge
project (graph, src)
antijoin vertex on (graph, src)
emit violation row
For recursive rules, the output can identify the loop:
recursive stratum:
ready
base:
roots -> ready
step:
ready join pred on operation id
project successor operation id
This prototype would validate the compiler shape before depending on a backend API.
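The dependency-graph and strata steps, including spotting the recursive stratum shown above, can be sketched with textbook iterative stratification:

- A head sits at least as high as each positive body relation.
- A head sits strictly above each negated body relation.
- Strata that climb past the number of relations signal negation through recursion.

The rule encoding `(head, [(relation, negated), ...])` is an illustrative assumption, as are the causal-readiness rules with `has_pred` and `root`.

```python
def stratify(rules):
    """rules: list of (head, [(body_relation, negated), ...]). Returns relation -> stratum."""
    relations = {h for h, _ in rules} | {r for _, body in rules for r, _ in body}
    stratum = {r: 0 for r in relations}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, negated in body:
                need = stratum[rel] + (1 if negated else 0)
                if stratum[head] < need:
                    if need > len(relations):
                        raise ValueError(f"not stratifiable: {head} has negation through recursion")
                    stratum[head] = need
                    changed = True
    return stratum


def recursive_relations(rules):
    """Heads that depend on themselves through one or more rules."""
    deps = {}
    for head, body in rules:
        deps.setdefault(head, set()).update(r for r, _ in body)

    def reaches(start, target):
        seen, stack = set(), list(deps.get(start, ()))
        while stack:
            r = stack.pop()
            if r == target:
                return True
            if r not in seen:
                seen.add(r)
                stack.extend(deps.get(r, ()))
        return False

    return {h for h in deps if reaches(h, h)}


causal_readiness = [
    ("has_pred", [("pred", False)]),
    ("root", [("op", False), ("has_pred", True)]),
    ("ready", [("root", False)]),
    ("ready", [("ready", False), ("pred", False)]),
]
print(stratify(causal_readiness), recursive_relations(causal_readiness))
```

`root` lands one stratum above `has_pred` because of the negation. `ready` is the single recursive relation, matching the one recursive stratum in the sample output.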
Second Prototype
The second prototype should lower a narrow subset to DBSP.
Supported subset:
- relation declarations
- positive atoms
- equality joins
- constants
- simple comparisons
- stratified negation
- union of repeated rule heads
- one recursive IDB at a time
Excluded subset:
- aggregation
- mutual recursion
- disjunction
- existential generation
- equality saturation
- custom scalar functions
The target workloads:
- missing_src and missing_dst
- multi-value register
- transitive closure
- causal readiness
This subset is enough to test the important bridge:
planned rules -> DBSP-maintained outputs
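A frontend guard for this subset could report excluded features and mutually recursive pairs, leaving stratified negation to a separate stratification pass. The rule encoding and the feature tags in `EXCLUDED` are hypothetical labels a parser would attach, not an existing API.

```python
EXCLUDED = {"aggregation", "disjunction", "existential_generation",
            "equality_saturation", "custom_scalar_function"}


def check_subset(rules):
    """rules: list of (head, body_relations, feature_tags). Returns rejection reasons."""
    reasons, deps = [], {}
    for head, body, features in rules:
        deps.setdefault(head, set()).update(body)
        for tag in sorted(features & EXCLUDED):
            reasons.append(f"{head}: {tag} is outside the supported subset")

    def reaches(a, b):
        seen, stack = set(), [a]
        while stack:
            for nxt in deps.get(stack.pop(), ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # Two distinct heads that reach each other form mutual recursion.
    heads = sorted(deps)
    for i, a in enumerate(heads):
        for b in heads[i + 1:]:
            if reaches(a, b) and reaches(b, a):
                reasons.append(f"{a} and {b} are mutually recursive")
    return reasons
```

Self-recursive transitive closure passes. An `even`/`odd` pair defined through each other is rejected.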
Data Model Decisions
Several decisions should be made explicitly before implementation.
Set or Multiset Semantics: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when distinct is applied.
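A Z-set can be represented as a Python dict from row to integer weight; this encoding is chosen here for illustration and is not DBSP's API. It shows where distinct matters:

- Adding deltas sums weights, so two writes of the same value give that value weight 2 after projection.
- `distinct` maps each positive weight to 1 and drops the rest, which recovers set semantics.

```python
def zset_add(a, b):
    """Pointwise sum of two Z-sets (row -> integer weight), dropping zero weights."""
    total = dict(a)
    for row, w in b.items():
        total[row] = total.get(row, 0) + w
    return {row: w for row, w in total.items() if w != 0}


def distinct(z):
    """Set semantics: each positively weighted row gets weight 1; others are dropped."""
    return {row: 1 for row, w in z.items() if w > 0}


two_writes = zset_add({("x",): 1}, {("x",): 1})  # same value projected from two ops
print(two_writes, distinct(two_writes))
```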
Operation Identity: CRDT examples use (replica_id, counter). The planner should decide whether to treat this pair as two scalar fields or as one logical key with two physical fields.
Violation Rows: Geomerge violations should include enough context for error messages, not just a boolean.
Output Integration: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that integration.
Rollback: Geomerge validation needs preview or rollback behavior. If weighted deltas are used, inverse deltas are a plausible rollback mechanism, but they must stay transactionally coupled to storage.
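This sketch shows a runtime-owned integrated view with preview and rollback by inverse delta, using a dict-of-weights encoding. `IntegratedView` and `inverse` are hypothetical names. The transactional coupling to storage that the text requires is deliberately not modeled.

```python
class IntegratedView:
    """Current view built by summing deltas; owned by the runtime in this sketch."""

    def __init__(self):
        self.weights = {}

    def apply(self, delta):
        for row, w in delta.items():
            total = self.weights.get(row, 0) + w
            if total == 0:
                self.weights.pop(row, None)
            else:
                self.weights[row] = total

    def current(self):
        return {row for row, w in self.weights.items() if w > 0}


def inverse(delta):
    """Delta that undoes `delta` when applied afterwards."""
    return {row: -w for row, w in delta.items()}


view = IntegratedView()
view.apply({("g1", "a"): 1})
preview = {("g1", "a"): -1, ("g1", "b"): 1}   # tentative checker output for a transaction
view.apply(preview)
previewed = view.current()
view.apply(inverse(preview))                  # transaction rejected: roll back
```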
Evaluation Plan
The evaluation should separate correctness from performance.
Correctness checks:
planned evaluation == naive snapshot evaluation
DBSP maintained result == snapshot result
failed Geomerge transaction leaves no checker drift
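The second check can be run as a differential test: apply the same update sequence to a maintained view and to a from-scratch snapshot, and compare after every step. The maintained view here is a hand-written, key-based incremental version of `missing_src`. It illustrates the check and does not use DBSP's operator algebra; the same harness shape would apply to a DBSP-maintained view.

```python
from collections import Counter


def snapshot_missing_src(edges, vertices):
    """From-scratch evaluation: project (g, s), then antijoin vertex."""
    return {(g, s) for (g, s, _d) in edges} - vertices


class MaintainedMissingSrc:
    """Key-based incremental view of missing_src (illustrative, not DBSP operators)."""

    def __init__(self):
        self.edge_count = Counter()   # (g, s) -> number of edges leaving s
        self.vertices = set()
        self.output = set()

    def update(self, edge_delta, vertex_delta):
        """Deltas map facts to +1 (insert) or -1 (delete). Returns the output delta."""
        touched = set()
        for (g, s, _d), w in edge_delta.items():
            self.edge_count[(g, s)] += w
            touched.add((g, s))
        for key, w in vertex_delta.items():
            if w > 0:
                self.vertices.add(key)
            else:
                self.vertices.discard(key)
            touched.add(key)
        out_delta = {}
        for key in touched:   # only keys touched by this update can change
            now = self.edge_count[key] > 0 and key not in self.vertices
            if now != (key in self.output):
                out_delta[key] = 1 if now else -1
                if now:
                    self.output.add(key)
                else:
                    self.output.discard(key)
        return out_delta


def check_against_snapshot(steps):
    """Apply each (edge_delta, vertex_delta) to both paths; compare after every step."""
    edges, vertices, view = set(), set(), MaintainedMissingSrc()
    for edge_delta, vertex_delta in steps:
        edges |= {e for e, w in edge_delta.items() if w > 0}
        edges -= {e for e, w in edge_delta.items() if w < 0}
        vertices |= {v for v, w in vertex_delta.items() if w > 0}
        vertices -= {v for v, w in vertex_delta.items() if w < 0}
        view.update(edge_delta, vertex_delta)
        if view.output != snapshot_missing_src(edges, vertices):
            return False
    return True
```

Applying an update and then its inverse should restore the previous output. That round trip is one way to exercise the "no checker drift" condition.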
Performance checks:
hydration time
warm-update time
memory used by maintained state
number of output delta rows
history-depth sensitivity
join-order sensitivity
The most important performance test is causal readiness:
large causal history
+ small new update
-> does update cost grow with history depth?
If the answer is yes, the frontend needs frontier-aware planning or a different physical representation.
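The history-depth question can be asked with a counter instead of a stopwatch. This sketch makes two assumptions:

- It maintains `ready(s) :- ready(p), pred(s, p)` for insert-only operations, propagating only from the newly ready frontier.
- The probe count is a hypothetical cost metric, as are `IncrementalReady` and `warm_update_cost`.

On a linear history, appending one operation costs the same number of probes at any depth. The real pipeline's test would confirm or refute the same behavior.

```python
from collections import defaultdict


class IncrementalReady:
    """Insert-only maintenance of ready(s) :- ready(p), pred(s, p); ops without preds are roots."""

    def __init__(self):
        self.ready = set()
        self.successors = defaultdict(set)   # p -> {s : pred(s, p)}
        self.probes = 0                      # hypothetical cost metric: index lookups

    def add_op(self, op, preds):
        """Add an operation and its predecessors; return the probes this update used."""
        before = self.probes
        for p in preds:
            self.successors[p].add(op)
        self.probes += len(preds)            # counted as one `ready` lookup per predecessor
        if not preds or any(p in self.ready for p in preds):
            frontier = {op} - self.ready
            while frontier:                  # propagate only from newly ready ops
                self.ready |= frontier
                nxt = set()
                for p in frontier:
                    self.probes += 1
                    nxt |= self.successors[p]
                frontier = nxt - self.ready
        return self.probes - before


def warm_update_cost(depth):
    """Build a linear history of `depth` ops, then measure one append in probes."""
    view = IncrementalReady()
    view.add_op(0, [])
    for i in range(1, depth):
        view.add_op(i, [i - 1])
    return view.add_op(depth, [depth - 1])


print(warm_update_cost(10), warm_update_cost(1000))
```

Out-of-order delivery is handled by the successor index. An operation that arrives before its predecessor becomes ready when the predecessor does.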
Decision Points
The main decision points are:
- whether to implement a Datalog frontend or compile directly from Geolog laws
- whether the relational IR should be FlowLog-like, DBSP-like, or custom
- whether recursive planning should support mutual recursion early
- whether SIP should be automatic, directive-controlled, or both
- whether hydration should use the same backend as warm updates
- whether to persist backend operator state
- whether to compare against Differential Dataflow for recursive workloads
These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does not force Differential Dataflow as the backend.
Practical Recommendation
The first practical step is a planning-only FlowLog-inspired compiler layer.
The next step is lowering a small subset to DBSP.
After that, FlowLog itself can serve as a comparison backend for the same small programs.
The goal should be:
one rule frontend
one relational IR
two possible execution backends
That architecture would make it possible to test whether performance problems come from the query semantics, the planner, or the backend.
Changelog
- May 20, 2026 -- First version created from FlowLog, DBSP, CRDT, and Geomerge notes.