11 KiB
FlowLog Synthesis
A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.
Short Answer
The five FlowLog notes make one argument:
FlowLog is most useful here as a model for the Datalog planning layer that should sit before an incremental backend such as DBSP.
The core architecture is:
Datalog or Geolog-shaped rules
-> dependency analysis and strata
-> rule catalog
-> join graph and relational plan
-> FlowLog-style optimization
-> DBSP or Differential Dataflow backend
-> maintained outputs
FlowLog is not only an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separated.
flowchart LR
Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
Strata --> Catalog["Rule Catalog"]
Catalog --> Plan["Relational Plan"]
Plan --> Optimize["FlowLog-Style Optimization"]
Optimize --> IR["Backend-Neutral IR"]
IR --> DBSP["DBSP Backend"]
IR --> DD["Differential Dataflow Backend"]
DBSP --> Outputs["Maintained Outputs"]
DD --> Outputs
How the Notes Fit Together
The first note, 001-flowlog-primer.md, explains the concept. FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend,
while keeping Datalog-specific planning visible before lowering to dataflow operators.
The second note, 002-flowlog-implementation.md, explains the artifact structure. The useful implementation shape is:
parsing
-> strata
-> catalog
-> planning
-> optimizing
-> executing
The third note, 003-flowlog-and-dbsp-synergy.md, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing
inputs. FlowLog helps answer what relational plan should be maintained.
The fourth note, 004-flowlog-technical-planning-notes.md, zooms into the planning details: rule catalogs, collection shapes, transformation flows,
join graphs, antijoin timing, SIP, recursive strata, subplan sharing, and physical key choice.
The fifth note, 005-using-flowlog-ideas.md, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype,
backend comparison, test corpus, data model decisions, and evaluation plan.
Together, the notes move from:
what FlowLog is
-> how it is built
-> why it matters for DBSP
-> which technical pieces transfer
-> how to use those pieces
Unified Mental Model
The shared mental model is that Datalog execution has three separate layers.
The source layer owns user-facing or system-facing rules:
Datalog programs
Geolog laws
CRDT definitions
The planning layer owns the logical and physical shape of evaluation:
strata
rule catalogs
join graphs
antijoin placement
SIP filters
physical keys
shared subplans
The backend layer owns maintained computation:
DBSP circuits
Differential Dataflow dataflows
batch evaluators
The main design rule is:
Backend execution should not rediscover rule semantics.
The backend should receive a checked, stratified, and optimized relational plan.
FlowLog's Transferable Pieces
The most transferable pieces are not tied to Differential Dataflow.
Rule Catalog: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.
Stratification: A dependency order for non-recursive and recursive rule groups, with negation restrictions kept explicit.
Join Graph: A graph or hypergraph of atoms connected by shared variables.
Structural Planning: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.
Sideways Information Passing: Semijoin-style filtering that uses known bindings to reduce later joins.
Antijoin Scheduling: Placement of negated atoms as soon as their variables are bound.
Physical Key Choice: Deliberate selection of keys and payload fields for maintained joins and arrangements.
Subplan Sharing: Reuse of common antecedents or intermediate relations across rules.
flowchart TB
Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
Catalog --> Negation["Negation and Filters"]
JoinGraph --> Structural["Structural Planning"]
Negation --> Antijoin["Antijoin Scheduling"]
Catalog --> SIP["Sideways Information Passing"]
Structural --> Keys["Physical Key Choice"]
SIP --> Keys
Antijoin --> Keys
Keys --> IR["Optimized Relational IR"]
DBSP Connection
The DBSP notes focus on incremental maintenance:
input deltas
-> maintained operator state
-> output deltas
DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation, Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.
FlowLog's planning layer fits before DBSP:
rules
-> FlowLog-like planner
-> DBSP circuit
This division is important because a poor plan is not just a bad one-shot query. In an incremental system, a poor plan becomes persistent maintained state. Bad join order, unnecessary intermediate fields, and late antijoins can increase memory and update cost for the life of the circuit.
CRDT Connection
The CRDT notes use Datalog to define visible state over immutable operation facts.
Simple register queries are already a good fit:
set + pred
-> overwritten
-> visible values
The harder cases are recursive and structural:
causal readiness
list traversal
tombstone skipping
move-like tree operations
FlowLog helps by making the expensive parts explicit:
- causal-readiness recursion should be planned around frontiers when possible
- list traversal should avoid carrying unnecessary fields through every intermediate
- antijoins for tombstones should run as soon as their keys are available
- repeated list subqueries should share intermediate relations where possible
The practical CRDT target is:
same CRDT rules
-> naive plan
-> FlowLog-style plan
-> DBSP-maintained result
-> hydration and warm-update comparison
Geomerge Connection
The Geomerge notes propose compiling supported laws into maintained violation relations.
The simplest useful form is:
required_consequent(x) :- antecedent(x).
violation(x) :- required_consequent(x), not consequent(x).
FlowLog helps once antecedents become multi-atom joins:
violation(x, y, z) :-
A(x, y),
B(y, z),
C(z),
not D(x, z).
At that point, a compiler needs the same machinery FlowLog demonstrates:
- variable occurrence maps
- join graph extraction
- antijoin scheduling
- projection minimization
- shared antecedent detection
- violation-row construction
The practical Geomerge target is:
FlatTheory laws
-> supported relational subset
-> rule catalog
-> planned violation query
-> DBSP-maintained violations relation
Recommended Architecture
The recommended architecture has a backend-neutral middle layer.
flowchart TB
subgraph Sources["Source Layers"]
Datalog["Datalog CRDT Rules"]
Geolog["Compiled Geolog Laws"]
end
subgraph Planner["FlowLog-Inspired Planner"]
Parse["Parse or Translate"]
Strata["Stratify"]
Catalog["Catalog Rules"]
Graph["Join Graph Construction"]
Optimize["Plan Joins, Antijoins, and SIP"]
IR["Relational IR with Physical Keys"]
end
subgraph Backends["Execution Backends"]
DBSP["DBSP"]
DD["Differential Dataflow"]
Batch["Snapshot Evaluator"]
end
Datalog --> Parse
Geolog --> Parse
Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
IR --> DBSP
IR --> DD
IR --> Batch
This architecture keeps the core questions separate:
- source language
- rule semantics
- relational planning
- physical execution backend
- application integration
That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the backend, or the storage boundary.
Practical Path
The practical path should be staged.
Stage 1: Planning-Only Prototype
Datalog-like rules
-> dependency graph
-> strata
-> rule catalog
-> join graph
-> textual plan
This validates whether the compiler understands rule shape.
Stage 2: Narrow DBSP Lowering
planned rules
-> projection, selection, join, antijoin, union, distinct, recursion
-> DBSP circuit
This validates maintained outputs against a snapshot evaluator.
Stage 3: Workload Comparison
same rules
same facts
same outputs
-> DBSP backend
-> Differential Dataflow backend
-> snapshot backend
This identifies whether bottlenecks come from planning or backend behavior.
Stage 4: Geomerge Integration
supported FlatTheory laws
-> planned violation queries
-> maintained violations relation
-> agreement with current validator
This makes the DBSP checker a performance optimization first, not a semantic change.
Test Workloads
The shared test corpus should include:
- transitive closure
- reachability
- connected components
- antijoin checks
- multi-value register
- causal readiness
- list next-element traversal
- tombstone skipping
- missing foreign-key violations
- multi-atom Geomerge antecedents
Each workload should have:
- input schemas
- base facts
- update facts
- expected snapshot output
- expected output deltas
- recursion and negation classification
- accepted or rejected status
Evaluation Questions
The main evaluation questions are:
- Does planning reduce hydration time?
- Does planning reduce warm-update time?
- Does causal-readiness update cost still grow with history depth?
- Does antijoin scheduling reduce intermediate relation size?
- Does SIP help frontier-shaped recursive queries?
- Does physical key choice reduce maintained state?
- Does the DBSP result match a snapshot evaluator?
- Does Geomerge validation agree with the existing validator?
- Is backend state rollback or preview execution tractable?
Bottom Line
The unified conclusion is:
FlowLog is the planning blueprint.
DBSP is the target incremental backend.
CRDTs and Geomerge laws are the motivating rule sources.
The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement, and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.
Changelog
- May 21, 2026 -- First version created to unify the first five FlowLog notes.