# FlowLog Synthesis

A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.

---

## Short Answer

The five FlowLog notes make one argument:

```text
FlowLog is most useful here as a model for the Datalog planning layer
that should sit before an incremental backend such as DBSP.
```

The core architecture is:

```text
Datalog or Geolog-shaped rules
  -> dependency analysis and strata
  -> rule catalog
  -> join graph and relational plan
  -> FlowLog-style optimization
  -> DBSP or Differential Dataflow backend
  -> maintained outputs
```

FlowLog is not only an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separated.

```mermaid
flowchart LR
    Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
    Strata --> Catalog["Rule Catalog"]
    Catalog --> Plan["Relational Plan"]
    Plan --> Optimize["FlowLog-Style Optimization"]
    Optimize --> IR["Backend-Neutral IR"]
    IR --> DBSP["DBSP Backend"]
    IR --> DD["Differential Dataflow Backend"]
    DBSP --> Outputs["Maintained Outputs"]
    DD --> Outputs
```

---

## How the Notes Fit Together

The first note, `001-flowlog-primer.md`, explains the concept. FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend, while keeping Datalog-specific planning visible before lowering to dataflow operators.

The second note, `002-flowlog-implementation.md`, explains the artifact structure. The useful implementation shape is:

```text
parsing
  -> strata
  -> catalog
  -> planning
  -> optimizing
  -> executing
```

The third note, `003-flowlog-and-dbsp-synergy.md`, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing inputs. FlowLog helps answer what relational plan should be maintained.

The fourth note, `004-flowlog-technical-planning-notes.md`, zooms into the planning details: rule catalogs, collection shapes, transformation flows, join graphs, antijoin timing, SIP, recursive strata, subplan sharing, and physical key choice.

The fifth note, `005-using-flowlog-ideas.md`, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype, backend comparison, test corpus, data model decisions, and evaluation plan.

Together, the notes move from:

```text
what FlowLog is
  -> how it is built
  -> why it matters for DBSP
  -> which technical pieces transfer
  -> how to use those pieces
```

---

## Unified Mental Model

The shared mental model is that Datalog execution has three separate layers.

The source layer owns user-facing or system-facing rules:

```text
Datalog programs
Geolog laws
CRDT definitions
```

The planning layer owns the logical and physical shape of evaluation:

```text
strata
rule catalogs
join graphs
antijoin placement
SIP filters
physical keys
shared subplans
```

The backend layer owns maintained computation:

```text
DBSP circuits
Differential Dataflow dataflows
batch evaluators
```

The main design rule is:

```text
Backend execution should not rediscover rule semantics.
```

The backend should receive a checked, stratified, and optimized relational plan.

---

## FlowLog's Transferable Pieces

The most transferable pieces are not tied to Differential Dataflow.

**Rule Catalog**: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.

**Stratification**: A dependency order for non-recursive and recursive rule groups, with negation restrictions kept explicit.

**Join Graph**: A graph or hypergraph of atoms connected by shared variables.

**Structural Planning**: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.

**Sideways Information Passing**: Semijoin-style filtering that uses known bindings to reduce later joins.

**Antijoin Scheduling**: Placement of negated atoms as soon as their variables are bound.

**Physical Key Choice**: Deliberate selection of keys and payload fields for maintained joins and arrangements.

**Subplan Sharing**: Reuse of common antecedents or intermediate relations across rules.

```mermaid
flowchart TB
    Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
    Catalog --> Negation["Negation and Filters"]
    JoinGraph --> Structural["Structural Planning"]
    Negation --> Antijoin["Antijoin Scheduling"]
    Catalog --> SIP["Sideways Information Passing"]
    Structural --> Keys["Physical Key Choice"]
    SIP --> Keys
    Antijoin --> Keys
    Keys --> IR["Optimized Relational IR"]
```

---

## DBSP Connection

The DBSP notes focus on incremental maintenance:

```text
input deltas
  -> maintained operator state
  -> output deltas
```

DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation, Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.

FlowLog's planning layer fits before DBSP:

```text
rules
  -> FlowLog-like planner
  -> DBSP circuit
```

This division is important because a poor plan is not just a bad one-shot query. In an incremental system, a poor plan becomes persistent maintained state. Bad join order, unnecessary intermediate fields, and late antijoins can increase memory and update cost for the life of the circuit.

---

## CRDT Connection

The CRDT notes use Datalog to define visible state over immutable operation facts.

Simple register queries are already a good fit:

```text
set + pred
  -> overwritten
  -> visible values
```

The harder cases are recursive and structural:

```text
causal readiness
list traversal
tombstone skipping
move-like tree operations
```

FlowLog helps by making the expensive parts explicit:

- causal-readiness recursion should be planned around frontiers when possible
- list traversal should avoid carrying unnecessary fields through every intermediate
- antijoins for tombstones should run as soon as their keys are available
- repeated list subqueries should share intermediate relations where possible

The practical CRDT target is:

```text
same CRDT rules
  -> naive plan
  -> FlowLog-style plan
  -> DBSP-maintained result
  -> hydration and warm-update comparison
```

---

## Geomerge Connection

The Geomerge notes propose compiling supported laws into maintained violation relations.

The simplest useful form is:

```text
required_consequent(x) :- antecedent(x).
violation(x) :- required_consequent(x), not consequent(x).
```

FlowLog helps once antecedents become multi-atom joins:

```text
violation(x, y, z) :-
  A(x, y),
  B(y, z),
  C(z),
  not D(x, z).
```

At that point, a compiler needs the same machinery FlowLog demonstrates:

- variable occurrence maps
- join graph extraction
- antijoin scheduling
- projection minimization
- shared antecedent detection
- violation-row construction

The practical Geomerge target is:

```text
FlatTheory laws
  -> supported relational subset
  -> rule catalog
  -> planned violation query
  -> DBSP-maintained violations relation
```

---

## Recommended Architecture

The recommended architecture has a backend-neutral middle layer.

```mermaid
flowchart TB
    subgraph Sources["Source Layers"]
        Datalog["Datalog CRDT Rules"]
        Geolog["Compiled Geolog Laws"]
    end

    subgraph Planner["FlowLog-Inspired Planner"]
        Parse["Parse or Translate"]
        Strata["Stratify"]
        Catalog["Catalog Rules"]
        Graph["Join Graph Construction"]
        Optimize["Plan Joins, Antijoins, and SIP"]
        IR["Relational IR with Physical Keys"]
    end

    subgraph Backends["Execution Backends"]
        DBSP["DBSP"]
        DD["Differential Dataflow"]
        Batch["Snapshot Evaluator"]
    end

    Datalog --> Parse
    Geolog --> Parse
    Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
    IR --> DBSP
    IR --> DD
    IR --> Batch
```

This architecture keeps the core questions separate:

- source language
- rule semantics
- relational planning
- physical execution backend
- application integration

That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the backend, or the storage boundary.

---

## Practical Path

The practical path should be staged.

**Stage 1: Planning-Only Prototype**

```text
Datalog-like rules
  -> dependency graph
  -> strata
  -> rule catalog
  -> join graph
  -> textual plan
```

This validates whether the compiler understands rule shape.

**Stage 2: Narrow DBSP Lowering**

```text
planned rules
  -> projection, selection, join, antijoin, union, distinct, recursion
  -> DBSP circuit
```

This validates maintained outputs against a snapshot evaluator.

**Stage 3: Workload Comparison**

```text
same rules
same facts
same outputs
  -> DBSP backend
  -> Differential Dataflow backend
  -> snapshot backend
```

This identifies whether bottlenecks come from planning or backend behavior.

**Stage 4: Geomerge Integration**

```text
supported FlatTheory laws
  -> planned violation queries
  -> maintained violations relation
  -> agreement with current validator
```

This makes the DBSP checker a performance optimization first, not a semantic change.
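
To make the Stage 1 planning-only prototype more concrete, here is a minimal Python sketch of the steps from rules to a textual plan. It is an illustrative toy under stated assumptions, not FlowLog's implementation or API: the `Atom` and `Rule` types, the `stratify`, `join_graph`, and `plan` functions, and the greedy overlap-based join ordering are all hypothetical names and choices made for this example. Constants, comparisons, SIP, and physical key selection are left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    relation: str
    args: tuple            # variable names only; constants are omitted in this sketch
    negated: bool = False


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple            # tuple of Atom


def show(atom):
    return f"{atom.relation}({', '.join(atom.args)})"


def stratify(rules):
    """Assign a stratum number to every relation.

    A head must sit at least as high as each positive body relation and
    strictly higher than each negated one. If the numbers never settle,
    negation occurs inside a recursive cycle and the program is rejected.
    """
    relations = {r.head.relation for r in rules}
    relations |= {a.relation for r in rules for a in r.body}
    stratum = {rel: 0 for rel in relations}
    for _ in range(len(relations) + 1):
        changed = False
        for rule in rules:
            for atom in rule.body:
                needed = stratum[atom.relation] + (1 if atom.negated else 0)
                if stratum[rule.head.relation] < needed:
                    stratum[rule.head.relation] = needed
                    changed = True
        if not changed:
            return stratum
    raise ValueError("not stratifiable: negation inside a recursive cycle")


def join_graph(rule):
    """Edges (i, j, shared_vars) between positive body atoms, by body index."""
    positive = [(i, a) for i, a in enumerate(rule.body) if not a.negated]
    edges = []
    for (i, a), (j, b) in combinations(positive, 2):
        shared = sorted(set(a.args) & set(b.args))
        if shared:
            edges.append((i, j, shared))
    return edges


def plan(rule):
    """Greedy textual plan: order positive atoms by overlap with the bound
    variables, and place each negated atom as soon as it is fully bound."""
    positive = [a for a in rule.body if not a.negated]
    negated = [a for a in rule.body if a.negated]
    bound, steps = set(), []
    while positive:
        nxt = max(positive, key=lambda a: len(bound & set(a.args)))
        positive.remove(nxt)
        keys = sorted(bound & set(nxt.args))
        if steps:
            steps.append(f"join {show(nxt)} on ({', '.join(keys)})")
        else:
            steps.append(f"scan {show(nxt)}")
        bound |= set(nxt.args)
        for neg in [n for n in negated if set(n.args) <= bound]:
            negated.remove(neg)
            steps.append(f"antijoin {show(neg)}")
    if negated:
        raise ValueError("unsafe rule: negated atom has unbound variables")
    steps.append(f"project {show(rule.head)}")
    return steps


# The multi-atom violation rule from the Geomerge section.
rule = Rule(
    head=Atom("violation", ("x", "y", "z")),
    body=(
        Atom("A", ("x", "y")),
        Atom("B", ("y", "z")),
        Atom("C", ("z",)),
        Atom("D", ("x", "z"), negated=True),
    ),
)
print(stratify([rule]))
print(join_graph(rule))
print("\n".join(plan(rule)))
```

For the multi-atom violation rule, this sketch schedules the antijoin against `D` before the join with `C`, because `x` and `z` are both bound once `A` and `B` have been joined. A real planner would weigh intermediate width and key choice as well; the point here is only that rule shape can be explained before any backend is involved.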

---

## Test Workloads

The shared test corpus should include:

- transitive closure
- reachability
- connected components
- antijoin checks
- multi-value register
- causal readiness
- list next-element traversal
- tombstone skipping
- missing foreign-key violations
- multi-atom Geomerge antecedents

Each workload should have:

- input schemas
- base facts
- update facts
- expected snapshot output
- expected output deltas
- recursion and negation classification
- accepted or rejected status

---

## Evaluation Questions

The main evaluation questions are:

- Does planning reduce hydration time?
- Does planning reduce warm-update time?
- Does causal-readiness update cost still grow with history depth?
- Does antijoin scheduling reduce intermediate relation size?
- Does SIP help frontier-shaped recursive queries?
- Does physical key choice reduce maintained state?
- Does the DBSP result match a snapshot evaluator?
- Does Geomerge validation agree with the existing validator?
- Is backend state rollback or preview execution tractable?

---

## Bottom Line

The unified conclusion is:

```text
FlowLog is the planning blueprint.
DBSP is the target incremental backend.
CRDTs and Geomerge laws are the motivating rule sources.
```

The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement, and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.

---

## Changelog

* **May 21, 2026** -- First version created to unify the first five FlowLog notes.