# Using FlowLog Ideas A practical note on how FlowLog or FlowLog-style planning could be used for the local DBSP, CRDT, and Geomerge work. --- ## Short Answer The most useful way to use FlowLog is as a planning reference, not as a direct dependency. There are three possible levels of use: ```text Level 1: run FlowLog examples to learn workload behavior Level 2: borrow FlowLog planning ideas for a DBSP frontend Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs ``` ```mermaid flowchart TB L1["Level 1
Run FlowLog Examples"] --> L2["Level 2
Borrow Planning Ideas"] L2 --> L3["Level 3
Backend Comparison"] L2 --> DBSP["DBSP Frontend Work"] L3 --> Decision["Backend and Planner Decisions"] ``` The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer before DBSP. --- ## Initial Non-Goals The first step should not be replacing the DBSP backend with FlowLog. That would conflate two separate questions: - Which incremental backend should maintain deltas? - Which frontend planner should produce the backend plan? The DBSP notes are already about DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer. The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit. --- ## Use Case 1: Better CRDT Query Planning The CRDT queries in the DBSP notes include: - multi-value register queries - causal-readiness queries - list traversal queries - tombstone-skipping queries The multi-value register is already simple enough: ```text set + pred -> overwritten -> visible values ``` The planning value is higher for causal readiness: ```text pred graph -> roots -> recursive reachability -> ready operations ``` and list traversal: ```text insert tree -> first child -> next sibling -> ancestor sibling -> next element -> next visible element ``` These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and adding semijoin filters around the current frontier. ```mermaid flowchart TB subgraph Causal["Causal Readiness"] Pred["pred Graph"] --> Roots["Roots"] Roots --> Ready["Ready Operations"] Ready --> Frontier["Frontier"] Frontier --> NewPred["Outgoing Pred Edges"] NewPred --> Ready end subgraph List["List Traversal"] Insert["insert Tree"] --> First["firstChild"] Insert --> Sibling["nextSibling"] First --> Next["nextElem"] Sibling --> Next Remove["remove Tombstones"] --> Visible["nextVisible"] Next --> Visible end ``` The concrete experiment: ```text causal-readiness Datalog rules rule catalog naive plan FlowLog-style planned version DBSP hydration and warm-update comparison ``` The success criterion is not only lower total runtime. It is lower dependence on causal-history depth for small warm updates. --- ## Use Case 2: Geomerge Violation Planning The Geomerge integration note proposes compiling supported laws into violation relations. For simple laws, this is direct: ```text missing_src(g, s) :- edge(g, s, d), not vertex(g, s). ``` For larger laws, the compiler needs a planner: ```text violation(vars) :- antecedent_atom_1(...), antecedent_atom_2(...), antecedent_atom_3(...), not consequent_atom(...). ``` FlowLog-style catalogs would help the compiler answer: - which variables are introduced by each atom - which atoms join on which variables - when each negated consequent can be checked - which projected values are needed for the violation row - whether two laws share a common antecedent ```mermaid flowchart LR Law["Geomerge Law"] --> Rule["Datalog-Like Rule"] Rule --> Catalog["Rule Catalog"] Catalog --> JoinGraph["Join Graph"] JoinGraph --> Plan["Planned Relational Tree"] Plan --> Violation["Violation Relation"] Violation --> DBSP["DBSP Maintained Output"] ``` The concrete experiment: ```text one Geomerge fixture theory Datalog-like rule per supported law join graph per rule planned relational tree comparison with the current direct validator's binding order ``` This can be useful before any DBSP integration exists, because it tests whether the compiler can understand the law shape. --- ## Use Case 3: Backend Comparison FlowLog can also be used as a comparison point for DBSP. The fair comparison is not: ```text FlowLog product vs DBSP product ``` The useful comparison is: ```text same Datalog query same input facts same output relation different backend lowering ``` Candidate workloads: - transitive closure - causal readiness - list next-element traversal - missing foreign-key violations - multi-atom Geomerge antecedents The comparison should measure: - hydration time - warm-update time - memory use - sensitivity to join order - output delta size - ease of rollback or preview execution ```mermaid flowchart TB Program["Same Datalog Program"] --> IR["Shared Relational IR"] Facts["Same Input Facts"] --> IR IR --> DBSP["DBSP Lowering"] IR --> DD["Differential Dataflow Lowering"] DBSP --> DbspMetrics["Hydration
Warm Updates
Memory
Deltas"] DD --> DdMetrics["Hydration
Warm Updates
Memory
Deltas"] DbspMetrics --> Compare["Backend Comparison"] DdMetrics --> Compare ``` This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid batch-plus-incremental strategy is needed. --- ## Use Case 4: Test Corpus for Datalog Lowering FlowLog's examples suggest a useful test corpus shape. A local Datalog-to-DBSP frontend should include small programs for: - reachability - transitive closure - connected components - antijoin checks - aggregation checks - CRDT multi-value register - CRDT causal readiness - CRDT list traversal - Geomerge-style violation detection Each test should define: - input schemas - input facts - expected output facts - expected output deltas for at least one update - whether recursion or negation is used - whether the program should be accepted or rejected This gives a better foundation than testing only one CRDT or one Geomerge law. --- ## First Prototype The first useful prototype should be small. A planning-only tool: ```text Datalog-like rule text -> parsed rules -> dependency graph -> strata -> rule catalog -> join graph -> planned relational tree ``` ```mermaid flowchart LR Text["Rule Text"] --> Parse["Parsed Rules"] Parse --> Deps["Dependency Graph"] Deps --> Strata["Strata"] Parse --> Catalog["Rule Catalog"] Catalog --> JoinGraph["Join Graph"] Strata --> Plan["Planned Tree"] JoinGraph --> Plan Plan --> Explain["Textual Plan Explanation"] ``` It does not need to run DBSP at first. The output can be textual: ```text rule: missing_src positive atoms: edge negative atoms: vertex join graph: none plan: scan edge project (graph, src) antijoin vertex on (graph, src) emit violation row ``` For recursive rules, the output can identify the loop: ```text recursive stratum: ready base: roots -> ready step: ready join pred on operation id project successor operation id ``` This prototype would validate the compiler shape before depending on a backend API. --- ## Second Prototype The second prototype should lower a narrow subset to DBSP. ```mermaid flowchart TB Subset["Supported Rule Subset"] --> Planner["Planner"] Planner --> IR["Relational IR"] IR --> Lowering["DBSP Lowering"] Lowering --> Runtime["DBSP Runtime"] Runtime --> Output["Maintained Outputs"] Snapshot["Naive Snapshot Evaluator"] --> Oracle["Correctness Oracle"] Output --> Oracle ``` Supported subset: - relation declarations - positive atoms - equality joins - constants - simple comparisons - stratified negation - union of repeated rule heads - one recursive IDB at a time Excluded subset: - aggregation - mutual recursion - disjunction - existential generation - equality saturation - custom scalar functions The target workloads: - `missing_src` and `missing_dst` - multi-value register - transitive closure - causal readiness This subset is enough to test the important bridge: ```text planned rules -> DBSP-maintained outputs ``` --- ## Data Model Decisions Several decisions should be made explicitly before implementation. ```mermaid flowchart TB Decisions["Data Model Decisions"] Decisions --> Semantics["Set or Multiset Semantics"] Decisions --> Identity["Operation Identity"] Decisions --> Violations["Violation Row Shape"] Decisions --> Integration["Output Integration"] Decisions --> Rollback["Rollback or Preview"] ``` **Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when `distinct` is applied. **Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical key with two physical fields. **Violation Rows**: Geomerge violations should include enough context for error messages, not just a boolean. **Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that integration. **Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay transactionally coupled to storage. --- ## Evaluation Plan The evaluation should separate correctness from performance. ```mermaid flowchart LR Inputs["Input Facts and Updates"] --> Naive["Naive Snapshot Evaluation"] Inputs --> Planned["Planned Backend Evaluation"] Naive --> Correctness["Correctness Check"] Planned --> Correctness Planned --> Perf["Performance Metrics"] Perf --> Hydration["Hydration"] Perf --> Warm["Warm Updates"] Perf --> Memory["Memory"] Perf --> Sensitivity["History and Join Sensitivity"] ``` Correctness checks: ```text planned evaluation == naive snapshot evaluation DBSP maintained result == snapshot result failed Geomerge transaction leaves no checker drift ``` Performance checks: ```text hydration time warm-update time memory used by maintained state number of output delta rows history-depth sensitivity join-order sensitivity ``` The most important performance test is causal readiness: ```text large causal history + small new update -> does update cost grow with history depth? ``` If the answer is yes, the frontend needs frontier-aware planning or a different physical representation. --- ## Decision Points The main decision points are: - whether to implement a Datalog frontend or compile directly from Geolog laws - whether the relational IR should be FlowLog-like, DBSP-like, or custom - whether recursive planning should support mutual recursion early - whether SIP should be automatic, directive-controlled, or both - whether hydration should use the same backend as warm updates - whether to persist backend operator state - whether to compare against Differential Dataflow for recursive workloads These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does not force Differential Dataflow as the backend. --- ## Practical Recommendation The first practical step is a planning-only FlowLog-inspired compiler layer. The next step is lowering a small subset to DBSP. After that, FlowLog itself can serve as a comparison backend for the same small programs. ```mermaid flowchart LR P1["Planning-Only Compiler"] --> P2["DBSP Subset Lowering"] P2 --> P3["FlowLog Backend Comparison"] P3 --> P4["Shared IR Decision"] P4 --> P5["Production-Oriented Prototype"] ``` The goal should be: ```text one rule frontend one relational IR two possible execution backends ``` That architecture would make it possible to test whether performance problems come from the query semantics, the planner, or the backend. --- ## Changelog * **May 20, 2026** -- First version created from FlowLog, DBSP, CRDT, and Geomerge notes.