2026-05-20 15:36:14 +02:00

# Using FlowLog Ideas

A practical note on how FlowLog or FlowLog-style planning could be used for the local DBSP, CRDT, and Geomerge work.

---

## Short Answer

The most useful way to use FlowLog is as a planning reference, not as a direct dependency.

There are three possible levels of use:

```text
Level 1: run FlowLog examples to learn workload behavior
Level 2: borrow FlowLog planning ideas for a DBSP frontend
Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs
```

```mermaid
flowchart TB
    L1["Level 1<br/>Run FlowLog Examples"] --> L2["Level 2<br/>Borrow Planning Ideas"]
    L2 --> L3["Level 3<br/>Backend Comparison"]
    L2 --> DBSP["DBSP Frontend Work"]
    L3 --> Decision["Backend and Planner Decisions"]
```

The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and sideways information passing (SIP) ideas to design a better compiler layer in front of DBSP.

---

## Initial Non-Goals

The first step should not be replacing the DBSP backend with FlowLog.

That would conflate two separate questions:

- Which incremental backend should maintain deltas?
- Which frontend planner should produce the backend plan?

The DBSP notes are already about DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer.

The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source concepts. Datalog should remain an intermediate or testing language unless it is explicitly chosen as the user-facing language.

---

## Use Case 1: Better CRDT Query Planning

The CRDT queries in the DBSP notes include:

- multi-value register queries
- causal-readiness queries
- list traversal queries
- tombstone-skipping queries

The multi-value register is already simple enough:

```text
set + pred -> overwritten -> visible values
```
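
The pipeline above amounts to one projection and one antijoin. The following is a minimal Python sketch, not the encoding used in the DBSP notes: it assumes `set_ops` rows of the form `(op_id, key, value)` and `pred` pairs `(old_id, new_id)` meaning that `new_id` overwrites `old_id`. Both shapes, and the name `visible_values`, are hypothetical.

```python
def visible_values(set_ops, pred):
    """Return {key: values} for set operations that no other operation overwrote."""
    # overwritten: every id on the "old" side of a pred pair (assumed layout)
    overwritten = {old for (old, _new) in pred}
    visible = {}
    for op_id, key, value in set_ops:
        if op_id not in overwritten:  # antijoin: set minus overwritten
            visible.setdefault(key, set()).add(value)
    return visible


# Hypothetical history: ops 2 and 3 concurrently overwrite op 1.
set_ops = [(1, "title", "draft"), (2, "title", "v1"), (3, "title", "v2")]
pred = [(1, 2), (1, 3)]
result = visible_values(set_ops, pred)
print(result)
```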

The planning value is higher for causal readiness:

```text
pred graph
  -> roots
  -> recursive reachability
  -> ready operations
```

and list traversal:

```text
insert tree
  -> first child
  -> next sibling
  -> ancestor sibling
  -> next element
  -> next visible element
```

These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and adding semijoin filters around the current frontier.

```mermaid
flowchart TB
    subgraph Causal["Causal Readiness"]
        Pred["pred Graph"] --> Roots["Roots"]
        Roots --> Ready["Ready Operations"]
        Ready --> Frontier["Frontier"]
        Frontier --> NewPred["Outgoing Pred Edges"]
        NewPred --> Ready
    end

    subgraph List["List Traversal"]
        Insert["insert Tree"] --> First["firstChild"]
        Insert --> Sibling["nextSibling"]
        First --> Next["nextElem"]
        Sibling --> Next
        Remove["remove Tombstones"] --> Visible["nextVisible"]
        Next --> Visible
    end
```
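
Restricting each round to the frontier is the core of semi-naive evaluation. The sketch below follows the causal-readiness loop in the diagram above (roots, then successors reached through `pred`), so it computes reachability from roots; a stricter readiness rule that requires every predecessor to be ready would need a different step. The name `ready_operations` and the `(before, after)` edge layout are assumptions made for illustration.

```python
from collections import defaultdict


def ready_operations(ops, pred):
    """Semi-naive reachability from roots. ops: set of ids, pred: (before, after) pairs."""
    successors = defaultdict(set)
    has_pred = set()
    for before, after in pred:
        successors[before].add(after)
        has_pred.add(after)

    ready = {op for op in ops if op not in has_pred}  # base: roots -> ready
    frontier = set(ready)
    while frontier:
        # step: join only the frontier with pred, then project the successor id
        derived = {nxt for op in frontier for nxt in successors[op] if nxt in ops}
        frontier = derived - ready  # keep only newly derived facts
        ready |= frontier
    return ready


# Hypothetical history: "d" depends on "x", which has not been delivered.
ops = {"a", "b", "c", "d"}
pred = [("a", "b"), ("b", "c"), ("x", "d")]
ready = ready_operations(ops, pred)
print(sorted(ready))
```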

The concrete experiment:

```text
causal-readiness Datalog rules
rule catalog
naive plan
FlowLog-style planned version
DBSP hydration and warm-update comparison
```

The success criterion is not only lower total runtime. It is also weaker dependence on causal-history depth for small warm updates.

---

## Use Case 2: Geomerge Violation Planning

The Geomerge integration note proposes compiling supported laws into violation relations.

For simple laws, this is direct:

```text
missing_src(g, s) :-
    edge(g, s, d),
    not vertex(g, s).
```
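
Read relationally, the rule is a projection followed by an antijoin. A minimal Python sketch, assuming `edge` rows of the form `(g, s, d)` and `vertex` rows of the form `(g, v)` held as plain sets (the in-memory layout is an assumption, not Geomerge's storage model):

```python
def missing_src(edge, vertex):
    """missing_src(g, s) :- edge(g, s, d), not vertex(g, s)."""
    # project edge onto (g, s), then antijoin against vertex on the same key
    return {(g, s) for (g, s, _d) in edge} - set(vertex)


# Hypothetical facts: vertex "c" is referenced as a source but never declared.
edge = {("g1", "a", "b"), ("g1", "c", "b")}
vertex = {("g1", "a"), ("g1", "b")}
violations = missing_src(edge, vertex)
print(violations)
```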

For larger laws, the compiler needs a planner:

```text
violation(vars) :-
    antecedent_atom_1(...),
    antecedent_atom_2(...),
    antecedent_atom_3(...),
    not consequent_atom(...).
```

FlowLog-style catalogs would help the compiler answer:

- which variables are introduced by each atom
- which atoms join on which variables
- when each negated consequent can be checked
- which projected values are needed for the violation row
- whether two laws share a common antecedent

```mermaid
flowchart LR
    Law["Geomerge Law"] --> Rule["Datalog-Like Rule"]
    Rule --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    JoinGraph --> Plan["Planned Relational Tree"]
    Plan --> Violation["Violation Relation"]
    Violation --> DBSP["DBSP Maintained Output"]
```
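
The first three catalog questions can be answered from the rule text alone. The sketch below is one possible shape for such a catalog, not FlowLog's actual data structure: `build_catalog`, its dictionary keys, and the example relations `label`, `owner`, and `allowed` are all hypothetical.

```python
from itertools import combinations


def build_catalog(positive, negative):
    """positive/negative: lists of (relation, variables) in body order."""
    bound, introduced = set(), []
    for rel, vars_ in positive:
        new = [v for v in dict.fromkeys(vars_) if v not in bound]
        introduced.append((rel, new))  # variables first bound by this atom
        bound.update(new)

    joins = []  # (atom index, atom index, shared variables)
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(positive), 2):
        shared = sorted(set(a) & set(b))
        if shared:
            joins.append((i, j, shared))

    ready_after = {}  # negated relation -> index of the atom that binds its last variable
    for rel, vars_ in negative:
        seen = set()
        for i, (_, pvars) in enumerate(positive):
            seen.update(pvars)
            if set(vars_) <= seen:
                ready_after[rel] = i
                break
    return {"introduced": introduced, "joins": joins, "negation_ready_after": ready_after}


# Hypothetical law body: edge(g, s, d), label(g, s, l), owner(l, o), not allowed(g, o)
catalog = build_catalog(
    positive=[("edge", ("g", "s", "d")), ("label", ("g", "s", "l")), ("owner", ("l", "o"))],
    negative=[("allowed", ("g", "o"))],
)
print(catalog)
```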

The concrete experiment:

```text
one Geomerge fixture theory
Datalog-like rule per supported law
join graph per rule
planned relational tree
comparison with the current direct validator's binding order
```

This can be useful before any DBSP integration exists, because it tests whether the compiler can understand the law shape.

---

## Use Case 3: Backend Comparison

FlowLog can also be used as a comparison point for DBSP.

The fair comparison is not:

```text
FlowLog product vs DBSP product
```

The useful comparison is:

```text
same Datalog query
same input facts
same output relation
different backend lowering
```

Candidate workloads:

- transitive closure
- causal readiness
- list next-element traversal
- missing foreign-key violations
- multi-atom Geomerge antecedents

The comparison should measure:

- hydration time
- warm-update time
- memory use
- sensitivity to join order
- output delta size
- ease of rollback or preview execution

```mermaid
flowchart TB
    Program["Same Datalog Program"] --> IR["Shared Relational IR"]
    Facts["Same Input Facts"] --> IR
    IR --> DBSP["DBSP Lowering"]
    IR --> DD["Differential Dataflow Lowering"]
    DBSP --> DbspMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DD --> DdMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DbspMetrics --> Compare["Backend Comparison"]
    DdMetrics --> Compare
```

This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid batch-plus-incremental strategy is needed.

---

## Use Case 4: Test Corpus for Datalog Lowering

FlowLog's examples suggest a useful test corpus shape.

A local Datalog-to-DBSP frontend should include small programs for:

- reachability
- transitive closure
- connected components
- antijoin checks
- aggregation checks
- CRDT multi-value register
- CRDT causal readiness
- CRDT list traversal
- Geomerge-style violation detection

Each test should define:

- input schemas
- input facts
- expected output facts
- expected output deltas for at least one update
- whether recursion or negation is used
- whether the program should be accepted or rejected

This gives a better foundation than testing only one CRDT or one Geomerge law.

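
One way to keep such a corpus uniform is a small record type with one field per item in the list above. This is a sketch under assumptions: the class name `LoweringCase`, its field names, and the `problems` check are hypothetical and not taken from FlowLog's test suite.

```python
from dataclasses import dataclass, field


@dataclass
class LoweringCase:
    name: str
    schemas: dict   # relation -> column names
    facts: dict     # relation -> set of rows
    expected: dict  # output relation -> set of rows
    update: dict = field(default_factory=dict)          # relation -> (inserted, deleted)
    expected_delta: dict = field(default_factory=dict)  # output relation -> (added, removed)
    uses_recursion: bool = False
    uses_negation: bool = False
    should_accept: bool = True

    def problems(self):
        """List structural problems, such as rows whose arity disagrees with the schema."""
        issues = []
        for rel, rows in self.facts.items():
            if rel not in self.schemas:
                issues.append(f"{rel}: no schema")
                continue
            arity = len(self.schemas[rel])
            issues += [f"{rel}: bad arity {row}" for row in rows if len(row) != arity]
        if self.should_accept and not self.expected_delta:
            issues.append("accepted case needs at least one expected delta")
        return issues


# Transitive closure: adding edge (3, 4) should add three paths and remove none.
tc = LoweringCase(
    name="transitive_closure",
    schemas={"edge": ("src", "dst")},
    facts={"edge": {(1, 2), (2, 3)}},
    expected={"path": {(1, 2), (2, 3), (1, 3)}},
    update={"edge": ({(3, 4)}, set())},
    expected_delta={"path": ({(3, 4), (2, 4), (1, 4)}, set())},
    uses_recursion=True,
)
print(tc.problems())
```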

---

## First Prototype

The first useful prototype should be small.

A planning-only tool:

```text
Datalog-like rule text
  -> parsed rules
  -> dependency graph
  -> strata
  -> rule catalog
  -> join graph
  -> planned relational tree
```
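
The dependency-graph and strata steps can be prototyped without any backend. The sketch below uses the standard iterative stratum assignment (a head sits at least as high as each positive body relation and strictly higher than each negated one) and rejects negation inside recursion. The rule encoding and the example relations `blocked` and `op` are hypothetical, and detecting which strata are recursive is left out.

```python
def stratify(rules):
    """rules: list of (head, [(body_relation, is_negated), ...]). Returns {relation: stratum}."""
    relations = {head for head, _ in rules} | {rel for _, body in rules for rel, _ in body}
    stratum = {rel: 0 for rel in relations}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, negated in body:
                needed = stratum[rel] + 1 if negated else stratum[rel]
                if stratum[head] < needed:
                    # a stratifiable program never needs more strata than relations
                    if needed >= len(relations):
                        raise ValueError(f"negation inside recursion through {head}")
                    stratum[head] = needed
                    changed = True
    return stratum


rules = [
    ("ready", [("root", False)]),
    ("ready", [("ready", False), ("pred", False)]),
    ("missing_src", [("edge", False), ("vertex", True)]),
    ("blocked", [("op", False), ("ready", True)]),  # hypothetical rule for illustration
]
strata = stratify(rules)
print(strata)
```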

```mermaid
flowchart LR
    Text["Rule Text"] --> Parse["Parsed Rules"]
    Parse --> Deps["Dependency Graph"]
    Deps --> Strata["Strata"]
    Parse --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    Strata --> Plan["Planned Tree"]
    JoinGraph --> Plan
    Plan --> Explain["Textual Plan Explanation"]
```

It does not need to run DBSP at first.

The output can be textual:

```text
rule: missing_src
positive atoms: edge
negative atoms: vertex
join graph: none
plan:
  scan edge
  project (graph, src)
  antijoin vertex on (graph, src)
  emit violation row
```

For recursive rules, the output can identify the loop:

```text
recursive stratum:
  ready

base:
  roots -> ready

step:
  ready join pred on operation id
  project successor operation id
```

This prototype would validate the compiler shape before depending on a backend API.

---

## Second Prototype

The second prototype should lower a narrow subset to DBSP.

```mermaid
flowchart TB
    Subset["Supported Rule Subset"] --> Planner["Planner"]
    Planner --> IR["Relational IR"]
    IR --> Lowering["DBSP Lowering"]
    Lowering --> Runtime["DBSP Runtime"]
    Runtime --> Output["Maintained Outputs"]
    Snapshot["Naive Snapshot Evaluator"] --> Oracle["Correctness Oracle"]
    Output --> Oracle
```

Supported subset:

- relation declarations
- positive atoms
- equality joins
- constants
- simple comparisons
- stratified negation
- union of repeated rule heads
- one recursive IDB at a time

Excluded subset:

- aggregation
- mutual recursion
- disjunction
- existential generation
- equality saturation
- custom scalar functions

The target workloads:

- `missing_src` and `missing_dst`
- multi-value register
- transitive closure
- causal readiness

This subset is enough to test the important bridge:

```text
planned rules -> DBSP-maintained outputs
```

---

## Data Model Decisions

Several decisions should be made explicitly before implementation.

```mermaid
flowchart TB
    Decisions["Data Model Decisions"]
    Decisions --> Semantics["Set or Multiset Semantics"]
    Decisions --> Identity["Operation Identity"]
    Decisions --> Violations["Violation Row Shape"]
    Decisions --> Integration["Output Integration"]
    Decisions --> Rollback["Rollback or Preview"]
```

**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when `distinct` is applied.

**Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical key with two physical fields.

**Violation Row Shape**: Geomerge violations should include enough context for error messages, not just a boolean.

**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that integration.

**Rollback or Preview**: Geomerge validation needs preview or rollback behavior. With weighted deltas, inverse deltas are plausible but must stay transactionally coupled to storage.

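
The first and last decisions both come down to weight arithmetic. The following is a toy Z-set in plain Python, written only to illustrate the idea: it is not the DBSP library's API, and the helper names `zadd`, `negate`, and `distinct` are assumptions.

```python
def zadd(a, b):
    """Add two Z-sets (row -> integer weight), dropping rows whose weight becomes zero."""
    out = dict(a)
    for row, w in b.items():
        out[row] = out.get(row, 0) + w
    return {row: w for row, w in out.items() if w != 0}


def negate(z):
    """Inverse delta: flip the sign of every weight."""
    return {row: -w for row, w in z.items()}


def distinct(z):
    """Collapse every positive weight to 1, giving set semantics."""
    return {row: 1 for row, w in z.items() if w > 0}


# Operation ids are (replica_id, counter) pairs; ("r1", 1) is delivered twice.
state = {("r1", 1): 1}
delta = {("r1", 1): 1, ("r2", 1): 1}

after = zadd(state, delta)                # weight 2 for the duplicate delivery
as_set = distinct(after)                  # set-like view for CRDT queries
rolled_back = zadd(after, negate(delta))  # preview undone by the inverse delta
print(after, as_set, rolled_back)
```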

---

## Evaluation Plan

The evaluation should separate correctness from performance.

```mermaid
flowchart LR
    Inputs["Input Facts and Updates"] --> Naive["Naive Snapshot Evaluation"]
    Inputs --> Planned["Planned Backend Evaluation"]
    Naive --> Correctness["Correctness Check"]
    Planned --> Correctness
    Planned --> Perf["Performance Metrics"]
    Perf --> Hydration["Hydration"]
    Perf --> Warm["Warm Updates"]
    Perf --> Memory["Memory"]
    Perf --> Sensitivity["History and Join Sensitivity"]
```

Correctness checks:

```text
planned evaluation == naive snapshot evaluation
DBSP maintained result == snapshot result
failed Geomerge transaction leaves no checker drift
```
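
The second check can be phrased as an oracle loop: integrate the backend's output deltas and compare with a fresh snapshot after every update. In the sketch below, `diff_step` is only a stand-in for an incremental backend and `insert_only_step` is a deliberately broken one; both names, the tagged-tuple fact layout, and the use of the missing-source rule as the query are assumptions. The transaction-drift check is not covered.

```python
def evaluate(facts):
    """Naive snapshot evaluation of the missing-source rule over tagged facts."""
    vertices = {f[1:] for f in facts if f[0] == "vertex"}
    return {f[1:3] for f in facts if f[0] == "edge" and f[1:3] not in vertices}


def diff_step(facts, inserts, deletes):
    """Stand-in backend: derive (added, removed) output rows by diffing snapshots."""
    old = evaluate(facts)
    new = evaluate((facts - deletes) | inserts)
    return new - old, old - new


def insert_only_step(facts, inserts, deletes):
    """Deliberately wrong backend that never retracts outputs."""
    added, _removed = diff_step(facts, inserts, deletes)
    return added, set()


def matches_snapshot(step, initial, updates):
    """Integrate output deltas from `step`; compare with the snapshot after each update."""
    facts = set(initial)
    maintained = evaluate(facts)  # hydration
    for inserts, deletes in updates:
        added, removed = step(facts, inserts, deletes)
        facts = (facts - deletes) | inserts
        maintained = (maintained - removed) | added
        if maintained != evaluate(facts):
            return False
    return True


initial = {("vertex", "g1", "a"), ("edge", "g1", "a", "a")}
updates = [
    ({("edge", "g1", "c", "a")}, set()),  # introduces a violation for source "c"
    ({("vertex", "g1", "c")}, set()),     # repairs it, so the output must retract
]
print(matches_snapshot(diff_step, initial, updates))
print(matches_snapshot(insert_only_step, initial, updates))
```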

Performance checks:

```text
hydration time
warm-update time
memory used by maintained state
number of output delta rows
history-depth sensitivity
join-order sensitivity
```

The most important performance test is causal readiness:

```text
large causal history
  + small new update
  -> does update cost grow with history depth?
```

If the answer is yes, the frontend needs frontier-aware planning or a different physical representation.

---

## Decision Points

The main decision points are:

- whether to implement a Datalog frontend or compile directly from Geolog laws
- whether the relational IR should be FlowLog-like, DBSP-like, or custom
- whether recursive planning should support mutual recursion early
- whether SIP should be automatic, directive-controlled, or both
- whether hydration should use the same backend as warm updates
- whether to persist backend operator state
- whether to compare against Differential Dataflow for recursive workloads

These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does not force Differential Dataflow as the backend.

---

## Practical Recommendation

The first practical step is a planning-only FlowLog-inspired compiler layer.

The next step is lowering a small subset to DBSP.

After that, FlowLog itself can serve as a comparison backend for the same small programs.

```mermaid
flowchart LR
    P1["Planning-Only Compiler"] --> P2["DBSP Subset Lowering"]
    P2 --> P3["FlowLog Backend Comparison"]
    P3 --> P4["Shared IR Decision"]
    P4 --> P5["Production-Oriented Prototype"]
```

The goal should be:

```text
one rule frontend
one relational IR
two possible execution backends
```

That architecture would make it possible to test whether performance problems come from the query semantics, the planner, or the backend.

---

## Changelog

* **May 20, 2026** -- First version created from FlowLog, DBSP, CRDT, and Geomerge notes.