# Using FlowLog Ideas

A practical note on how FlowLog or FlowLog-style planning could be used for the local DBSP, CRDT, and Geomerge work.

---

## Short Answer

The most useful way to use FlowLog is as a planning reference, not as a direct dependency.

There are three possible levels of use:

```text
Level 1: run FlowLog examples to learn workload behavior
Level 2: borrow FlowLog planning ideas for a DBSP frontend
Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs
```

```mermaid
flowchart TB
    L1["Level 1<br/>Run FlowLog Examples"] --> L2["Level 2<br/>Borrow Planning Ideas"]
    L2 --> L3["Level 3<br/>Backend Comparison"]
    L2 --> DBSP["DBSP Frontend Work"]
    L3 --> Decision["Backend and Planner Decisions"]
```

The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer
before DBSP.

---

## Initial Non-Goals

The first step should not be replacing the DBSP backend with FlowLog.

That would conflate two separate questions:

- Which incremental backend should maintain deltas?
- Which frontend planner should produce the backend plan?

The DBSP notes are already about DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer.

The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source
concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit.

---

## Use Case 1: Better CRDT Query Planning

The CRDT queries in the DBSP notes include:

- multi-value register queries
- causal-readiness queries
- list traversal queries
- tombstone-skipping queries

The multi-value register is already simple enough:

```text
set + pred -> overwritten -> visible values
```

The planning value is higher for causal readiness:

```text
pred graph
-> roots
-> recursive reachability
-> ready operations
```

and list traversal:

```text
insert tree
-> first child
-> next sibling
-> ancestor sibling
-> next element
-> next visible element
```

These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and
adding semijoin filters around the current frontier.

```mermaid
flowchart TB
    subgraph Causal["Causal Readiness"]
        Pred["pred Graph"] --> Roots["Roots"]
        Roots --> Ready["Ready Operations"]
        Ready --> Frontier["Frontier"]
        Frontier --> NewPred["Outgoing Pred Edges"]
        NewPred --> Ready
    end

    subgraph List["List Traversal"]
        Insert["insert Tree"] --> First["firstChild"]
        Insert --> Sibling["nextSibling"]
        First --> Next["nextElem"]
        Sibling --> Next
        Remove["remove Tombstones"] --> Visible["nextVisible"]
        Next --> Visible
    end
```

The concrete experiment:

```text
causal-readiness Datalog rules
rule catalog
naive plan
FlowLog-style planned version
DBSP hydration and warm-update comparison
```

The success criterion is not only lower total runtime. It is lower dependence on causal-history depth for small warm updates.

---

## Use Case 2: Geomerge Violation Planning

The Geomerge integration note proposes compiling supported laws into violation relations.

For simple laws, this is direct:

```text
missing_src(g, s) :-
    edge(g, s, d),
    not vertex(g, s).
```

For larger laws, the compiler needs a planner:

```text
violation(vars) :-
    antecedent_atom_1(...),
    antecedent_atom_2(...),
    antecedent_atom_3(...),
    not consequent_atom(...).
```

FlowLog-style catalogs would help the compiler answer:

- which variables are introduced by each atom
- which atoms join on which variables
- when each negated consequent can be checked
- which projected values are needed for the violation row
- whether two laws share a common antecedent

```mermaid
flowchart LR
    Law["Geomerge Law"] --> Rule["Datalog-Like Rule"]
    Rule --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    JoinGraph --> Plan["Planned Relational Tree"]
    Plan --> Violation["Violation Relation"]
    Violation --> DBSP["DBSP Maintained Output"]
```

The concrete experiment:

```text
one Geomerge fixture theory
Datalog-like rule per supported law
join graph per rule
planned relational tree
comparison with the current direct validator's binding order
```

This can be useful before any DBSP integration exists, because it tests whether the compiler can understand the law shape.

---

## Use Case 3: Backend Comparison

FlowLog can also be used as a comparison point for DBSP.

The fair comparison is not:

```text
FlowLog product vs DBSP product
```

The useful comparison is:

```text
same Datalog query
same input facts
same output relation
different backend lowering
```

Candidate workloads:

- transitive closure
- causal readiness
- list next-element traversal
- missing foreign-key violations
- multi-atom Geomerge antecedents

The comparison should measure:

- hydration time
- warm-update time
- memory use
- sensitivity to join order
- output delta size
- ease of rollback or preview execution

```mermaid
flowchart TB
    Program["Same Datalog Program"] --> IR["Shared Relational IR"]
    Facts["Same Input Facts"] --> IR
    IR --> DBSP["DBSP Lowering"]
    IR --> DD["Differential Dataflow Lowering"]
    DBSP --> DbspMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DD --> DdMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DbspMetrics --> Compare["Backend Comparison"]
    DdMetrics --> Compare
```

This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid
batch-plus-incremental strategy is needed.

---

## Use Case 4: Test Corpus for Datalog Lowering

FlowLog's examples suggest a useful test corpus shape.

A local Datalog-to-DBSP frontend should include small programs for:

- reachability
- transitive closure
- connected components
- antijoin checks
- aggregation checks
- CRDT multi-value register
- CRDT causal readiness
- CRDT list traversal
- Geomerge-style violation detection

Each test should define:

- input schemas
- input facts
- expected output facts
- expected output deltas for at least one update
- whether recursion or negation is used
- whether the program should be accepted or rejected

This gives a better foundation than testing only one CRDT or one Geomerge law.

---

## First Prototype

The first useful prototype should be small.

A planning-only tool:

```text
Datalog-like rule text
-> parsed rules
-> dependency graph
-> strata
-> rule catalog
-> join graph
-> planned relational tree
```

```mermaid
flowchart LR
    Text["Rule Text"] --> Parse["Parsed Rules"]
    Parse --> Deps["Dependency Graph"]
    Deps --> Strata["Strata"]
    Parse --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    Strata --> Plan["Planned Tree"]
    JoinGraph --> Plan
    Plan --> Explain["Textual Plan Explanation"]
```

It does not need to run DBSP at first.

The output can be textual:

```text
rule: missing_src
positive atoms: edge
negative atoms: vertex
join graph: none
plan:
  scan edge
  project (graph, src)
  antijoin vertex on (graph, src)
  emit violation row
```

For recursive rules, the output can identify the loop:

```text
recursive stratum:
  ready

base:
  roots -> ready

step:
  ready join pred on operation id
  project successor operation id
```

This prototype would validate the compiler shape before depending on a backend API.

---

## Second Prototype

The second prototype should lower a narrow subset to DBSP.

```mermaid
flowchart TB
    Subset["Supported Rule Subset"] --> Planner["Planner"]
    Planner --> IR["Relational IR"]
    IR --> Lowering["DBSP Lowering"]
    Lowering --> Runtime["DBSP Runtime"]
    Runtime --> Output["Maintained Outputs"]
    Snapshot["Naive Snapshot Evaluator"] --> Oracle["Correctness Oracle"]
    Output --> Oracle
```

Supported subset:

- relation declarations
- positive atoms
- equality joins
- constants
- simple comparisons
- stratified negation
- union of repeated rule heads
- one recursive IDB at a time

Excluded subset:

- aggregation
- mutual recursion
- disjunction
- existential generation
- equality saturation
- custom scalar functions

The target workloads:

- `missing_src` and `missing_dst`
- multi-value register
- transitive closure
- causal readiness

This subset is enough to test the important bridge:

```text
planned rules -> DBSP-maintained outputs
```

---

## Data Model Decisions

Several decisions should be made explicitly before implementation.

```mermaid
flowchart TB
    Decisions["Data Model Decisions"]
    Decisions --> Semantics["Set or Multiset Semantics"]
    Decisions --> Identity["Operation Identity"]
    Decisions --> Violations["Violation Row Shape"]
    Decisions --> Integration["Output Integration"]
    Decisions --> Rollback["Rollback or Preview"]
```

**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when
`distinct` is applied.

**Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical
key with two physical fields.

**Violation Rows**: Geomerge violations should include enough context for error messages, not just a boolean.

**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that
integration.

**Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay
transactionally coupled to storage.

---

## Evaluation Plan

The evaluation should separate correctness from performance.

```mermaid
flowchart LR
    Inputs["Input Facts and Updates"] --> Naive["Naive Snapshot Evaluation"]
    Inputs --> Planned["Planned Backend Evaluation"]
    Naive --> Correctness["Correctness Check"]
    Planned --> Correctness
    Planned --> Perf["Performance Metrics"]
    Perf --> Hydration["Hydration"]
    Perf --> Warm["Warm Updates"]
    Perf --> Memory["Memory"]
    Perf --> Sensitivity["History and Join Sensitivity"]
```

Correctness checks:

```text
planned evaluation == naive snapshot evaluation
DBSP maintained result == snapshot result
failed Geomerge transaction leaves no checker drift
```

Performance checks:

```text
hydration time
warm-update time
memory used by maintained state
number of output delta rows
history-depth sensitivity
join-order sensitivity
```

The most important performance test is causal readiness:

```text
large causal history
+ small new update
-> does update cost grow with history depth?
```

If the answer is yes, the frontend needs frontier-aware planning or a different physical representation.

---

## Decision Points

The main decision points are:

- whether to implement a Datalog frontend or compile directly from Geolog laws
- whether the relational IR should be FlowLog-like, DBSP-like, or custom
- whether recursive planning should support mutual recursion early
- whether SIP should be automatic, directive-controlled, or both
- whether hydration should use the same backend as warm updates
- whether to persist backend operator state
- whether to compare against Differential Dataflow for recursive workloads

These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does
not force Differential Dataflow as the backend.

---

## Practical Recommendation

The first practical step is a planning-only FlowLog-inspired compiler layer.

The next step is lowering a small subset to DBSP.

After that, FlowLog itself can serve as a comparison backend for the same small programs.

```mermaid
flowchart LR
    P1["Planning-Only Compiler"] --> P2["DBSP Subset Lowering"]
    P2 --> P3["FlowLog Backend Comparison"]
    P3 --> P4["Shared IR Decision"]
    P4 --> P5["Production-Oriented Prototype"]
```

The goal should be:

```text
one rule frontend
one relational IR
two possible execution backends
```

That architecture would make it possible to test whether performance problems come from the query semantics, the planner, or the backend.

---

## Changelog

* **May 20, 2026** -- First version created from FlowLog, DBSP, CRDT, and Geomerge notes.