Add mermaid diagrams to the note files

2026-05-20 15:54:55 +02:00 · 2026-05-20 15:54:55 +02:00 · 2bfcb7e818
commit 2bfcb7e818
parent 99c30190a8
3 changed files with 293 additions and 20 deletions
--- a/flowlog/003-flowlog-and-dbsp-synergy.md
+++ b/flowlog/003-flowlog-and-dbsp-synergy.md
@@ -107,7 +107,8 @@ mvrStore(Key, Value) :-
    not overwritten(RepId, Ctr).
 ```

-This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation history.
+This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation
+history.

 The causal-readiness query is harder:

@@ -122,9 +123,13 @@ isCausallyReady(RepId, Ctr) :-

 This is recursive graph traversal. The DBSP CRDT notes report that this query can remain dependent on causal-history depth, even during warm updates.

-FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings. For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots through the whole causal graph.
+FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings.
+For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots
+through the whole causal graph.

-The list CRDT query is also planning-sensitive. Relations such as `firstChild`, `nextSibling`, `nextSiblingAnc`, `nextElem`, and `nextVisible` create several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP builds maintained operator state.
+The list CRDT query is also planning-sensitive. Relations such as `firstChild`, `nextSibling`, `nextSiblingAnc`, `nextElem`, and `nextVisible` create
+several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP
+builds maintained operator state.

 ---

@@ -195,7 +200,8 @@ For Geomerge, the source may be compiled Geolog laws.

 For execution, the backend may be DBSP, Differential Dataflow, or a non-incremental batch engine.

-This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been checked, stratified, and optimized.
+This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been
+checked, stratified, and optimized.

 ---

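The notes say DBSP should receive a plan that is already checked and stratified. As a rough illustration of the stratification step alone, here is a minimal Python sketch of the textbook iterative algorithm. The rule encoding and the predicate names (`arc`, `node`, `reach`, `unreached`) are hypothetical and are not taken from Geolog, FlowLog, or DBSP.

```python
def stratify(rules):
    """Assign a stratum to every predicate, or raise if negation sits inside recursion.

    rules: list of (head, body) pairs, where body is a list of (predicate, negated).
    A head must be at least as high as each positive body predicate and strictly
    higher than each negated one. If any stratum grows past the number of
    predicates, some negation is part of a recursive cycle.
    """
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                need = stratum[pred] + 1 if negated else stratum[pred]
                if stratum[head] < need:
                    stratum[head] = need
                    changed = True
                    if need > len(preds):
                        raise ValueError("not stratifiable: negation inside recursion")
    return stratum


# Hypothetical program: recursive reachability, then an antijoin against it.
program = [
    ("reach", [("arc", False)]),
    ("reach", [("reach", False), ("arc", False)]),
    ("unreached", [("node", False), ("reach", True)]),
]
strata = stratify(program)  # reach stays in stratum 0, unreached moves to 1
```

Only after a check like this succeeds would the negated `reach` atom be safe to lower as an antijoin against a fully computed lower stratum.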
@@ -203,7 +209,8 @@ This matches the existing DBSP notes: DBSP should not own full source-language s

 FlowLog suggests several optimizations that transfer well to DBSP-backed work.

-**Structural Planning**: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates are weak.
+**Structural Planning**: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates
+are weak.

 **Sideways Information Passing**: Add semijoin-style filters so later joins and recursive steps see fewer irrelevant tuples.

@@ -211,7 +218,8 @@ FlowLog suggests several optimizations that transfer well to DBSP-backed work.

 **Subplan Sharing**: Reuse common derived relations across laws or CRDT views when multiple outputs need the same intermediate facts.

-**Physical Key Choice**: Pick key fields deliberately before lowering to the backend. DBSP joins also need maintained state, so bad key and arrangement choices will become runtime costs.
+**Physical Key Choice**: Pick key fields deliberately before lowering to the backend. DBSP joins also need maintained state, so bad key and
+arrangement choices will become runtime costs.

 ---

@@ -230,7 +238,8 @@ The shared lesson is more important than the difference:
 incremental backends maintain operator state
 ```

-That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase memory use and update cost for the lifetime of the maintained query.
+That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase
+memory use and update cost for the lifetime of the maintained query.

 FlowLog is useful because it treats planning as a first-class layer before execution.

--- a/flowlog/004-flowlog-technical-planning-notes.md
+++ b/flowlog/004-flowlog-technical-planning-notes.md
@@ -18,7 +18,17 @@ rule atoms and variables
 -> physical operator choices
 ```

-That layer is useful because it records how variables move through projections, filters, joins, antijoins, and recursive strata before the backend starts maintaining state.
+```mermaid
+flowchart LR
+    Rule["Datalog Rule"] --> Catalog["Rule Catalog"]
+    Catalog --> Signatures["Collection Signatures"]
+    Signatures --> Flows["Transformation Flows"]
+    Flows --> Operators["Physical Operators"]
+    Operators --> Backend["Incremental Backend State"]
+```
+
+That layer is useful because it records how variables move through projections, filters, joins, antijoins, and recursive strata before the backend
+starts maintaining state.

 ---

@@ -90,6 +100,18 @@ Arc(x, y)
 -> key-value view: key=(y), value=(x)
 ```

+```mermaid
+flowchart TB
+    Arc["Arc(x, y)"]
+    Arc --> Row["Row View<br/>(x, y)"]
+    Arc --> KeyX["Key View<br/>key = x"]
+    Arc --> KvX["Key-Value View<br/>key = x<br/>value = y"]
+    Arc --> KvY["Key-Value View<br/>key = y<br/>value = x"]
+    KeyX --> Semi["Semijoin or Antijoin"]
+    KvX --> Join1["Join on x"]
+    KvY --> Join2["Join on y"]
+```
+
 This is a central planning choice. The key determines which arrangement or maintained index the backend can use.

 ---
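To make the key choice concrete, here is a small Python sketch that materializes the four `Arc` views from the diagram as plain sets and dictionaries. It assumes `Arc` is a set of `(x, y)` tuples. The helper names are illustrative, and a real backend would maintain each view as its own indexed state rather than rebuilding it from scratch.

```python
from collections import defaultdict

# Toy Arc(x, y) relation.
Arc = {(1, 2), (1, 3), (2, 3), (3, 4)}


def row_view(rel):
    """Row view: the tuples themselves, with no key."""
    return set(rel)


def key_view(rel, key_index):
    """Key view: only the key column, enough for semijoin or antijoin checks."""
    return {row[key_index] for row in rel}


def kv_view(rel, key_index, value_index):
    """Key-value view: an index from each key to the values carried with it."""
    index = defaultdict(set)
    for row in rel:
        index[row[key_index]].add(row[value_index])
    return dict(index)


rows = row_view(Arc)
keys_x = key_view(Arc, 0)   # {1, 2, 3}
by_x = kv_view(Arc, 0, 1)   # {1: {2, 3}, 2: {3}, 3: {4}}
by_y = kv_view(Arc, 1, 0)   # {2: {1}, 3: {1, 2}, 4: {3}}
```

A join on `x` would probe `by_x` and a join on `y` would probe `by_y`. Keeping both means paying for two maintained indexes, which is why the key should be picked deliberately.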
@@ -98,6 +120,24 @@ This is a central planning choice. The key determines which arrangement or maint

 FlowLog's transformations separate unary reshaping from binary combination.

+```mermaid
+flowchart TB
+    Input["Input Collection"]
+    Input --> Unary["Unary Transformation"]
+    Unary --> UnaryOut["Projected, Filtered, or Arranged Collection"]
+
+    Left["Left Collection"] --> Binary["Binary Transformation"]
+    Right["Right Collection"] --> Binary
+    Binary --> BinaryOut["Joined or Antijoined Collection"]
+
+    Unary --> RowToRow["Row to Row"]
+    Unary --> RowToKey["Row to Key"]
+    Unary --> RowToKv["Row to Key-Value"]
+    Binary --> Join["Join"]
+    Binary --> Anti["Antijoin"]
+    Binary --> Product["Cartesian Product"]
+```
+
 Unary transformations include:

 - row to row
@@ -178,6 +218,12 @@ The join graph is a chain:
 A --x-- B --y-- C
 ```

+```mermaid
+flowchart LR
+    A["A(a, x)"] -- "x" --> B["B(x, y)"]
+    B -- "y" --> C["C(y, c)"]
+```
+
 A rule like this is sensitive to join order:

 ```text
@@ -190,6 +236,24 @@ R(a, d) :-

 Joining `A` with `D` first is a cross product. Joining adjacent atoms first preserves bindings and reduces intermediate results.

+```mermaid
+flowchart TB
+    subgraph Good["Connected Join Order"]
+        A1["A(a, x)"] --> AB["Join on x"]
+        B1["B(x, y)"] --> AB
+        AB --> ABC["Join on y"]
+        C1["C(y, z)"] --> ABC
+        ABC --> ABCD["Join on z"]
+        D1["D(z, d)"] --> ABCD
+    end
+
+    subgraph Bad["Disconnected Join Order"]
+        A2["A(a, x)"] --> AD["Cross Product"]
+        D2["D(z, d)"] --> AD
+        AD --> Later["Later Filters and Joins"]
+    end
+```
+
 FlowLog's structural planning uses variable overlap to choose a plan tree that keeps joins connected and intermediate width smaller.

 ---
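The cost difference between the two join orders shows up even on toy data. This Python sketch assumes the rule body is `A(a, x), B(x, y), C(y, z), D(z, d)`, matching the diagram, and represents each tuple as a dict of variable bindings. The relations are made-up one-to-one data, so the recorded sizes only illustrate the shape of the problem and are not a benchmark.

```python
from itertools import product


def atom(variables, rows):
    """Turn raw tuples into binding dicts, e.g. ("a", "x") and (1, 1) -> {"a": 1, "x": 1}."""
    return [dict(zip(variables, row)) for row in rows]


def join(left, right):
    """Natural join on shared variables; with no shared variables it is a cross product."""
    out = []
    for lrow, rrow in product(left, right):
        if all(lrow[v] == rrow[v] for v in lrow.keys() & rrow.keys()):
            out.append({**lrow, **rrow})
    return out


def run(order):
    """Join atoms left to right and record the size of every intermediate result."""
    acc, sizes = order[0], []
    for nxt in order[1:]:
        acc = join(acc, nxt)
        sizes.append(len(acc))
    return acc, sizes


n = 10
A = atom(("a", "x"), [(i, i) for i in range(n)])
B = atom(("x", "y"), [(i, i) for i in range(n)])
C = atom(("y", "z"), [(i, i) for i in range(n)])
D = atom(("z", "d"), [(i, i) for i in range(n)])

connected, connected_sizes = run([A, B, C, D])        # sizes [10, 10, 10]
disconnected, disconnected_sizes = run([A, D, B, C])  # sizes [100, 100, 10]
```

Both orders produce the same `(a, d)` pairs, but starting with `A × D` builds `n * n` intermediate rows that later joins have to filter back down. In an incremental backend those rows would also become maintained state.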
@@ -215,6 +279,17 @@ FlowLog instead uses structural signals:
 - whether a candidate plan creates disconnected joins
 - how deep the plan tree becomes

+```mermaid
+flowchart LR
+    RuleBody["Rule Body"] --> Overlap["Variable Overlap"]
+    RuleBody --> Width["Intermediate Width"]
+    RuleBody --> Connectivity["Join Connectivity"]
+    Overlap --> PlanTree["Candidate Plan Tree"]
+    Width --> PlanTree
+    Connectivity --> PlanTree
+    PlanTree --> Choice["Robust Plan Choice"]
+```
+
 This is not guaranteed to be optimal. It is meant to avoid obviously bad plans.

 That is a good fit for DBSP-backed work too, because a bad plan becomes maintained operator state.
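A very small version of the structural idea is a greedy ordering over variable overlap. This Python sketch is a hypothetical heuristic, not FlowLog's planner: it builds only a left-deep order and ignores the intermediate-width and tree-depth signals listed above. It does show how overlap alone already avoids starting with the disconnected `A`, `D` pair.

```python
def count_cross_products(order, atoms):
    """Count the steps in a left-deep order that share no variable with earlier atoms."""
    bound, crosses = set(atoms[order[0]]), 0
    for name in order[1:]:
        if not bound & set(atoms[name]):
            crosses += 1
        bound |= set(atoms[name])
    return crosses


def greedy_order(atoms):
    """Repeatedly pick the atom sharing the most variables with those already joined.

    atoms: dict from atom name to its tuple of variable names.
    max() keeps the first maximum, so ties follow the written order.
    """
    remaining = list(atoms)
    order = [remaining.pop(0)]
    bound = set(atoms[order[0]])
    while remaining:
        best = max(remaining, key=lambda name: len(bound & set(atoms[name])))
        remaining.remove(best)
        order.append(best)
        bound |= set(atoms[best])
    return order


# Rule body written in a poor order: A, then D, which shares nothing with A.
body = {"A": ("a", "x"), "D": ("z", "d"), "B": ("x", "y"), "C": ("y", "z")}
planned = greedy_order(body)  # ["A", "B", "C", "D"]
```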
@@ -247,7 +322,19 @@ bad(x, z) :-
    not D(x, z).
 ```

-The antijoin against `D(x, z)` can run after `A` and `B`; it does not need to wait for `C`. Running it earlier may reduce the input to the later join with `C`.
+The antijoin against `D(x, z)` can run after `A` and `B`; it does not need to wait for `C`. Running it earlier may reduce the input to the later join
+with `C`.
+
+```mermaid
+flowchart LR
+    A["A(x, y)"] --> AB["Join on y"]
+    B["B(y, z)"] --> AB
+    AB --> AntiD["Antijoin D(x, z)"]
+    D["D(x, z)"] --> AntiD
+    AntiD --> JoinC["Join C(z, w)"]
+    C["C(z, w)"] --> JoinC
+    JoinC --> Out["bad(x, z)"]
+```

 This is the same issue as antijoin pushdown in the DBSP CRDT note.

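The pushdown can be checked on toy data. This Python sketch assumes the body of `bad(x, z)` is `A(x, y), B(y, z), C(z, w), not D(x, z)`, as drawn in the diagram, and projects `y` away right after the first join. Both plans return the same rows; the early antijoin simply hands fewer tuples to the join with `C`.

```python
# Toy relations; the values are arbitrary.
A = {(1, 10), (2, 20), (3, 30)}                       # A(x, y)
B = {(10, 100), (20, 200), (30, 300)}                 # B(y, z)
C = {(100, "p"), (100, "q"), (200, "r"), (300, "s")}  # C(z, w)
D = {(1, 100), (2, 200)}                              # D(x, z), the negated atom


def join_ab(a, b):
    """A(x, y), B(y, z) -> (x, z); y is not needed later, so it is projected away."""
    return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}


def join_c(rows, c):
    """Join (x, z) rows with C(z, w) on z."""
    return {(x, z, w) for (x, z) in rows for (z2, w) in c if z == z2}


def antijoin_d(rows, d):
    """Drop rows whose (x, z) prefix appears in D."""
    return {row for row in rows if row[:2] not in d}


ab = join_ab(A, B)

# Late plan: join C first, apply the antijoin last.
late_input = ab
bad_late = {(x, z) for (x, z, _) in antijoin_d(join_c(late_input, C), D)}

# Early plan: apply the antijoin right after A and B, then join C.
early_input = antijoin_d(ab, D)
bad_early = {(x, z) for (x, z, _) in join_c(early_input, C)}
```

Here the join with `C` sees three input rows in the late plan and one in the early plan.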
@@ -265,13 +352,23 @@ derive useful keys
 -> join less data
 ```

+```mermaid
+flowchart LR
+    DeltaReach["Delta Reach(x)"] --> Keys["Useful x Keys"]
+    Keys --> SemiArc["Semijoin Arc on x"]
+    Arc["Arc(x, y)"] --> SemiArc
+    SemiArc --> Join["Join with Delta Reach"]
+    Join --> NewReach["New Reach(y)"]
+```
+
 Example:

 ```text
 Reach(y) :- Reach(x), Arc(x, y).
 ```

-If the current delta contains only a small set of `Reach(x)` values, then `Arc` only needs edges whose source is in that set. A semijoin can prefilter `Arc` before the recursive join.
+If the current delta contains only a small set of `Reach(x)` values, then `Arc` only needs edges whose source is in that set. A semijoin can prefilter
+`Arc` before the recursive join.

 For CRDT causal readiness, this suggests a physical plan centered on frontier operations:

@@ -281,6 +378,15 @@ new ready operations
 -> newly ready operations
 ```

+```mermaid
+flowchart LR
+    Frontier["Ready Frontier"] --> CandidatePred["Pred Edges from Frontier"]
+    Pred["pred(from, to)"] --> CandidatePred
+    CandidatePred --> Check["Predecessor Checks"]
+    Check --> NewReady["New Ready Operations"]
+    NewReady --> Frontier
+```
+
 rather than a plan that repeatedly starts from roots.

 ---
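The `Reach` example can be sketched in Python as semi-naive evaluation in which each round only touches `Arc` edges whose source appears in the current delta. The dictionary lookup by source plays the role of the semijoin prefilter and stands in for an arrangement of `Arc` keyed by `x`. The function name and the `edges_touched` counter are illustrative additions.

```python
from collections import defaultdict


def reach_with_sip(sources, arc):
    """Semi-naive evaluation of Reach(y) :- Reach(x), Arc(x, y).

    Each round keeps only the Arc edges whose x is in the delta (the semijoin),
    so edges leaving already-processed or unreachable nodes are never examined.
    """
    by_source = defaultdict(list)  # stand-in for Arc arranged by x
    for x, y in arc:
        by_source[x].append(y)

    reach = set(sources)
    delta = set(sources)
    edges_touched = 0
    while delta:
        candidate = [(x, y) for x in delta for y in by_source.get(x, [])]
        edges_touched += len(candidate)
        new = {y for _, y in candidate} - reach
        reach |= new
        delta = new
    return reach, edges_touched


# A chain from 0 plus a component that is never reachable from 0.
arc = {(0, 1), (1, 2), (2, 3), (10, 11), (11, 12)}
reached, touched = reach_with_sip({0}, arc)  # touched == 3, not 5 edges per round
```

For causal readiness, the same shape would start from the ready frontier or newly arrived operations instead of `sources`. This sketch does not model the predecessor checks that readiness also needs.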
@@ -291,6 +397,20 @@ Recursive rules require fixed-point execution.

 FlowLog groups recursive rules into recursive strata, then executes them inside an iterative dataflow scope.

+```mermaid
+flowchart TB
+    Earlier["Earlier Strata Outputs"] --> Enter["Enter Recursive Scope"]
+    EDB["Input Relations"] --> Enter
+    Enter --> Base["Base Rules"]
+    Base --> LoopVars["IDB Loop Variables"]
+    LoopVars --> Step["Recursive Step Rules"]
+    Step --> Delta["New Derived Facts"]
+    Delta --> LoopVars
+    LoopVars --> Done{"Fixed Point?"}
+    Done -- "no" --> Step
+    Done -- "yes" --> Collect["Collect Recursive Outputs"]
+```
+
 The important design point is that a recursive stratum can contain several rules deriving related IDBs. The planner must know:

 - which IDBs are loop variables
@@ -298,7 +418,8 @@ The important design point is that a recursive stratum can contain several rules
 - which outputs must be collected after convergence
 - which intermediate arrangements are useful across iterations

-For DBSP, this maps to recursive circuits with feedback and delay. The frontend still needs the same rule-level information before it can produce a good circuit.
+For DBSP, this maps to recursive circuits with feedback and delay. The frontend still needs the same rule-level information before it can produce a
+good circuit.

 ---

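A recursive stratum with more than one IDB can be illustrated with a pair of mutually recursive relations. The rules in this Python sketch (`start`, `arc`, `even`, `odd`) are hypothetical. They are evaluated naively: both loop variables are recomputed from the previous round until neither changes. The incremental circuits described above would use feedback and delay instead of full recomputation.

```python
def run_stratum(start, arc):
    """Iterate one recursive stratum with two IDB loop variables to a fixed point.

    Hypothetical rules:
      even(y) :- start(y).
      odd(y)  :- even(x), arc(x, y).
      even(y) :- odd(x), arc(x, y).
    """
    even, odd = set(start), set()  # the base rule seeds the loop variables
    rounds = 0
    while True:
        rounds += 1
        new_odd = odd | {y for (x, y) in arc if x in even}
        new_even = even | {y for (x, y) in arc if x in odd}
        if new_odd == odd and new_even == even:
            return even, odd, rounds  # fixed point: nothing new was derived
        even, odd = new_even, new_odd


even, odd, rounds = run_stratum({0}, {(0, 1), (1, 2), (2, 3)})
```

Both `even` and `odd` are loop variables here. If only `odd` were consumed downstream, it would be the only output collected after convergence, while `even` would stay internal to the stratum.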
@@ -323,6 +444,15 @@ common_antecedent(x, y)
 -> violation_b(y)
 ```

+```mermaid
+flowchart LR
+    A["A(x, y)"] --> Common["common_antecedent(x, y)"]
+    B["B(y)"] --> Common
+    Common --> Va["violation_a(x)"]
+    Common --> Vb["violation_b(y)"]
+    Extra["Extra Check"] --> Vb
+```
+
 FlowLog's explicit rule plans and collection signatures are a useful place to represent this sharing.

 ---
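A minimal Python sketch of the sharing idea follows. It assumes `common_antecedent(x, y) :- A(x, y), B(y)` and treats the extra check for `violation_b` as a hypothetical unary relation `Extra(y)`, as suggested by the diagram; the exact rule bodies are not shown in the notes.

```python
def common_antecedent(a, b):
    """Shared subplan: common_antecedent(x, y) :- A(x, y), B(y)."""
    return {(x, y) for (x, y) in a if y in b}


def violations(a, b, extra):
    """Evaluate the shared antecedent once and derive both violation relations from it."""
    shared = common_antecedent(a, b)
    violation_a = {x for (x, _) in shared}
    violation_b = {y for (_, y) in shared if y in extra}  # the hypothetical Extra(y) check
    return violation_a, violation_b


A = {(1, "p"), (2, "q"), (3, "r")}
B = {"p", "q"}
Extra = {"q"}
va, vb = violations(A, B, Extra)
```

In a maintained backend, computing `shared` once would also mean keeping its operator state once instead of once per law.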
@@ -349,6 +479,16 @@ the output of the first join may need to be arranged by `z`, not by `x`.

 That means the planner should choose output keys based on the next operation, not only the current operation.

+```mermaid
+flowchart LR
+    A["A(x, y)<br/>key = y"] --> JoinAB["Join on y"]
+    B["B(y, z)<br/>key = y"] --> JoinAB
+    JoinAB --> R["R(x, z)<br/>next key = z"]
+    R --> JoinRC["Join on z"]
+    C["C(z, w)<br/>key = z"] --> JoinRC
+    JoinRC --> S["S(x, w)"]
+```
+
 This is one reason a simple relational algebra tree is not enough. The physical plan needs key and payload annotations.

 ---
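The same idea can be sketched in Python: the first join emits `R` already grouped by `z`, so the join with `C` can probe it directly without a re-keying step. The rule shapes `R(x, z) :- A(x, y), B(y, z)` and `S(x, w) :- R(x, z), C(z, w)` follow the diagram. The helper names and the dict-of-lists index are illustrative stand-ins for maintained arrangements.

```python
from collections import defaultdict


def index_by(rows, key):
    """Build a key -> list-of-rows index, a stand-in for an arrangement."""
    idx = defaultdict(list)
    for row in rows:
        idx[key(row)].append(row)
    return idx


def join_ab_keyed_by_z(a, b):
    """R(x, z) :- A(x, y), B(y, z), emitted with key = z and payload = x."""
    b_by_y = index_by(b, key=lambda row: row[0])
    out = defaultdict(list)
    for x, y in a:
        for _, z in b_by_y.get(y, []):
            out[z].append(x)
    return out


def join_rc(r_by_z, c):
    """S(x, w) :- R(x, z), C(z, w); probes R by z directly."""
    return {(x, w) for (z, w) in c for x in r_by_z.get(z, [])}


A = {(1, 10), (2, 20)}                    # A(x, y)
B = {(10, 100), (20, 200), (20, 300)}     # B(y, z)
C = {(100, "u"), (300, "v")}              # C(z, w)
r_by_z = join_ab_keyed_by_z(A, B)
S = join_rc(r_by_z, C)
```

Had the first join emitted `R` keyed by `x`, the plan would need an extra rearrangement by `z` before the second join, which in an incremental backend is one more piece of maintained state.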
@@ -379,6 +519,17 @@ distinct -> DBSP distinct
 recursion -> DBSP fixed-point circuit
 ```

+```mermaid
+flowchart LR
+    Source["Datalog or Geolog Rules"] --> Frontend["Frontend Parser or Compiler"]
+    Frontend --> Catalog["Rule Catalogs"]
+    Catalog --> Planner["FlowLog-Style Planner"]
+    Planner --> IR["Relational IR with Keys"]
+    IR --> Lowering["DBSP Lowering"]
+    Lowering --> Circuit["DBSP Circuit"]
+    Circuit --> Deltas["Maintained Output Deltas"]
+```
+
 The key point is that DBSP should receive an already planned circuit, not raw Datalog text.

 ---
--- a/flowlog/005-using-flowlog-ideas.md
+++ b/flowlog/005-using-flowlog-ideas.md
@@ -16,7 +16,16 @@ Level 2: borrow FlowLog planning ideas for a DBSP frontend
 Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs
 ```

-The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer before DBSP.
+```mermaid
+flowchart TB
+    L1["Level 1<br/>Run FlowLog Examples"] --> L2["Level 2<br/>Borrow Planning Ideas"]
+    L2 --> L3["Level 3<br/>Backend Comparison"]
+    L2 --> DBSP["DBSP Frontend Work"]
+    L3 --> Decision["Backend and Planner Decisions"]
+```
+
+The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer
+before DBSP.

 ---

@@ -31,7 +40,8 @@ That would conflate two separate questions:

 The DBSP notes are already about DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer.

-The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit.
+The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source
+concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit.

 ---

@@ -70,7 +80,28 @@ insert tree
 -> next visible element
 ```

-These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and adding semijoin filters around the current frontier.
+These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and
+adding semijoin filters around the current frontier.
+
+```mermaid
+flowchart TB
+    subgraph Causal["Causal Readiness"]
+        Pred["pred Graph"] --> Roots["Roots"]
+        Roots --> Ready["Ready Operations"]
+        Ready --> Frontier["Frontier"]
+        Frontier --> NewPred["Outgoing Pred Edges"]
+        NewPred --> Ready
+    end
+
+    subgraph List["List Traversal"]
+        Insert["insert Tree"] --> First["firstChild"]
+        Insert --> Sibling["nextSibling"]
+        First --> Next["nextElem"]
+        Sibling --> Next
+        Remove["remove Tombstones"] --> Visible["nextVisible"]
+        Next --> Visible
+    end
+```

 The concrete experiment:

@@ -116,6 +147,16 @@ FlowLog-style catalogs would help the compiler answer:
 - which projected values are needed for the violation row
 - whether two laws share a common antecedent

+```mermaid
+flowchart LR
+    Law["Geomerge Law"] --> Rule["Datalog-Like Rule"]
+    Rule --> Catalog["Rule Catalog"]
+    Catalog --> JoinGraph["Join Graph"]
+    JoinGraph --> Plan["Planned Relational Tree"]
+    Plan --> Violation["Violation Relation"]
+    Violation --> DBSP["DBSP Maintained Output"]
+```
+
 The concrete experiment:

 ```text
@@ -166,7 +207,20 @@ The comparison should measure:
 - output delta size
 - ease of rollback or preview execution

-This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid batch-plus-incremental strategy is needed.
+```mermaid
+flowchart TB
+    Program["Same Datalog Program"] --> IR["Shared Relational IR"]
+    Facts["Same Input Facts"] --> IR
+    IR --> DBSP["DBSP Lowering"]
+    IR --> DD["Differential Dataflow Lowering"]
+    DBSP --> DbspMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
+    DD --> DdMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
+    DbspMetrics --> Compare["Backend Comparison"]
+    DdMetrics --> Compare
+```
+
+This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid
+batch-plus-incremental strategy is needed.

 ---

@@ -215,6 +269,18 @@ Datalog-like rule text
 -> planned relational tree
 ```

+```mermaid
+flowchart LR
+    Text["Rule Text"] --> Parse["Parsed Rules"]
+    Parse --> Deps["Dependency Graph"]
+    Deps --> Strata["Strata"]
+    Parse --> Catalog["Rule Catalog"]
+    Catalog --> JoinGraph["Join Graph"]
+    Strata --> Plan["Planned Tree"]
+    JoinGraph --> Plan
+    Plan --> Explain["Textual Plan Explanation"]
+```
+
 It does not need to run DBSP at first.

 The output can be textual:
@@ -253,6 +319,17 @@ This prototype would validate the compiler shape before depending on a backend A

 The second prototype should lower a narrow subset to DBSP.

+```mermaid
+flowchart TB
+    Subset["Supported Rule Subset"] --> Planner["Planner"]
+    Planner --> IR["Relational IR"]
+    IR --> Lowering["DBSP Lowering"]
+    Lowering --> Runtime["DBSP Runtime"]
+    Runtime --> Output["Maintained Outputs"]
+    Snapshot["Naive Snapshot Evaluator"] --> Oracle["Correctness Oracle"]
+    Output --> Oracle
+```
+
 Supported subset:

 - relation declarations
@@ -292,15 +369,29 @@ planned rules -> DBSP-maintained outputs

 Several decisions should be made explicitly before implementation.

-**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when `distinct` is applied.
+```mermaid
+flowchart TB
+    Decisions["Data Model Decisions"]
+    Decisions --> Semantics["Set or Multiset Semantics"]
+    Decisions --> Identity["Operation Identity"]
+    Decisions --> Violations["Violation Row Shape"]
+    Decisions --> Integration["Output Integration"]
+    Decisions --> Rollback["Rollback or Preview"]
+```

-**Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical key with two physical fields.
+**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when
+`distinct` is applied.
+
+**Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical
+key with two physical fields.

 **Violation Rows**: Geomerge violations should include enough context for error messages, not just a boolean.

-**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that integration.
+**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that
+integration.

-**Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay transactionally coupled to storage.
+**Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay
+transactionally coupled to storage.

 ---

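The set-versus-multiset, output-integration, and rollback decisions all revolve around weighted deltas. This Python sketch models a Z-set as a dict from row to integer weight, which mirrors the math but not DBSP's actual Rust types. The row shape `(replica_id, counter)` follows the CRDT examples; the function names are hypothetical, and the sketch does not model the coupling to storage that rollback would need.

```python
from collections import Counter


def add(a, b):
    """Pointwise sum of two Z-sets, dropping rows whose weight cancels to zero."""
    out = Counter(a)
    for row, w in b.items():
        out[row] += w
    return {row: w for row, w in out.items() if w != 0}


def negate(a):
    """Inverse delta: adding it after `a` cancels `a`, the basis for rollback."""
    return {row: -w for row, w in a.items()}


def distinct(a):
    """Set semantics: positive weights become 1, non-positive rows disappear."""
    return {row: 1 for row, w in a.items() if w > 0}


def integrate(deltas):
    """Fold a sequence of deltas into the current view (the output integration step)."""
    view = {}
    for d in deltas:
        view = add(view, d)
    return view


# The same operation fact delivered twice ends up with weight 2.
d1 = {("r1", 1): 1}
d2 = {("r1", 1): 1, ("r2", 1): 1}
view = integrate([d1, d2])
as_set = distinct(view)

# Preview a change, then roll it back with its inverse delta.
preview = {("r3", 1): 1}
rolled_back = add(add(view, preview), negate(preview))
```

This is why the frontend has to say where `distinct` applies: without it, duplicated CRDT facts keep a weight above 1 in the integrated view.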
@@ -308,6 +399,19 @@ Several decisions should be made explicitly before implementation.

 The evaluation should separate correctness from performance.

+```mermaid
+flowchart LR
+    Inputs["Input Facts and Updates"] --> Naive["Naive Snapshot Evaluation"]
+    Inputs --> Planned["Planned Backend Evaluation"]
+    Naive --> Correctness["Correctness Check"]
+    Planned --> Correctness
+    Planned --> Perf["Performance Metrics"]
+    Perf --> Hydration["Hydration"]
+    Perf --> Warm["Warm Updates"]
+    Perf --> Memory["Memory"]
+    Perf --> Sensitivity["History and Join Sensitivity"]
+```
+
 Correctness checks:

 ```text
@@ -351,7 +455,8 @@ The main decision points are:
 - whether to persist backend operator state
 - whether to compare against Differential Dataflow for recursive workloads

-These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does not force Differential Dataflow as the backend.
+These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does
+not force Differential Dataflow as the backend.

 ---

@@ -363,6 +468,14 @@ The next step is lowering a small subset to DBSP.

 After that, FlowLog itself can serve as a comparison backend for the same small programs.

+```mermaid
+flowchart LR
+    P1["Planning-Only Compiler"] --> P2["DBSP Subset Lowering"]
+    P2 --> P3["FlowLog Backend Comparison"]
+    P3 --> P4["Shared IR Decision"]
+    P4 --> P5["Production-Oriented Prototype"]
+```
+
 The goal should be:

 ```text