Add mermaid diagrams to the note files

2026-05-20 15:54:55 +02:00 · 2026-05-20 15:54:55 +02:00 · 2bfcb7e818
commit 2bfcb7e818
parent 99c30190a8
3 changed files with 293 additions and 20 deletions
--- a/flowlog/003-flowlog-and-dbsp-synergy.md
+++ b/flowlog/003-flowlog-and-dbsp-synergy.md
@@ -107,7 +107,8 @@ mvrStore(Key, Value) :-
    not overwritten(RepId, Ctr).
 ```
-This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation history.
+This is a favorable DBSP workload. The backend can maintain the antijoin state and process small updates without rescanning the full operation
 history.
 The causal-readiness query is harder:
@@ -122,9 +123,13 @@ isCausallyReady(RepId, Ctr) :-
 This is recursive graph traversal. The DBSP CRDT notes report that this query can remain dependent on causal-history depth, even during warm updates.
-FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings. For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots through the whole causal graph.
+FlowLog's planning ideas are relevant here. Sideways information passing suggests prefiltering recursive traversal through known relevant bindings.
 For CRDTs, that could mean using current heads, leaves, or newly arrived operations as a frontier instead of repeatedly deriving readiness from roots
 through the whole causal graph.
-The list CRDT query is also planning-sensitive. Relations such as `firstChild`, `nextSibling`, `nextSiblingAnc`, `nextElem`, and `nextVisible` create several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP builds maintained operator state.
+The list CRDT query is also planning-sensitive. Relations such as `firstChild`, `nextSibling`, `nextSiblingAnc`, `nextElem`, and `nextVisible` create
 several joins, antijoins, and recursive steps. FlowLog-style rule catalogs and join planning would help choose better intermediate shapes before DBSP
 builds maintained operator state.
 ---
@@ -195,7 +200,8 @@ For Geomerge, the source may be compiled Geolog laws.
 For execution, the backend may be DBSP, Differential Dataflow, or a non-incremental batch engine.
-This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been checked, stratified, and optimized.
+This matches the existing DBSP notes: DBSP should not own full source-language semantics. It should receive a relational plan that has already been
 checked, stratified, and optimized.
 ---
@@ -203,7 +209,8 @@ This matches the existing DBSP notes: DBSP should not own full source-language s
 FlowLog suggests several optimizations that transfer well to DBSP-backed work.
-**Structural Planning**: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates are weak.
+**Structural Planning**: Choose join trees from variable overlap and intermediate width, especially for recursive rules where cardinality estimates
 are weak.
 **Sideways Information Passing**: Add semijoin-style filters so later joins and recursive steps see fewer irrelevant tuples.
@@ -211,7 +218,8 @@ FlowLog suggests several optimizations that transfer well to DBSP-backed work.
 **Subplan Sharing**: Reuse common derived relations across laws or CRDT views when multiple outputs need the same intermediate facts.
-**Physical Key Choice**: Pick key fields deliberately before lowering to the backend. DBSP joins also need maintained state, so bad key and arrangement choices will become runtime costs.
+**Physical Key Choice**: Pick key fields deliberately before lowering to the backend. DBSP joins also need maintained state, so bad key and
 arrangement choices will become runtime costs.
 ---
@@ -230,7 +238,8 @@ The shared lesson is more important than the difference:
 incremental backends maintain operator state
 ```
-That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase memory use and update cost for the lifetime of the maintained query.
+That means bad plans become persistent state, not just one bad query execution. A poor join order or unnecessary intermediate relation can increase
 memory use and update cost for the lifetime of the maintained query.
 FlowLog is useful because it treats planning as a first-class layer before execution.
--- a/flowlog/004-flowlog-technical-planning-notes.md
+++ b/flowlog/004-flowlog-technical-planning-notes.md
@@ -18,7 +18,17 @@ rule atoms and variables
 -> physical operator choices
 ```
-That layer is useful because it records how variables move through projections, filters, joins, antijoins, and recursive strata before the backend starts maintaining state.
+```mermaid
 flowchart LR
    Rule["Datalog Rule"] --> Catalog["Rule Catalog"]
    Catalog --> Signatures["Collection Signatures"]
    Signatures --> Flows["Transformation Flows"]
    Flows --> Operators["Physical Operators"]
    Operators --> Backend["Incremental Backend State"]
 ```
 That layer is useful because it records how variables move through projections, filters, joins, antijoins, and recursive strata before the backend
 starts maintaining state.
 ---
@@ -90,6 +100,18 @@ Arc(x, y)
 -> key-value view: key=(y), value=(x)
 ```
 ```mermaid
 flowchart TB
    Arc["Arc(x, y)"]
    Arc --> Row["Row View<br/>(x, y)"]
    Arc --> KeyX["Key View<br/>key = x"]
    Arc --> KvX["Key-Value View<br/>key = x<br/>value = y"]
    Arc --> KvY["Key-Value View<br/>key = y<br/>value = x"]
    KeyX --> Semi["Semijoin or Antijoin"]
    KvX --> Join1["Join on x"]
    KvY --> Join2["Join on y"]
 ```
 This is a central planning choice. The key determines which arrangement or maintained index the backend can use.
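As a rough illustration of the views listed above, the following Python sketch builds each arrangement of a small `Arc` relation. The helper names (`key_view`, `key_value_view`) and the sample tuples are hypothetical; they are not FlowLog or DBSP APIs.

```python
from collections import defaultdict

def key_view(rows, key_col):
    """Key-only view: the distinct key values, enough for a semijoin or antijoin."""
    return {row[key_col] for row in rows}

def key_value_view(rows, key_col, val_col):
    """Key-value view: payload values indexed by key, which is what a join probes."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_col]].append(row[val_col])
    return dict(index)

# Hypothetical Arc(x, y) facts.
arc = [(1, 2), (1, 3), (2, 3)]

arc_keys_x = key_view(arc, 0)          # key view: key=(x)
arc_by_x = key_value_view(arc, 0, 1)   # key-value view: key=(x), value=(y)
arc_by_y = key_value_view(arc, 1, 0)   # key-value view: key=(y), value=(x)
```

A join on `x` would probe `arc_by_x`, while a join on `y` needs `arc_by_y`. Each extra view is extra maintained state in an incremental backend, so the planner should only request the ones a plan actually uses.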
 ---
@@ -98,6 +120,24 @@ This is a central planning choice. The key determines which arrangement or maint
 FlowLog's transformations separate unary reshaping from binary combination.
 ```mermaid
 flowchart TB
    Input["Input Collection"]
    Input --> Unary["Unary Transformation"]
    Unary --> UnaryOut["Projected, Filtered, or Arranged Collection"]
    Left["Left Collection"] --> Binary["Binary Transformation"]
    Right["Right Collection"] --> Binary
    Binary --> BinaryOut["Joined or Antijoined Collection"]
    Unary --> RowToRow["Row to Row"]
    Unary --> RowToKey["Row to Key"]
    Unary --> RowToKv["Row to Key-Value"]
    Binary --> Join["Join"]
    Binary --> Anti["Antijoin"]
    Binary --> Product["Cartesian Product"]
 ```
 Unary transformations include:
 - row to row
@@ -178,6 +218,12 @@ The join graph is a chain:
 A --x-- B --y-- C
 ```
 ```mermaid
 flowchart LR
    A["A(a, x)"] -- "x" --> B["B(x, y)"]
    B -- "y" --> C["C(y, c)"]
 ```
 A rule like this is sensitive to join order:
 ```text
@@ -190,6 +236,24 @@ R(a, d) :-
 Joining `A` with `D` first is a cross product. Joining adjacent atoms first preserves bindings and reduces intermediate results.
 ```mermaid
 flowchart TB
    subgraph Good["Connected Join Order"]
        A1["A(a, x)"] --> AB["Join on x"]
        B1["B(x, y)"] --> AB
        AB --> ABC["Join on y"]
        C1["C(y, z)"] --> ABC
        ABC --> ABCD["Join on z"]
        D1["D(z, d)"] --> ABCD
    end
    subgraph Bad["Disconnected Join Order"]
        A2["A(a, x)"] --> AD["Cross Product"]
        D2["D(z, d)"] --> AD
        AD --> Later["Later Filters and Joins"]
    end
 ```
 FlowLog's structural planning uses variable overlap to choose a plan tree that keeps joins connected and intermediate width smaller.
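A minimal sketch of the variable-overlap idea, assuming a greedy strategy that is chosen here only for illustration and is not FlowLog's actual planner. The function names are hypothetical; the atom variable sets are taken from the four-atom rule above.

```python
def connected_join_order(atoms):
    """Greedy order: next pick the atom sharing the most variables with those joined so far.

    `atoms` maps an atom name to its set of variables. A cross product only
    happens when no remaining atom shares a variable with the bound set.
    """
    remaining = dict(atoms)
    first = next(iter(remaining))
    order = [first]
    bound = set(remaining.pop(first))
    while remaining:
        name = max(remaining, key=lambda n: len(remaining[n] & bound))
        order.append(name)
        bound |= remaining.pop(name)
    return order

def has_cross_product(order, atoms):
    """True if some step joins an atom that shares no variable with earlier atoms."""
    bound = set(atoms[order[0]])
    for name in order[1:]:
        if not (atoms[name] & bound):
            return True
        bound |= atoms[name]
    return False

atoms = {
    "A": {"a", "x"},
    "B": {"x", "y"},
    "C": {"y", "z"},
    "D": {"z", "d"},
}
order = connected_join_order(atoms)
```

Here `order` follows the chain `A, B, C, D`, and `has_cross_product(["A", "D", "B", "C"], atoms)` flags the disconnected order. A real planner would also weigh intermediate width and plan depth, which this sketch ignores.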
 ---
@@ -215,6 +279,17 @@ FlowLog instead uses structural signals:
 - whether a candidate plan creates disconnected joins
 - how deep the plan tree becomes
 ```mermaid
 flowchart LR
    RuleBody["Rule Body"] --> Overlap["Variable Overlap"]
    RuleBody --> Width["Intermediate Width"]
    RuleBody --> Connectivity["Join Connectivity"]
    Overlap --> PlanTree["Candidate Plan Tree"]
    Width --> PlanTree
    Connectivity --> PlanTree
    PlanTree --> Choice["Robust Plan Choice"]
 ```
 This is not guaranteed to be optimal. It is meant to avoid obviously bad plans.
 That is a good fit for DBSP-backed work too, because a bad plan becomes maintained operator state.
@@ -247,7 +322,19 @@ bad(x, z) :-
    not D(x, z).
 ```
-The antijoin against `D(x, z)` can run after `A` and `B`; it does not need to wait for `C`. Running it earlier may reduce the input to the later join with `C`.
+The antijoin against `D(x, z)` can run after `A` and `B`; it does not need to wait for `C`. Running it earlier may reduce the input to the later join
 with `C`.
 ```mermaid
 flowchart LR
    A["A(x, y)"] --> AB["Join on y"]
    B["B(y, z)"] --> AB
    AB --> AntiD["Antijoin D(x, z)"]
    D["D(x, z)"] --> AntiD
    AntiD --> JoinC["Join C(z, w)"]
    C["C(z, w)"] --> JoinC
    JoinC --> Out["bad(x, z)"]
 ```
 This is the same issue as antijoin pushdown in the DBSP CRDT note.
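The following Python sketch checks, on hypothetical sample facts, that running the antijoin before or after the step involving `C` gives the same `bad(x, z)` rows under set semantics. Because the head drops `w`, the join with `C` is modelled as an existence check on `z`; that simplification and all function names are assumptions made for the example.

```python
def join_ab(a, b):
    """A(x, y) joined with B(y, z), projected to (x, z)."""
    return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}

def antijoin_d(xz, d):
    """Keep (x, z) pairs that do not appear in D(x, z)."""
    return {pair for pair in xz if pair not in d}

def semijoin_c(xz, c):
    """Keep (x, z) pairs whose z has at least one C(z, w) fact."""
    c_keys = {z for (z, _w) in c}
    return {(x, z) for (x, z) in xz if z in c_keys}

# Hypothetical facts.
a = {(1, 10), (2, 10), (3, 20)}
b = {(10, 100), (20, 200)}
c = {(100, "w1"), (200, "w2")}
d = {(1, 100), (3, 200)}

ab = join_ab(a, b)
after_antijoin = antijoin_d(ab, d)
early = semijoin_c(after_antijoin, c)   # antijoin first, then C
late = antijoin_d(semijoin_c(ab, c), d) # C first, then antijoin
```

Both orders produce the same result, but the early antijoin sends one row into the `C` step instead of three. In a maintained plan, that difference is paid on every update.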
@@ -265,13 +352,23 @@ derive useful keys
 -> join less data
 ```
 ```mermaid
 flowchart LR
    DeltaReach["Delta Reach(x)"] --> Keys["Useful x Keys"]
    Keys --> SemiArc["Semijoin Arc on x"]
    Arc["Arc(x, y)"] --> SemiArc
    SemiArc --> Join["Join with Delta Reach"]
    Join --> NewReach["New Reach(y)"]
 ```
 Example:
 ```text
 Reach(y) :- Reach(x), Arc(x, y).
 ```
-If the current delta contains only a small set of `Reach(x)` values, then `Arc` only needs edges whose source is in that set. A semijoin can prefilter `Arc` before the recursive join.
+If the current delta contains only a small set of `Reach(x)` values, then `Arc` only needs edges whose source is in that set. A semijoin can prefilter
 `Arc` before the recursive join.
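A minimal sketch of that rule evaluated semi-naively in plain Python, assuming `Arc` is arranged by its source `x`. The function name and sample edges are hypothetical, and the loop stands in for what an incremental backend would do inside an iterative scope.

```python
from collections import defaultdict

def reach(seeds, arc):
    """Semi-naive evaluation of Reach(y) :- Reach(x), Arc(x, y)."""
    # Arrange Arc by source so each round only touches edges leaving the delta.
    arc_by_x = defaultdict(set)
    for x, y in arc:
        arc_by_x[x].add(y)

    reached = set(seeds)
    delta = set(seeds)
    while delta:
        new = set()
        for x in delta:
            # Keyed lookup plays the role of the semijoin prefilter on Arc.
            new |= arc_by_x.get(x, set())
        delta = new - reached  # only facts not derived before feed the next round
        reached |= delta
    return reached

# Hypothetical edges: a cycle 1 -> 2 -> 3 -> 1 and an unrelated edge 4 -> 5.
arc = [(1, 2), (2, 3), (3, 1), (4, 5)]
result = reach({1}, arc)
```

The edge `(4, 5)` is never read, because `4` never enters the delta. That is the effect sideways information passing aims for.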
 For CRDT causal readiness, this suggests a physical plan centered on frontier operations:
@@ -281,6 +378,15 @@ new ready operations
 -> newly ready operations
 ```
 ```mermaid
 flowchart LR
    Frontier["Ready Frontier"] --> CandidatePred["Pred Edges from Frontier"]
    Pred["pred(from, to)"] --> CandidatePred
    CandidatePred --> Check["Predecessor Checks"]
    Check --> NewReady["New Ready Operations"]
    NewReady --> Frontier
 ```
 rather than a plan that repeatedly starts from roots.
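The frontier loop can be sketched in Python as a worklist. This assumes `pred(from, to)` means that operation `to` depends on operation `from`, and that an operation is ready once all of its predecessors are ready. Both the reading of `pred` and the function name are assumptions for this example, not definitions from the DBSP CRDT notes.

```python
from collections import defaultdict

def causally_ready(ops, pred):
    """Return the operations whose causal predecessors are all present and ready."""
    successors = defaultdict(list)
    missing = {op: 0 for op in ops}  # count of predecessors not yet ready
    for src, dst in set(pred):
        if dst in missing:
            successors[src].append(dst)
            missing[dst] += 1

    # Operations without predecessors seed the frontier.
    frontier = [op for op in ops if missing[op] == 0]
    ready = set(frontier)
    while frontier:
        op = frontier.pop()
        # Only edges leaving the frontier are inspected.
        for nxt in successors[op]:
            missing[nxt] -= 1
            if missing[nxt] == 0:
                ready.add(nxt)
                frontier.append(nxt)
    return ready

# Hypothetical operations; "c9" has not arrived, so "b2" stays blocked.
ops = {"a1", "a2", "b1", "b2"}
pred = {("a1", "a2"), ("a1", "b1"), ("c9", "b2")}
ready = causally_ready(ops, pred)
```

Each operation is visited once, when it becomes ready. An incremental variant would keep `missing` and `ready` as maintained state and seed the frontier with newly arrived operations instead of recomputing from roots.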
 ---
@@ -291,6 +397,20 @@ Recursive rules require fixed-point execution.
 FlowLog groups recursive rules into recursive strata, then executes them inside an iterative dataflow scope.
 ```mermaid
 flowchart TB
    Earlier["Earlier Strata Outputs"] --> Enter["Enter Recursive Scope"]
    EDB["Input Relations"] --> Enter
    Enter --> Base["Base Rules"]
    Base --> LoopVars["IDB Loop Variables"]
    LoopVars --> Step["Recursive Step Rules"]
    Step --> Delta["New Derived Facts"]
    Delta --> LoopVars
    LoopVars --> Done{"Fixed Point?"}
    Done -- "no" --> Step
    Done -- "yes" --> Collect["Collect Recursive Outputs"]
 ```
 The important design point is that a recursive stratum can contain several rules deriving related IDBs. The planner must know:
 - which IDBs are loop variables
@@ -298,7 +418,8 @@ The important design point is that a recursive stratum can contain several rules
 - which outputs must be collected after convergence
 - which intermediate arrangements are useful across iterations
-For DBSP, this maps to recursive circuits with feedback and delay. The frontend still needs the same rule-level information before it can produce a good circuit.
+For DBSP, this maps to recursive circuits with feedback and delay. The frontend still needs the same rule-level information before it can produce a
 good circuit.
 ---
@@ -323,6 +444,15 @@ common_antecedent(x, y)
 -> violation_b(y)
 ```
 ```mermaid
 flowchart LR
    A["A(x, y)"] --> Common["common_antecedent(x, y)"]
    B["B(y)"] --> Common
    Common --> Va["violation_a(x)"]
    Common --> Vb["violation_b(y)"]
    Extra["Extra Check"] --> Vb
 ```
 FlowLog's explicit rule plans and collection signatures are a useful place to represent this sharing.
 ---
@@ -349,6 +479,16 @@ the output of the first join may need to be arranged by `z`, not by `x`.
 That means the planner should choose output keys based on the next operation, not only the current operation.
 ```mermaid
 flowchart LR
    A["A(x, y)<br/>key = y"] --> JoinAB["Join on y"]
    B["B(y, z)<br/>key = y"] --> JoinAB
    JoinAB --> R["R(x, z)<br/>next key = z"]
    R --> JoinRC["Join on z"]
    C["C(z, w)<br/>key = z"] --> JoinRC
    JoinRC --> S["S(x, w)"]
 ```
 This is one reason a simple relational algebra tree is not enough. The physical plan needs key and payload annotations.
 ---
@@ -379,6 +519,17 @@ distinct -> DBSP distinct
 recursion -> DBSP fixed-point circuit
 ```
 ```mermaid
 flowchart LR
    Source["Datalog or Geolog Rules"] --> Frontend["Frontend Parser or Compiler"]
    Frontend --> Catalog["Rule Catalogs"]
    Catalog --> Planner["FlowLog-Style Planner"]
    Planner --> IR["Relational IR with Keys"]
    IR --> Lowering["DBSP Lowering"]
    Lowering --> Circuit["DBSP Circuit"]
    Circuit --> Deltas["Maintained Output Deltas"]
 ```
 The key point is that DBSP should receive an already planned circuit, not raw Datalog text.
 ---
--- a/flowlog/005-using-flowlog-ideas.md
+++ b/flowlog/005-using-flowlog-ideas.md
@@ -16,7 +16,16 @@ Level 2: borrow FlowLog planning ideas for a DBSP frontend
 Level 3: compare DBSP and Differential Dataflow backends on the same Datalog programs
 ```
-The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer before DBSP.
+```mermaid
 flowchart TB
    L1["Level 1<br/>Run FlowLog Examples"] --> L2["Level 2<br/>Borrow Planning Ideas"]
    L2 --> L3["Level 3<br/>Backend Comparison"]
    L2 --> DBSP["DBSP Frontend Work"]
    L3 --> Decision["Backend and Planner Decisions"]
 ```
 The practical near-term path is Level 2. Use FlowLog's catalog, join planning, antijoin scheduling, and SIP ideas to design a better compiler layer
 before DBSP.
 ---
@@ -31,7 +40,8 @@ That would conflate two separate questions:
 The DBSP notes are already about DBSP as a formal view-maintenance backend. FlowLog is more useful as a guide for the missing frontend and optimizer.
-The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit.
+The first step should not be adopting FlowLog's syntax as the durable source language either. Geomerge and Geolog already have their own source
 concepts. Datalog should be an intermediate or testing language unless the user-facing language decision is explicit.
 ---
@@ -70,7 +80,28 @@ insert tree
 -> next visible element
 ```
-These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and adding semijoin filters around the current frontier.
+These queries contain several recursive or join-heavy rules. FlowLog-style planning can help by choosing join keys, pushing antijoins earlier, and
 adding semijoin filters around the current frontier.
 ```mermaid
 flowchart TB
    subgraph Causal["Causal Readiness"]
        Pred["pred Graph"] --> Roots["Roots"]
        Roots --> Ready["Ready Operations"]
        Ready --> Frontier["Frontier"]
        Frontier --> NewPred["Outgoing Pred Edges"]
        NewPred --> Ready
    end
    subgraph List["List Traversal"]
        Insert["insert Tree"] --> First["firstChild"]
        Insert --> Sibling["nextSibling"]
        First --> Next["nextElem"]
        Sibling --> Next
        Remove["remove Tombstones"] --> Visible["nextVisible"]
        Next --> Visible
    end
 ```
 The concrete experiment:
@@ -116,6 +147,16 @@ FlowLog-style catalogs would help the compiler answer:
 - which projected values are needed for the violation row
 - whether two laws share a common antecedent
 ```mermaid
 flowchart LR
    Law["Geomerge Law"] --> Rule["Datalog-Like Rule"]
    Rule --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    JoinGraph --> Plan["Planned Relational Tree"]
    Plan --> Violation["Violation Relation"]
    Violation --> DBSP["DBSP Maintained Output"]
 ```
 The concrete experiment:
 ```text
@@ -166,7 +207,20 @@ The comparison should measure:
 - output delta size
 - ease of rollback or preview execution
-This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid batch-plus-incremental strategy is needed.
+```mermaid
 flowchart TB
    Program["Same Datalog Program"] --> IR["Shared Relational IR"]
    Facts["Same Input Facts"] --> IR
    IR --> DBSP["DBSP Lowering"]
    IR --> DD["Differential Dataflow Lowering"]
    DBSP --> DbspMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DD --> DdMetrics["Hydration<br/>Warm Updates<br/>Memory<br/>Deltas"]
    DbspMetrics --> Compare["Backend Comparison"]
    DdMetrics --> Compare
 ```
 This helps decide whether DBSP needs FlowLog-like planning, whether Differential Dataflow is better for some recursive workloads, or whether a hybrid
 batch-plus-incremental strategy is needed.
 ---
@@ -215,6 +269,18 @@ Datalog-like rule text
 -> planned relational tree
 ```
 ```mermaid
 flowchart LR
    Text["Rule Text"] --> Parse["Parsed Rules"]
    Parse --> Deps["Dependency Graph"]
    Deps --> Strata["Strata"]
    Parse --> Catalog["Rule Catalog"]
    Catalog --> JoinGraph["Join Graph"]
    Strata --> Plan["Planned Tree"]
    JoinGraph --> Plan
    Plan --> Explain["Textual Plan Explanation"]
 ```
 It does not need to run DBSP at first.
 The output can be textual:
@@ -253,6 +319,17 @@ This prototype would validate the compiler shape before depending on a backend A
 The second prototype should lower a narrow subset to DBSP.
 ```mermaid
 flowchart TB
    Subset["Supported Rule Subset"] --> Planner["Planner"]
    Planner --> IR["Relational IR"]
    IR --> Lowering["DBSP Lowering"]
    Lowering --> Runtime["DBSP Runtime"]
    Runtime --> Output["Maintained Outputs"]
    Snapshot["Naive Snapshot Evaluator"] --> Oracle["Correctness Oracle"]
    Output --> Oracle
 ```
 Supported subset:
 - relation declarations
@@ -292,15 +369,29 @@ planned rules -> DBSP-maintained outputs
 Several decisions should be made explicitly before implementation.
-**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when `distinct` is applied.
+```mermaid
 flowchart TB
    Decisions["Data Model Decisions"]
    Decisions --> Semantics["Set or Multiset Semantics"]
    Decisions --> Identity["Operation Identity"]
    Decisions --> Violations["Violation Row Shape"]
    Decisions --> Integration["Output Integration"]
    Decisions --> Rollback["Rollback or Preview"]
 ```
-**Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical key with two physical fields.
+**Set or Multiset Semantics**: CRDT operation facts are usually set-like. DBSP uses Z-set weights internally. The frontend should define when
 `distinct` is applied.
 **Operation Identity**: CRDT examples use `(replica_id, counter)`. The planner should treat this pair either as two scalar fields or as one logical
 key with two physical fields.
 **Violation Rows**: Geomerge violations should include enough context for error messages, not just a boolean.
-**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that integration.
+**Output Integration**: DBSP emits deltas. Applications often need an integrated current view. The runtime boundary should say who owns that
 integration.
-**Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay transactionally coupled to storage.
+**Rollback**: Geomerge validation needs preview or rollback behavior. If using weighted deltas, inverse deltas are plausible but must stay
 transactionally coupled to storage.
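To make the semantics, integration, and rollback decisions concrete, here is a small Python sketch of weighted rows. It models a Z-set as a plain dictionary from row to integer weight. The helper names are hypothetical, and this is a model of the idea, not the DBSP API.

```python
def apply_delta(zset, delta):
    """Integrate a weighted delta into a Z-set, dropping rows whose weight becomes zero."""
    out = dict(zset)
    for row, weight in delta.items():
        total = out.get(row, 0) + weight
        if total == 0:
            out.pop(row, None)
        else:
            out[row] = total
    return out

def distinct(zset):
    """Set semantics: every row with positive weight gets weight exactly 1."""
    return {row: 1 for row, weight in zset.items() if weight > 0}

def inverse(delta):
    """Negate every weight; applying the result undoes the original delta."""
    return {row: -weight for row, weight in delta.items()}

# Hypothetical operation facts keyed by (replica_id, counter).
delta = {("r1", 1): 1, ("r1", 2): 2}  # ("r1", 2) was derived twice

view = apply_delta({}, delta)             # integrated current view
as_set = distinct(view)                   # what a set-semantics consumer sees
rolled_back = apply_delta(view, inverse(delta))
```

The sketch shows why the frontend must say where `distinct` applies: `view` keeps weight 2 for the duplicated row, while `as_set` does not. Applying the inverse delta returns the view to empty, which is the preview or rollback behaviour described above, provided storage is rolled back in the same transaction.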
 ---
@@ -308,6 +399,19 @@ Several decisions should be made explicitly before implementation.
 The evaluation should separate correctness from performance.
 ```mermaid
 flowchart LR
    Inputs["Input Facts and Updates"] --> Naive["Naive Snapshot Evaluation"]
    Inputs --> Planned["Planned Backend Evaluation"]
    Naive --> Correctness["Correctness Check"]
    Planned --> Correctness
    Planned --> Perf["Performance Metrics"]
    Perf --> Hydration["Hydration"]
    Perf --> Warm["Warm Updates"]
    Perf --> Memory["Memory"]
    Perf --> Sensitivity["History and Join Sensitivity"]
 ```
 Correctness checks:
 ```text
@@ -351,7 +455,8 @@ The main decision points are:
 - whether to persist backend operator state
 - whether to compare against Differential Dataflow for recursive workloads
-These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does not force Differential Dataflow as the backend.
+These decisions should stay separate. Choosing DBSP as the backend does not force a particular Datalog syntax. Choosing a FlowLog-like planner does
 not force Differential Dataflow as the backend.
 ---
@@ -363,6 +468,14 @@ The next step is lowering a small subset to DBSP.
 After that, FlowLog itself can serve as a comparison backend for the same small programs.
 ```mermaid
 flowchart LR
    P1["Planning-Only Compiler"] --> P2["DBSP Subset Lowering"]
    P2 --> P3["FlowLog Backend Comparison"]
    P3 --> P4["Shared IR Decision"]
    P4 --> P5["Production-Oriented Prototype"]
 ```
 The goal should be:
 ```text