Add a summary note file for the notes taken so far
parent 2bfcb7e818
commit cf4c522ff3

flowlog/006-flowlog-synthesis.md (normal file, 426 lines added)

@@ -0,0 +1,426 @@

# FlowLog Synthesis

A unifying note for the FlowLog primer, implementation notes, DBSP synergy notes, technical planning notes, and usage plan.

---

## Short Answer

The five FlowLog notes make one argument:

```text
FlowLog is most useful here as a model for the Datalog planning layer that should sit before an incremental backend such as DBSP.
```

The core architecture is:

```text
Datalog or Geolog-shaped rules
  -> dependency analysis and strata
  -> rule catalog
  -> join graph and relational plan
  -> FlowLog-style optimization
  -> DBSP or Differential Dataflow backend
  -> maintained outputs
```

FlowLog is not only an engine to run. It is a concrete example of how to keep rule semantics, planning, optimization, and backend execution separated.

```mermaid
flowchart LR
    Source["Datalog or Geolog Rules"] --> Strata["Dependency Analysis and Strata"]
    Strata --> Catalog["Rule Catalog"]
    Catalog --> Plan["Relational Plan"]
    Plan --> Optimize["FlowLog-Style Optimization"]
    Optimize --> IR["Backend-Neutral IR"]
    IR --> DBSP["DBSP Backend"]
    IR --> DD["Differential Dataflow Backend"]
    DBSP --> Outputs["Maintained Outputs"]
    DD --> Outputs
```

---

## How the Notes Fit Together

The first note, `001-flowlog-primer.md`, explains the concept. FlowLog is a Datalog engine that uses Differential Dataflow as its execution backend,
while keeping Datalog-specific planning visible before lowering to dataflow operators.

The second note, `002-flowlog-implementation.md`, explains the artifact structure. The useful implementation shape is:

```text
parsing
  -> strata
  -> catalog
  -> planning
  -> optimizing
  -> executing
```

The third note, `003-flowlog-and-dbsp-synergy.md`, maps FlowLog to the DBSP notes. DBSP answers how to maintain relational results over changing
inputs. FlowLog helps answer what relational plan should be maintained.

The fourth note, `004-flowlog-technical-planning-notes.md`, zooms into the planning details: rule catalogs, collection shapes, transformation flows,
join graphs, antijoin timing, SIP, recursive strata, subplan sharing, and physical key choice.

The fifth note, `005-using-flowlog-ideas.md`, turns the ideas into a practical adoption path: planning-only prototype, DBSP lowering prototype,
backend comparison, test corpus, data model decisions, and evaluation plan.

Together, the notes move from:

```text
what FlowLog is
  -> how it is built
  -> why it matters for DBSP
  -> which technical pieces transfer
  -> how to use those pieces
```

---

## Unified Mental Model

The shared mental model is that Datalog execution has three separate layers.

The source layer owns user-facing or system-facing rules:

```text
Datalog programs
Geolog laws
CRDT definitions
```

The planning layer owns the logical and physical shape of evaluation:

```text
strata
rule catalogs
join graphs
antijoin placement
SIP filters
physical keys
shared subplans
```

The backend layer owns execution, whether incrementally maintained or recomputed in batch:

```text
DBSP circuits
Differential Dataflow dataflows
batch evaluators
```

The main design rule is:

```text
Backend execution should not rediscover rule semantics.
```

The backend should receive a checked, stratified, and optimized relational plan.

---

## FlowLog's Transferable Pieces

The most transferable pieces are not tied to Differential Dataflow.

**Rule Catalog**: A structured summary of each rule's atoms, variables, constants, comparisons, negations, and output projection.

**Stratification**: A dependency order over non-recursive and recursive rule groups, with the negation restriction kept explicit: a relation may not depend negatively on itself through a recursive cycle.

**Join Graph**: A graph or hypergraph of atoms connected by shared variables.

**Structural Planning**: A robust join-ordering strategy based on variable overlap, intermediate width, and join connectivity.

**Sideways Information Passing**: Semijoin-style filtering that uses known bindings to reduce later joins.

**Antijoin Scheduling**: Placement of negated atoms as soon as their variables are bound.

**Physical Key Choice**: Deliberate selection of keys and payload fields for maintained joins and arrangements.

**Subplan Sharing**: Reuse of common antecedents or intermediate relations across rules.

```mermaid
flowchart TB
    Catalog["Rule Catalog"] --> JoinGraph["Join Graph"]
    Catalog --> Negation["Negation and Filters"]
    JoinGraph --> Structural["Structural Planning"]
    Negation --> Antijoin["Antijoin Scheduling"]
    Catalog --> SIP["Sideways Information Passing"]
    Structural --> Keys["Physical Key Choice"]
    SIP --> Keys
    Antijoin --> Keys
    Keys --> IR["Optimized Relational IR"]
```
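
The first two pieces, the rule catalog and the join graph, can be sketched in a few lines of Python. This is an illustrative sketch only, not FlowLog's actual data structures: the `Atom`, `Rule`, `catalog`, and `join_graph` names are hypothetical, atoms carry variables only (no constants or comparisons), and the example rule `reach(x, z) :- edge(x, y), reach(y, z), not blocked(z)` is invented for the demonstration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    relation: str
    args: tuple              # variable names only, for brevity
    negated: bool = False


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple


def catalog(rule: Rule) -> dict:
    """Summarize one rule: positive atoms, negated atoms, variable occurrences."""
    positive = [a for a in rule.body if not a.negated]
    negated = [a for a in rule.body if a.negated]
    occurrences = {}
    for atom in positive:
        for var in atom.args:
            occurrences.setdefault(var, []).append(atom.relation)
    return {
        "positive": positive,
        "negated": negated,
        "occurrences": occurrences,
        "projection": rule.head.args,
    }


def join_graph(positive: list) -> dict:
    """Edges between positive atoms, labeled with the variables they share."""
    edges = {}
    for left, right in combinations(positive, 2):
        shared = set(left.args) & set(right.args)
        if shared:
            edges[(left.relation, right.relation)] = shared
    return edges


# Hypothetical rule: reach(x, z) :- edge(x, y), reach(y, z), not blocked(z).
rule = Rule(
    head=Atom("reach", ("x", "z")),
    body=(
        Atom("edge", ("x", "y")),
        Atom("reach", ("y", "z")),
        Atom("blocked", ("z",), negated=True),
    ),
)
summary = catalog(rule)
graph = join_graph(summary["positive"])
print(summary["occurrences"])  # {'x': ['edge'], 'y': ['edge', 'reach'], 'z': ['reach']}
print(graph)                   # {('edge', 'reach'): {'y'}}
```

Keying edges by relation name is a simplification: a rule that uses the same relation twice would need atom positions as node identifiers instead.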

---

## DBSP Connection

The DBSP notes focus on incremental maintenance:

```text
input deltas
  -> maintained operator state
  -> output deltas
```

DBSP gives the algebra and runtime model for maintained relational computation. It does not by itself solve source-language compilation,
Datalog-specific optimization, Geolog law translation, CRDT-specific planning, or user-facing diagnostics.

FlowLog's planning layer fits before DBSP:

```text
rules
  -> FlowLog-like planner
  -> DBSP circuit
```

This division matters because, in an incremental system, a poor plan is more than one slow query: it becomes persistent maintained
state. Bad join order, unnecessary intermediate fields, and late antijoins can increase memory and update cost for the life of the circuit.
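
To make "persistent maintained state" concrete, here is a minimal Python sketch of an incrementally maintained join over weighted rows. It follows the delta rule for a bilinear join, d(L join R) = dL join R + L join dR + dL join dR, but it is not the DBSP API: `ZSet`, `add`, `join`, and `MaintainedJoin` are hypothetical names, and rows are fixed to the shapes `(x, y)` and `(y, z)`.

```python
from collections import defaultdict

ZSet = dict  # maps a row (tuple) to an integer weight: +1 insert, -1 delete


def add(left: ZSet, right: ZSet) -> ZSet:
    """Pointwise sum of weights, dropping rows whose weight cancels to zero."""
    out = defaultdict(int)
    for zset in (left, right):
        for row, weight in zset.items():
            out[row] += weight
    return {row: w for row, w in out.items() if w != 0}


def join(left: ZSet, right: ZSet) -> ZSet:
    """Join (x, y) rows with (y, z) rows on y; weights multiply."""
    out = defaultdict(int)
    for (x, y1), wl in left.items():
        for (y2, z), wr in right.items():
            if y1 == y2:
                out[(x, y1, z)] += wl * wr
    return {row: w for row, w in out.items() if w != 0}


class MaintainedJoin:
    """Keeps both inputs as state so each step only processes deltas."""

    def __init__(self) -> None:
        self.left: ZSet = {}
        self.right: ZSet = {}

    def step(self, d_left: ZSet, d_right: ZSet) -> ZSet:
        # Delta rule: d(L join R) = dL join R + L join dR + dL join dR
        d_out = add(
            add(join(d_left, self.right), join(self.left, d_right)),
            join(d_left, d_right),
        )
        self.left = add(self.left, d_left)
        self.right = add(self.right, d_right)
        return d_out


op = MaintainedJoin()
first = op.step({("a", 1): 1}, {(1, "p"): 1, (1, "q"): 1})
second = op.step({("a", 1): -1, ("b", 1): 1}, {})
print(first)   # {('a', 1, 'p'): 1, ('a', 1, 'q'): 1}
print(second)  # retracts both 'a' rows and inserts both 'b' rows
```

The operator stores every row of both inputs for as long as it lives, which is why carrying an unneeded column into a join, or joining before a selective antijoin, is paid for on every later update.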

---

## CRDT Connection

The CRDT notes use Datalog to define visible state over immutable operation facts.

Simple register queries are already a good fit:

```text
set + pred
  -> overwritten
  -> visible values
```
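
As a rough illustration of that pipeline, the register query can be written as two set comprehensions in Python. The relation schemas here are assumptions, not taken from the CRDT notes: `set_ops` holds `(op_id, value)` rows and `pred_ops` holds `(op_id, prev_op_id)` rows meaning that `op_id` overwrites `prev_op_id`.

```python
# Assumed schema (hypothetical, for illustration only):
#   set_ops:  (op_id, value)       an operation that writes a value
#   pred_ops: (op_id, prev_op_id)  op_id overwrites prev_op_id
set_ops = {("a1", "red"), ("b1", "blue"), ("a2", "green")}
pred_ops = {("a2", "a1")}  # a2 overwrites a1; b1 is concurrent with both

# overwritten(prev) :- pred(_, prev).
overwritten = {prev for (_, prev) in pred_ops}

# visible(op, value) :- set(op, value), not overwritten(op).
visible = {(op, value) for (op, value) in set_ops if op not in overwritten}

print(sorted(visible))  # [('a2', 'green'), ('b1', 'blue')]
```

The `not overwritten(op)` step is a single antijoin with no recursion, which is why this case is easy: two concurrent writes simply both remain visible, as in a multi-value register.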

The harder cases are recursive and structural:

```text
causal readiness
list traversal
tombstone skipping
move-like tree operations
```

FlowLog helps by making the expensive parts explicit:

- causal-readiness recursion should be planned around frontiers when possible
- list traversal should avoid carrying unnecessary fields through every intermediate
- antijoins for tombstones should run as soon as their keys are available
- repeated list subqueries should share intermediate relations where possible

The practical CRDT target is:

```text
same CRDT rules
  -> naive plan
  -> FlowLog-style plan
  -> DBSP-maintained result
  -> hydration and warm-update comparison
```

---

## Geomerge Connection

The Geomerge notes propose compiling supported laws into maintained violation relations.

The simplest useful form is:

```text
required_consequent(x) :- antecedent(x).
violation(x) :- required_consequent(x), not consequent(x).
```

FlowLog helps once antecedents become multi-atom joins:

```text
violation(x, y, z) :-
  A(x, y),
  B(y, z),
  C(z),
  not D(x, z).
```

At that point, a compiler needs the same machinery FlowLog demonstrates:

- variable occurrence maps
- join graph extraction
- antijoin scheduling
- projection minimization
- shared antecedent detection
- violation-row construction
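
Antijoin scheduling for the multi-atom rule above can be sketched as a single pass over an already chosen join order. This is a simplified illustration, not FlowLog's planner: the `schedule` function and the `(name, variables)` atom encoding are hypothetical, and choosing the join order itself is left out.

```python
def schedule(positive, negated):
    """Emit positive atoms in the given order and place each negated atom
    right after the first step at which all of its variables are bound."""
    plan, bound, pending = [], set(), list(negated)
    for name, variables in positive:
        plan.append(("join", name))  # the first entry is effectively a scan
        bound |= set(variables)
        ready = [atom for atom in pending if set(atom[1]) <= bound]
        for atom in ready:
            plan.append(("antijoin", atom[0]))
            pending.remove(atom)
    if pending:
        raise ValueError(f"unsafe rule, unbound negated atoms: {pending}")
    return plan


positive = [("A", ("x", "y")), ("B", ("y", "z")), ("C", ("z",))]
negated = [("D", ("x", "z"))]

print(schedule(positive, negated))
# [('join', 'A'), ('join', 'B'), ('antijoin', 'D'), ('join', 'C')]

print(schedule(list(reversed(positive)), negated))
# [('join', 'C'), ('join', 'B'), ('join', 'A'), ('antijoin', 'D')]
```

With the order `A, B, C`, `not D(x, z)` can run before the join with `C`, because `x` and `z` are both bound after `B`. With the reversed order it can only run last, so the join order decides how early the antijoin can shrink the intermediate relation.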

The practical Geomerge target is:

```text
FlatTheory laws
  -> supported relational subset
  -> rule catalog
  -> planned violation query
  -> DBSP-maintained violations relation
```

---

## Recommended Architecture

The recommended architecture has a backend-neutral middle layer.

```mermaid
flowchart TB
    subgraph Sources["Source Layers"]
        Datalog["Datalog CRDT Rules"]
        Geolog["Compiled Geolog Laws"]
    end

    subgraph Planner["FlowLog-Inspired Planner"]
        Parse["Parse or Translate"]
        Strata["Stratify"]
        Catalog["Catalog Rules"]
        Graph["Join Graph Construction"]
        Optimize["Plan Joins, Antijoins, and SIP"]
        IR["Relational IR with Physical Keys"]
    end

    subgraph Backends["Execution Backends"]
        DBSP["DBSP"]
        DD["Differential Dataflow"]
        Batch["Snapshot Evaluator"]
    end

    Datalog --> Parse
    Geolog --> Parse
    Parse --> Strata --> Catalog --> Graph --> Optimize --> IR
    IR --> DBSP
    IR --> DD
    IR --> Batch
```

This architecture keeps the core questions separate:

- source language
- rule semantics
- relational planning
- physical execution backend
- application integration

That separation makes experiments easier. If a query is slow, it becomes possible to ask whether the problem is the rule semantics, the plan, the
backend, or the storage boundary.
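
One way to picture the backend-neutral middle layer is a small relational IR that any backend can interpret. The sketch below is an assumption about what such an IR might look like, not a design taken from the notes: the node types `Scan`, `Join`, `Antijoin`, and `Project` and the `evaluate` function are hypothetical, physical keys are omitted, and only the snapshot backend is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    relation: str
    columns: tuple      # variable name bound by each column


@dataclass(frozen=True)
class Join:
    left: object
    right: object       # natural join on shared variable names


@dataclass(frozen=True)
class Antijoin:
    left: object
    right: object       # keep left rows with no match on shared variables


@dataclass(frozen=True)
class Project:
    source: object
    columns: tuple


def matches(a: dict, b: dict) -> bool:
    """True when two bindings agree on every variable they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def evaluate(node, facts):
    """Snapshot backend: returns a list of {variable: value} bindings."""
    if isinstance(node, Scan):
        return [dict(zip(node.columns, row)) for row in facts[node.relation]]
    if isinstance(node, Join):
        left, right = evaluate(node.left, facts), evaluate(node.right, facts)
        return [{**a, **b} for a in left for b in right if matches(a, b)]
    if isinstance(node, Antijoin):
        left, right = evaluate(node.left, facts), evaluate(node.right, facts)
        return [a for a in left if not any(matches(a, b) for b in right)]
    if isinstance(node, Project):
        rows = {tuple(b[c] for c in node.columns) for b in evaluate(node.source, facts)}
        return [dict(zip(node.columns, row)) for row in rows]
    raise TypeError(f"unknown plan node: {node!r}")


# violation(x, y, z) :- A(x, y), B(y, z), C(z), not D(x, z).
plan = Project(
    Join(
        Antijoin(
            Join(Scan("A", ("x", "y")), Scan("B", ("y", "z"))),
            Scan("D", ("x", "z")),
        ),
        Scan("C", ("z",)),
    ),
    ("x", "y", "z"),
)
facts = {"A": {(1, 2), (5, 2)}, "B": {(2, 3)}, "C": {(3,)}, "D": {(5, 3)}}
print(evaluate(plan, facts))  # [{'x': 1, 'y': 2, 'z': 3}]
```

Because the plan is plain data, the same tree could in principle be lowered to a DBSP circuit or a Differential Dataflow dataflow, with this interpreter serving as the reference result.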

---

## Practical Path

The practical path should be staged.

**Stage 1: Planning-Only Prototype**

```text
Datalog-like rules
  -> dependency graph
  -> strata
  -> rule catalog
  -> join graph
  -> textual plan
```

This validates whether the compiler understands rule shape.
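
The first two steps of this stage, dependency graph and strata, fit in a short Python sketch. It uses the textbook iterative stratification algorithm rather than FlowLog's implementation, and the `stratify` function, the `(head, body)` rule encoding, and the example relations are all hypothetical.

```python
def stratify(rules):
    """rules: list of (head, [(body_relation, is_negated), ...]).
    Returns {relation: stratum}; raises if negation occurs inside a cycle."""
    relations = {head for head, _ in rules}
    relations |= {rel for _, body in rules for rel, _ in body}
    stratum = {rel: 0 for rel in relations}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, is_negated in body:
                # A negated dependency must sit in a strictly lower stratum.
                needed = stratum[rel] + (1 if is_negated else 0)
                if stratum[head] < needed:
                    if needed >= len(relations):
                        raise ValueError(f"negation through recursion at {head}")
                    stratum[head] = needed
                    changed = True
    return stratum


rules = [
    ("reach", [("edge", False)]),                   # reach(x, y) :- edge(x, y).
    ("reach", [("reach", False), ("edge", False)]), # recursive, positive only
    ("unreached", [("node", False), ("reach", True)]),
]
print(stratify(rules))
```

For these rules, `edge`, `node`, and `reach` land in stratum 0 and `unreached` in stratum 1. A program such as `p :- not q. q :- not p.` is rejected, which gives the "accepted or rejected" classification a planning-only prototype needs.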

**Stage 2: Narrow DBSP Lowering**

```text
planned rules
  -> projection, selection, join, antijoin, union, distinct, recursion
  -> DBSP circuit
```

This validates maintained outputs against a snapshot evaluator.
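
The shape of that validation can be sketched with transitive closure in Python. The maintained side here is a hand-written, insert-only stand-in, not a DBSP circuit, and the names `snapshot_closure` and `MaintainedClosure` are hypothetical. Handling deletions correctly is harder and is deliberately left out.

```python
def snapshot_closure(edges):
    """Naive fixpoint: recompute the path relation from scratch."""
    path = set(edges)
    while True:
        new = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2} - path
        if not new:
            return path
        path |= new


class MaintainedClosure:
    """Insert-only stand-in for a maintained backend (not the DBSP API)."""

    def __init__(self):
        self.path = set()

    def insert(self, edge):
        """Apply one edge insertion and return the output delta."""
        a, b = edge
        sources = {x for (x, y) in self.path if y == a} | {a}
        targets = {z for (y, z) in self.path if y == b} | {b}
        delta = {(x, z) for x in sources for z in targets} - self.path
        self.path |= delta
        return delta


edges, maintained = set(), MaintainedClosure()
for edge in [(1, 2), (2, 3), (3, 1), (4, 1)]:
    edges.add(edge)
    delta = maintained.insert(edge)
    # After every update, the maintained result must equal a fresh snapshot.
    assert maintained.path == snapshot_closure(edges), edge
```

The useful property is the per-update assertion: the maintained output is compared with a from-scratch result after each change, not only at the end.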

**Stage 3: Workload Comparison**

```text
same rules
same facts
same outputs
  -> DBSP backend
  -> Differential Dataflow backend
  -> snapshot backend
```

This identifies whether bottlenecks come from planning or backend behavior.

**Stage 4: Geomerge Integration**

```text
supported FlatTheory laws
  -> planned violation queries
  -> maintained violations relation
  -> agreement with current validator
```

This makes the DBSP checker a performance optimization first, not a semantic change.

---

## Test Workloads

The shared test corpus should include:

- transitive closure
- reachability
- connected components
- antijoin checks
- multi-value register
- causal readiness
- list next-element traversal
- tombstone skipping
- missing foreign-key violations
- multi-atom Geomerge antecedents

Each workload should have:

- input schemas
- base facts
- update facts
- expected snapshot output
- expected output deltas
- recursion and negation classification
- accepted or rejected status
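
One possible encoding of a workload record is sketched below in Python. The field names mirror the list above, but the `Workload` class, the `(relation, row, weight)` update format, and the reading of "expected snapshot output" as the output after base facts only are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    input_schemas: dict      # relation -> column names
    base_facts: dict         # relation -> set of rows
    update_facts: list       # (relation, row, weight); +1 insert, -1 delete
    expected_snapshot: set   # output rows after loading base facts only
    expected_deltas: list    # (output row, weight) produced by the updates
    recursive: bool
    uses_negation: bool
    accepted: bool           # False when the planner is expected to reject

    def expected_final(self) -> set:
        """Apply the expected output deltas to the expected snapshot."""
        rows = set(self.expected_snapshot)
        for row, weight in self.expected_deltas:
            if weight > 0:
                rows.add(row)
            else:
                rows.discard(row)
        return rows


# violation(c, p) :- child(c, p), not parent(p).
missing_fk = Workload(
    name="missing foreign-key violations",
    input_schemas={"child": ("id", "parent_id"), "parent": ("id",)},
    base_facts={"child": {(1, 10), (2, 20)}, "parent": {(10,)}},
    update_facts=[("parent", (20,), +1)],
    expected_snapshot={(2, 20)},
    expected_deltas=[((2, 20), -1)],
    recursive=False,
    uses_negation=True,
    accepted=True,
)
print(missing_fk.expected_final())  # set()
```

Storing both the snapshot and the deltas lets one fixture check a batch evaluator and an incremental backend against the same expectations.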

---

## Evaluation Questions

The main evaluation questions are:

- Does planning reduce hydration time?
- Does planning reduce warm-update time?
- Does causal-readiness update cost still grow with history depth?
- Does antijoin scheduling reduce intermediate relation size?
- Does SIP help frontier-shaped recursive queries?
- Does physical key choice reduce maintained state?
- Does the DBSP result match a snapshot evaluator?
- Does Geomerge validation agree with the existing validator?
- Is backend state rollback or preview execution tractable?

---

## Bottom Line

The unified conclusion is:

```text
FlowLog is the planning blueprint.
DBSP is the target incremental backend.
CRDTs and Geomerge laws are the motivating rule sources.
```

The next durable artifact should not be a full engine. It should be a small planner that can explain rule structure, join graphs, antijoin placement,
and physical key choices. Once that explanation is correct, DBSP lowering becomes a narrower engineering problem.

---

## Changelog

* **May 21, 2026** -- First version created to unify the first five FlowLog notes.