Add two note files for FlowLog (primer and implementation)
parent ee52b850e4
commit 3d67b4994e
262
flowlog/001-flowlog-primer.md
Normal file
@ -0,0 +1,262 @@
# FlowLog Primer

A primer on FlowLog as a Datalog engine built on Differential Dataflow.

---

## Short Answer

FlowLog is a Datalog engine for recursive queries. It parses Datalog programs, stratifies rules, builds a relational intermediate representation,
optimizes rule plans, and executes them with Differential Dataflow.

The main idea is:

```text
Datalog rules
  -> relational rule plans
  -> Differential Dataflow operators
  -> maintained derived relations
```

FlowLog is not only a parser for Datalog. It is a query engine design that keeps Datalog-specific optimization visible before the program is lowered
to a streaming dataflow backend.

---

## Why It Exists

Datalog is useful for recursive computations:

- graph reachability
- transitive closure
- program analysis
- static analysis
- network and distributed-system rules
- recursive data-cleaning or constraint logic

The hard part is execution. Recursive Datalog can spend most of its time and memory on joins inside fixed-point loops. Bad join orders can create
large intermediate relations, and the best order can vary by workload and iteration.

FlowLog tries to keep three properties together:

- Datalog-level expressiveness
- incremental and parallel execution
- query planning control before execution

The design uses Differential Dataflow as the physical backend, but it does not translate Datalog directly into low-level dataflow code. It first
creates an intermediate representation where Datalog-aware rewrites can happen.

---

## Datalog Model

A Datalog program contains facts and rules.

Input relations are extensional database predicates:

```datalog
.in
.decl Arc(x: number, y: number)
.input Arc.csv
```

Derived relations are intensional database predicates:

```datalog
.printsize
.decl Tc(x: number, y: number)
```

Rules derive output facts from input and already-derived facts:

```datalog
.rule
Tc(x, y) :- Arc(x, y).
Tc(x, y) :- Arc(z, y), Tc(x, z).
```

This example computes transitive closure. The first rule copies direct edges into `Tc`. The second rule is recursive: it extends a known path
`Tc(x, z)` with one more edge `Arc(z, y)`.
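
To make the fixed-point behavior concrete, the two rules can be evaluated by hand with a semi-naive loop. The sketch below is plain Python, not
FlowLog code. The names `transitive_closure` and `arc_by_src` are illustrative, and the loop only mirrors the general idea of joining the `Tc` facts
derived in the previous round against `Arc`.

```python
def transitive_closure(arc):
    """Semi-naive evaluation of:
         Tc(x, y) :- Arc(x, y).
         Tc(x, y) :- Arc(z, y), Tc(x, z).
    """
    # Index Arc by its first column so the join on z becomes a lookup.
    arc_by_src = {}
    for z, y in arc:
        arc_by_src.setdefault(z, set()).add(y)

    tc = set(arc)      # first rule: copy direct edges
    delta = set(arc)   # facts that were new in the latest round
    while delta:
        derived = set()
        for x, z in delta:                    # only new Tc(x, z) facts
            for y in arc_by_src.get(z, ()):   # matching Arc(z, y) facts
                derived.add((x, y))
        delta = derived - tc                  # keep facts not seen before
        tc |= delta
    return tc


# The closure of a three-edge chain also contains (1, 3), (2, 4), and (1, 4).
closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
```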

---

## Language Features

FlowLog supports a practical Datalog dialect with:

- relation declarations
- CSV-style input and output directives
- recursive rules
- stratified negation
- comparisons
- arithmetic expressions
- placeholder arguments with `_`
- aggregation with `count`, `sum`, `min`, and `max`
- optimization directives such as `.plan`, `.sip`, and `.optimize`

Negation is written with `!`:

```datalog
indirect_only(x, z) :- edge(x, y), edge(y, z), !edge(x, z).
```

Aggregation appears in the head:

```datalog
count_paths(x, z, count(y)) :- edge(x, y), edge(y, z).
```
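
As a rough reading aid, the two rules above can be spelled out over Python sets. This is a sketch of the intended relational meaning, not of
FlowLog's implementation. It assumes set semantics, so `count(y)` is taken to count distinct `y` bindings per `(x, z)` group; the helper name
`two_hop` is made up for the example.

```python
def two_hop(edge):
    """All (x, y, z) such that edge(x, y) and edge(y, z) both hold."""
    by_src = {}
    for a, b in edge:
        by_src.setdefault(a, set()).add(b)
    return {(x, y, z) for x, y in edge for z in by_src.get(y, ())}


def indirect_only(edge):
    # !edge(x, z) acts as an antijoin: keep (x, z) only if no direct edge exists.
    return {(x, z) for x, _, z in two_hop(edge) if (x, z) not in edge}


def count_paths(edge):
    # Group by the non-aggregated head arguments (x, z) and count the y bindings.
    counts = {}
    for x, _, z in two_hop(edge):
        counts[(x, z)] = counts.get((x, z), 0) + 1
    return {(x, z, n) for (x, z), n in counts.items()}


edge = {(1, 2), (2, 3), (1, 3), (1, 4), (4, 3), (3, 5)}
```

With this input, `(1, 3)` is reachable in two hops through both `2` and `4`, so it gets a count of 2, and it is excluded from `indirect_only`
because the direct edge `(1, 3)` exists.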

The implementation has limits. Aggregation support is constrained, arithmetic in rule heads is not fully stable in the artifact version, and compile
times can be high because the backend depends on Differential Dataflow and Timely Dataflow.

---

## Execution Modes

FlowLog has two execution modes.

Batch mode is the default. It is intended for static Datalog evaluation, where the input facts are loaded once and the derived relations are
computed from them.

Incremental mode uses integer differences so changes can be tracked as insertions and retractions. This fits incremental view maintenance, where input
updates should produce output updates.

The important distinction is:

```text
batch mode:
  compute the fixed point for an input dataset

incremental mode:
  maintain derived results as facts change
```

The paper's benchmarks focus on batch execution, but the architecture is designed around incrementality.
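
The difference between the two modes can be illustrated on a single non-recursive rule, `Out(x, z) :- A(x, y), B(y, z)`. The sketch below is a toy
model in Python, not FlowLog's engine: relations are dictionaries from facts to integer weights, only `A` changes, and the helper names `join` and
`merge` are assumptions made for the example. It shows that joining just the change against `B` gives the same result as recomputing from scratch.

```python
def join(a, b):
    """Join weighted A(x, y) with weighted B(y, z) on y; weights multiply."""
    out = {}
    for (x, y), wa in a.items():
        for (y2, z), wb in b.items():
            if y == y2:
                out[(x, z)] = out.get((x, z), 0) + wa * wb
    return out


def merge(rel, delta):
    """Add a delta to a weighted relation and drop facts whose weight is zero."""
    merged = dict(rel)
    for fact, w in delta.items():
        merged[fact] = merged.get(fact, 0) + w
    return {fact: w for fact, w in merged.items() if w != 0}


a = {(1, 2): 1, (5, 2): 1}
b = {(2, 3): 1}
out = join(a, b)

# One retraction and one insertion on A.
delta_a = {(1, 2): -1, (4, 2): +1}

# Batch style: apply the change, then recompute the whole join.
batch_out = join(merge(a, delta_a), b)

# Incremental style: join only the change against B and patch the old output.
incremental_out = merge(out, join(delta_a, b))
```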

---

## Differential Dataflow Role

Differential Dataflow represents a collection as a set of update records, each carrying data, a logical time, and a difference:

```text
(data, time, diff)
```

The `diff` field records multiplicity changes. Positive differences insert facts. Negative differences retract facts.
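
A small sketch of how such update records determine the contents of a collection at a given time. It is a simplification written for this note:
times are plain integers here, whereas Differential Dataflow supports more general, partially ordered times, and `accumulate` is an illustrative
helper rather than a library function.

```python
def accumulate(updates, up_to_time):
    """Sum the diffs of all updates with time <= up_to_time."""
    counts = {}
    for data, time, diff in updates:
        if time <= up_to_time:
            counts[data] = counts.get(data, 0) + diff
    # Records whose multiplicity cancelled out are no longer in the collection.
    return {data: n for data, n in counts.items() if n != 0}


updates = [
    (("a", "b"), 0, +1),   # inserted at time 0
    (("b", "c"), 0, +1),
    (("a", "b"), 1, -1),   # retracted at time 1
    (("a", "c"), 1, +1),
]

at_time_0 = accumulate(updates, 0)
at_time_1 = accumulate(updates, 1)
```

At time 0 the collection holds `("a", "b")` and `("b", "c")`; at time 1 the retraction removes `("a", "b")` and the insertion adds `("a", "c")`.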

Operators such as `map`, `filter`, `join`, `concat`, `distinct`, and `iterate` maintain output changes as input changes arrive. Joins use maintained
indexes called arrangements.

This makes Differential Dataflow a useful backend for Datalog because:

- Datalog rules are relational queries.
- Recursive rules need fixed-point iteration.
- Semi-naive evaluation naturally works with deltas.
- Maintained arrangements can avoid repeated full scans.

FlowLog's job is to turn Datalog rules into a form that uses these backend operators efficiently.

---

## Stratification

FlowLog groups rules into strata using the dependency graph of the program.

A rule depends on another rule if its body mentions the relation derived by that other rule. Mutually recursive rules end up in the same strongly
connected component of this graph. The engine evaluates strata in dependency order.

The usual shape is:

```text
non-recursive strata
  -> recursive strata
  -> later strata that depend on earlier outputs
```

This matters for negation and recursion. Negation must be stratified so a rule does not negatively depend on itself through a cycle. Recursive strata
need fixed-point evaluation.
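
The grouping described above can be sketched with a textbook strongly-connected-components pass over the predicate dependency graph. This is a
generic illustration using Tarjan's algorithm, not the code in FlowLog's `strata` crate; the rule encoding (a head name plus a list of
`(predicate, is_negated)` pairs) and the function name `stratify` are assumptions made for the example.

```python
def stratify(rules):
    """rules: list of (head, [(body_predicate, is_negated), ...]).
    Returns strata (sets of predicates) in an order where dependencies come first."""
    deps = {}          # predicate -> predicates it depends on
    negative = set()   # (head, body_predicate) pairs that use negation
    for head, body in rules:
        deps.setdefault(head, set())
        for pred, negated in body:
            deps[head].add(pred)
            deps.setdefault(pred, set())
            if negated:
                negative.add((head, pred))

    # Tarjan's algorithm emits each component after the components it depends on.
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in sorted(deps):
        if v not in index:
            visit(v)

    # Negation inside a component means a predicate negatively depends on itself.
    for head, pred in negative:
        if any(head in scc and pred in scc for scc in sccs):
            raise ValueError(f"negation through recursion: {head} -> !{pred}")
    return sccs


rules = [
    ("Tc", [("Arc", False)]),
    ("Tc", [("Arc", False), ("Tc", False)]),
    ("NotTc", [("Node", False), ("Tc", True)]),
]
strata = stratify(rules)
```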

---

## Optimization Focus

FlowLog's main contribution is not a new Datalog syntax. It is the optimization boundary between Datalog and Differential Dataflow.

The system uses a relational intermediate representation per rule. That lets the optimizer reason about joins, filters, subplans, and recursive
execution before lowering to physical dataflow operators.

Two important optimizations are:

- structural planning
- sideways information passing

Structural planning chooses join plans intended to avoid large intermediate results. It is robustness-oriented: it aims to avoid bad plans rather
than assume perfect cardinality estimates.

Sideways information passing (SIP) uses semijoin-style prefiltering. It pushes known bindings or reachable values sideways through a rule so later
joins see less irrelevant input.

These two optimizations are complementary. Planning improves join shape. SIP reduces input size before the joins happen.
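
The prefiltering idea can be shown on a toy rule `Q(x, z) :- Small(x, y), Big(y, z)`. The sketch below is not FlowLog's SIP rewrite; the relation
names and the helpers `join` and `semijoin_filter` are invented for illustration. It only demonstrates the property SIP relies on: filtering `Big`
down to the join keys that `Small` can actually supply leaves the result unchanged while shrinking the input that has to be indexed and joined.

```python
def join(small, big):
    """Small(x, y) joined with Big(y, z) on y, projected to (x, z)."""
    big_by_key = {}
    for y, z in big:
        big_by_key.setdefault(y, []).append(z)
    return {(x, z) for x, y in small for z in big_by_key.get(y, ())}


def semijoin_filter(big, keys):
    """Keep only the Big tuples whose join key appears in `keys`."""
    return {(y, z) for y, z in big if y in keys}


small = {(1, 10), (2, 20)}
big = {(10, "p"), (20, "q"), (30, "r"), (40, "s")}

# Bindings for y that Small can provide are passed "sideways" to Big.
big_filtered = semijoin_filter(big, {y for _, y in small})

plain = join(small, big)
prefiltered = join(small, big_filtered)
```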

---

## Comparison with DBSP

FlowLog and DBSP live in the same design neighborhood:

```text
relational rules
  -> incremental computation
  -> maintained output relations
```

The backend model differs.

DBSP describes incremental view maintenance through streams, Z-sets, integration, differentiation, and circuit rewriting.

Differential Dataflow describes incremental collections with logical times and differences, and it maintains arrangements for efficient joins over
time.

For the CRDT and Geomerge notes, FlowLog is useful because it emphasizes a lesson that also applies to DBSP: recursive Datalog performance depends
heavily on physical planning. A declarative rule can be correct and still produce expensive intermediate state.

---

## When FlowLog Is Relevant

FlowLog is relevant when the problem has:

- recursive relational logic
- large graph-shaped inputs
- repeated joins inside fixed-point loops
- a need for incremental maintenance
- sensitivity to join order and memory use

It is less directly relevant when the problem is mostly point lookups, simple filters, or small non-recursive validation queries. Those can be handled
by simpler relational engines.

The strongest use case is a Datalog workload where rule-level optimization and incremental execution both matter.

---

## Practical Mental Model

FlowLog is best understood as:

```text
Datalog frontend
  + per-rule relational IR
  + recursive strata planning
  + robust join optimization
  + Differential Dataflow execution
```

Its central architectural choice is the split between logical Datalog planning and physical dataflow execution.

That split is what makes the system useful to study. It shows how a Datalog engine can reuse a general incremental backend without giving up
Datalog-specific optimization.

---

## Changelog

* **May 19, 2026** -- First version created from the FlowLog paper and artifact.
354
flowlog/002-flowlog-implementation.md
Normal file
@ -0,0 +1,354 @@
# FlowLog Implementation

A reading note on the implementation shape of FlowLog's Rust artifact.

---

## Short Answer

FlowLog is implemented as a Rust workspace with separate crates for parsing, stratification, catalog construction, logical planning, optimization,
input reading, execution, and code-generation macros.

The implementation path is:

```text
.dl file
  -> parser
  -> strata
  -> catalog
  -> program query plan
  -> grouped strata plans
  -> Differential Dataflow dataflow
  -> output relation sizes or CSVs
```

The most important implementation idea is that FlowLog does not treat Differential Dataflow as a direct code-generation target from raw Datalog. It
first builds logical rule plans with explicit collection signatures and transformation flows.

---

## Workspace Shape

The artifact is organized as a Rust workspace with crates that line up with the execution pipeline.

`parsing` parses the Datalog dialect. It uses a grammar with declarations, input directives, output directives, rules, negation, comparisons,
arithmetic, and aggregation.

`strata` builds dependency information and groups rules into strata. Recursive rules are identified through the dependency graph.

`catalog` turns parsed rules into metadata. This includes atoms, head arguments, filters, comparisons, arithmetic expressions, aggregation heads, and
rule structure.

`planning` creates logical query plans from catalogs and strata. This is where rule bodies become transformation chains and where join structure is
represented.

`optimizing` chooses structural join plans. It reasons over the variable overlap among rule atoms and selects plan trees.

`reading` loads input relations from files and represents rows, relation sessions, semiring-like weights, and arrangements.

`executing` builds and runs the Differential Dataflow graph. It owns command-line handling, dataflow construction, operators, collectors, and output
inspection.

`macros` provides Rust macros that generate specialized operator code for different key and value arities.

---

## Frontend

The frontend grammar supports sections like:

```datalog
.in
.decl Arc(x: number, y: number)
.input Arc.csv

.printsize
.decl Tc(x: number, y: number)

.rule
Tc(x, y) :- Arc(x, y).
Tc(x, y) :- Arc(z, y), Tc(x, z).
```

The parser distinguishes:

- extensional declarations
- intensional declarations
- rule heads
- positive atoms
- negated atoms
- comparisons
- constants
- placeholders
- aggregate heads

This is more schema-driven than small teaching Datalog examples. Relation declarations give the engine names and arities before planning.

---

## Strata and Program Plans

After parsing, FlowLog builds strata from rule dependencies.

The `ProgramQueryPlan` is created from strata. It iterates through each stratum, builds a `Catalog` for each rule, decides whether SIP and structural
planning apply, expands SIP rules when needed, and converts each catalog into a `RuleQueryPlan`.

The optimizer is activated only for rules with more than two core atoms. This is a pragmatic choice: a rule with one or two core atoms involves at
most one join, so there is little for a join planner to decide.

The planning level also tracks whether a stratum is recursive. Recursive and non-recursive groups are executed differently later.

The key shape is:

```text
Strata
  -> Catalog per rule
  -> RuleQueryPlan per catalog
  -> GroupStrataQueryPlan
  -> ProgramQueryPlan
```

---

## Catalog Role

The catalog is the bridge between syntax and planning.

It records which rule atoms are core relational inputs, which terms are filters or constraints, which variables occur where, and how the rule head
relates to the body.

Planning needs this metadata to answer questions such as:

- Which atoms should participate in joins?
- Which atom arguments are shared variables?
- Which comparisons can be applied locally?
- Which projected fields are needed in the output?
- Which atoms are negated?
- Which rules derive the same output relation?

Without this catalog layer, the executor would have to rediscover semantic information from syntax during physical dataflow construction.

---

## Collection Signatures

FlowLog lowers relations into collection signatures that distinguish row, key, and value shapes.

Differential Dataflow joins are easiest to express over key-value collections. FlowLog therefore maps relation tuples into several physical forms:

- row collections for plain tuples
- key-only collections for semijoin and antijoin support
- key-value collections for joins

The executor keeps maps for these forms:

```text
row_map
kv_map
k_map
```

This is a central implementation detail. A Datalog relation looks like one logical predicate, but execution may maintain several arranged physical
views of it.
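
The three physical forms can be pictured with ordinary Python containers. This is a conceptual sketch only: the helpers `to_kv`, `to_k`, and
`arrange` are made-up names, and a dictionary of lists merely stands in for a Differential Dataflow arrangement. The example reshapes `Arc(z, y)`
and `Tc(x, z)` so that both are keyed on the shared variable `z`, joins them, and then uses a key-only view for an antijoin-style check.

```python
def to_kv(rows, key_cols, value_cols):
    """Reshape row tuples into (key, value) pairs of tuples."""
    return [
        (tuple(row[i] for i in key_cols), tuple(row[i] for i in value_cols))
        for row in rows
    ]


def to_k(rows, key_cols):
    """Key-only view, enough for semijoin and antijoin membership tests."""
    return {tuple(row[i] for i in key_cols) for row in rows}


def arrange(kv):
    """Index key-value pairs by key (a stand-in for an arrangement)."""
    index = {}
    for key, value in kv:
        index.setdefault(key, []).append(value)
    return index


arc_rows = [(1, 2), (2, 3), (2, 4)]   # Arc(z, y) as plain rows
tc_rows = [(1, 2)]                    # Tc(x, z) as plain rows

arc_by_z = arrange(to_kv(arc_rows, key_cols=[0], value_cols=[1]))
tc_by_z = to_kv(tc_rows, key_cols=[1], value_cols=[0])

# Key-value join on z, producing (x, y) rows.
joined = [(x, y) for key, (x,) in tc_by_z for (y,) in arc_by_z.get(key, [])]

# Key-only view of Arc over both columns, used to drop pairs that are direct edges.
arc_keys = to_k(arc_rows, key_cols=[0, 1])
not_direct = [row for row in joined if row not in arc_keys]
```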

---

## Transformation Flow

The planning layer represents operations with transformation flows.

A unary transformation maps one collection to another:

```text
KVToKV
```

This covers projection, filtering, local constraints, and reshaping a tuple into key-value form.

A binary transformation represents a join-like step:

```text
JnToKV
```

This maps joined key-value inputs into a new output key and value shape.

Each transformation flow tracks:

- output key arguments
- output value arguments
- local constraints
- comparison expressions
- how input fields flow to output fields

This lets the executor generate a specific Differential Dataflow operator while the planner remains backend-independent enough to reason about rule
structure.
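
One way to picture a transformation flow is as a small record of where each output field comes from, plus the constraints to check along the way.
The classes below are hypothetical stand-ins written for this note; `UnaryFlow` and `JoinFlow` are not FlowLog types, and their fields only
approximate the items listed above. The example lowers `Tc(x, y) :- Arc(z, y), Tc(x, z)` and adds a made-up local constraint on `Arc` purely to
show where filtering would happen.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class UnaryFlow:
    """Hypothetical unary flow: filter rows, then reshape them into (key, value)."""
    out_key: List[int]     # input positions that form the output key
    out_value: List[int]   # input positions that form the output value
    constraints: List[Callable[[tuple], bool]] = field(default_factory=list)

    def apply(self, rows):
        return [
            (tuple(row[i] for i in self.out_key), tuple(row[i] for i in self.out_value))
            for row in rows
            if all(check(row) for check in self.constraints)
        ]


@dataclass
class JoinFlow:
    """Hypothetical binary flow. Each output field is (side, position),
    where side is "k" (join key), "l" (left value), or "r" (right value)."""
    out_key: List[Tuple[str, int]]
    out_value: List[Tuple[str, int]]

    def apply(self, left, right):
        index = {}
        for key, value in right:
            index.setdefault(key, []).append(value)
        out = []
        for key, lval in left:
            for rval in index.get(key, []):
                sides = {"k": key, "l": lval, "r": rval}
                out.append((
                    tuple(sides[s][i] for s, i in self.out_key),
                    tuple(sides[s][i] for s, i in self.out_value),
                ))
        return out


# Arc(z, y): key on z, keep y; the z != y check is an illustrative constraint.
arc = UnaryFlow(out_key=[0], out_value=[1],
                constraints=[lambda r: r[0] != r[1]]).apply([(2, 3), (3, 3)])
# Tc(x, z): key on z, keep x.
tc = UnaryFlow(out_key=[1], out_value=[0]).apply([(1, 2)])
# Head Tc(x, y): x comes from the left value, y from the right value.
result = JoinFlow(out_key=[("l", 0)], out_value=[("r", 0)]).apply(tc, arc)
```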

---

## Structural Planning

The optimizer builds a plan tree for the core atoms of a rule.

The default plan is essentially a chain following the rule's atom order. The optimized plan searches for a better tree by looking at variable overlap
among atoms.

The optimizer uses a maximum-spanning-tree-style search over atom overlaps. Then it evaluates candidate trees with a width measure and a depth
tie-breaker.

The goal is not perfect cardinality estimation. The goal is robust plan shape:

- cross-product avoidance when possible
- smaller intermediate relation width
- earlier joins between atoms that share variables
- lower chance of large maintained join state

This fits recursive Datalog because reliable static cardinality estimates are hard. A robustness-oriented heuristic is often more useful than a
fragile cost model.
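
The overlap-driven search can be approximated with a plain maximum spanning tree over the rule's atoms, where the weight of an edge is the number of
variables two atoms share. The sketch below uses Prim's algorithm and covers only that first step; it does not model the width measure or the depth
tie-breaker, and it is not the artifact's optimizer. The rule `Q(a, d) :- R(a, b), T(c, d), S(b, c)` is an invented example in which the
written atom order would start with a cross product of `R` and `T`.

```python
def overlap(atom_a, atom_b):
    """Number of variables shared by two atoms given as (name, variables)."""
    return len(set(atom_a[1]) & set(atom_b[1]))


def max_spanning_tree(atoms):
    """Prim's algorithm over atom indices. Returns edges as (parent, child, weight),
    always attaching the outside atom with the largest overlap to the tree."""
    if not atoms:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < len(atoms):
        candidates = [
            (overlap(atoms[i], atoms[j]), i, j)
            for i in sorted(in_tree)
            for j in range(len(atoms))
            if j not in in_tree
        ]
        # max() keeps the first best candidate, so ties follow atom order.
        weight, i, j = max(candidates, key=lambda c: c[0])
        in_tree.add(j)
        edges.append((i, j, weight))
    return edges


atoms = [
    ("R", ("a", "b")),
    ("T", ("c", "d")),
    ("S", ("b", "c")),
]
tree = max_spanning_tree(atoms)
```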

---

## Sideways Information Passing

Sideways information passing is a rule transformation that creates semijoin-style filters.

The practical goal is:

```text
known useful bindings
  -> prefilter later atoms
  -> smaller join inputs
  -> less intermediate state
```

In the implementation, enabling SIP can expand a catalog into multiple catalogs. For non-recursive strata, this may split one group into several
cascading groups so the generated filters can feed later steps.

This is why planning and stratification interact. SIP is not just a local operator rewrite. It can change the shape of the stratum plan.

---

## Executor Shape

The executor creates a Timely dataflow and then builds Differential Dataflow collections inside it.

At startup, it creates input sessions for every extensional relation. Those sessions are used to load facts from files.

For each stratum group, execution branches by recursion:

- non-recursive groups are built as straight-line transformations
- recursive groups are built inside an iterative scope

Non-recursive execution walks each transformation and constructs the matching dataflow operator. Outputs are stored back into `row_map`, `kv_map`, or
`k_map` depending on their physical shape.

Recursive execution creates iterative variables for intensional relations and repeatedly applies the recursive transformations until convergence.

Collectors merge rule outputs for the same intensional predicate. Inspectors print relation sizes or emit outputs.

---

## Physical Operators

The executor has operators corresponding to the physical collection shapes:

- row to row
- row to key
- row to key-value
- key-value join key-value
- key-value join key
- key join key
- cartesian product
- key-value antijoin key
- key antijoin key

The implementation arranges collections when needed. Arrangements are Differential Dataflow's indexed representation for joins and repeated access.

This is why FlowLog cares about key and value arity. The physical shape determines which macro-generated operator can be used and whether the runtime
needs a fallback representation.

---

## Arity Strategy

FlowLog uses specialized fixed-size representations for common arities and a fallback mode for wider tuples.

The program plan can compute maximal key and value arity pairs. If a query exceeds the arity threshold of the fixed-size representations, the
fallback, called fat mode, is required.

This is a performance engineering detail: Datalog workloads can produce wide intermediate tuples, but specializing small tuples can reduce allocation
and dynamic dispatch overhead.

---

## Batch and Incremental Weights

FlowLog has two build modes for weights.

Batch mode uses a presence-style difference type. This is suited for static Datalog workloads where a fact is either present or absent.

Incremental mode uses signed integer differences. This can represent insertions, deletions, and multiplicities.

At the implementation level, this means the same logical engine can target:

```text
static fixed-point computation
incremental maintenance over changing inputs
```

The paper's artifact focuses on batch benchmarks, but the backend model is compatible with incremental updates.
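
The two weight styles can be contrasted with a toy consolidation step. This is an illustration of the general idea, not of FlowLog's weight types:
the constant `PRESENT`, the combining functions, and the string-encoded facts are all assumptions made for the example. With a presence-style
weight, combining two derivations of the same fact changes nothing; with signed integers, weights add up and a retraction can cancel an insertion.

```python
def consolidate(updates, add):
    """Combine the weights of identical facts using the given addition."""
    totals = {}
    for fact, weight in updates:
        totals[fact] = add(totals[fact], weight) if fact in totals else weight
    return totals


PRESENT = True  # presence-style weight: the fact exists, nothing more


def add_presence(a, b):
    # Idempotent: deriving a fact twice is the same as deriving it once.
    return a or b


def add_signed(a, b):
    # Signed multiplicities: insertions and retractions accumulate.
    return a + b


# Batch style: the same fact derived along two different paths.
derivations = [("Tc(1,3)", PRESENT), ("Tc(1,3)", PRESENT)]
batch = consolidate(derivations, add_presence)

# Incremental style: one fact is inserted and later retracted.
changes = [("Arc(1,2)", +1), ("Arc(2,3)", +1), ("Arc(1,2)", -1)]
incremental = {
    fact: weight
    for fact, weight in consolidate(changes, add_signed).items()
    if weight != 0
}
```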

---

## Important Limitations

The artifact has several important limitations:

- release builds can be slow because of large Timely and Differential Dataflow dependencies
- aggregation support is constrained
- arithmetic in rule heads is unstable in the artifact version
- some optimization paths are controlled by flags or rule directives
- SIP currently has implementation-specific handling in stratum grouping
- output support is more oriented around relation sizes and CSV dumps than an embedded application API

These are acceptable for a research artifact, but they matter if comparing FlowLog to an embedded query engine for an application.

---

## Lessons for Other Engines

FlowLog's implementation suggests several reusable lessons.

A Datalog engine benefits from an explicit rule catalog. It gives optimization and execution a shared view of variables, atoms, filters, and heads.

Recursive evaluation should not hide join planning. The rules inside a fixed-point loop are where bad plans become expensive.

Physical arrangements are part of the query plan. If the backend needs key-value indexes, the logical planner should expose key choices explicitly.

Optimization can be robustness-first. Recursive workloads may not have stable enough statistics for a conventional cost model.

The frontend and backend should stay separated. Datalog syntax, relational rule planning, and Differential Dataflow execution are different concerns.

---

## Practical Mental Model

FlowLog's implementation can be read as:

```text
parser and schema loader
  + dependency and strata analyzer
  + rule catalog
  + relational transformation planner
  + robust join planner
  + SIP expander
  + Differential Dataflow executor
```

The implementation is valuable because it shows the concrete machinery needed between a compact Datalog rule and an efficient incremental dataflow
program.

---

## Changelog

* **May 19, 2026** -- First version created from the FlowLog paper and artifact.