# CRDTs, Datalog, and Incremental Queries

A primer and glossary for thinking about CRDTs as query-defined data structures.

---

## Short Answer

CRDTs (conflict-free replicated data types) are replicated data structures that let different replicas accept local writes and later converge to the same state.

One way to define a CRDT is to store every update as an immutable operation and derive the visible state with a deterministic query. If every replica eventually receives the same set of operations and runs the same deterministic query, every replica computes the same state.

Datalog is a good fit for this idea because it is declarative, rule-based, and set-oriented, and it supports recursion naturally. Incremental query execution matters because the operation set only grows: recomputing the full query after every new operation becomes more expensive as the history lengthens.

---

## Core Idea

The usual way to implement a CRDT is as an algorithm written in a general-purpose language. The programmer must ensure that concurrent operations commute, or otherwise prove that replicas converge.

The query-based style shifts that burden:

- operations are stored as append-only facts
- CRDT state is defined as a query over those facts
- convergence follows from deterministic evaluation over the same input set
- performance depends on incremental maintenance of the query result

The conceptual move is from "merge these mutable states correctly" to "derive this state from the immutable operation history."

---

## Operation Log as Database

In this model, the database stores more than the current value. It stores facts such as:

```text
set(replica_id, counter, key, value)
pred(from_replica_id, from_counter, to_replica_id, to_counter)
insert(replica_id, counter, parent_replica_id, parent_counter, value)
remove(replica_id, counter)
```

Each `(replica_id, counter)` pair identifies one operation. The current application state is a derived view over these base facts.

This is similar to event sourcing, with one important difference: the query is designed so that its result does not depend on delivery order. Replicas can receive operations in different orders and still converge once they hold the same operation set.
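
The order-independence claim can be checked with a minimal Python sketch. It assumes that facts are plain tuples and that a replica's log is a Python set. The names `SetOp`, `Replica`, `receive`, and `view` are hypothetical. The derived view is deliberately trivial: it returns every value ever assigned to each key, and the multi-value register query comes later. The point is that replicas receiving the same facts in any order, even with duplicates, end up with the same set, so any deterministic query over that set agrees.

```python
from __future__ import annotations

from typing import NamedTuple


class SetOp(NamedTuple):
    """Hypothetical fact type: set(replica_id, counter, key, value)."""
    replica_id: int
    counter: int
    key: str
    value: str


class Replica:
    """A replica whose only stored state is a grow-only set of facts."""

    def __init__(self) -> None:
        self.log: set[SetOp] = set()

    def receive(self, op: SetOp) -> None:
        # Set insertion is idempotent, so a duplicate delivery changes nothing.
        self.log.add(op)

    def view(self) -> dict[str, frozenset[str]]:
        # A deterministic query over the fact set (here: every value ever
        # assigned to each key). It reads only the set, never arrival order.
        result: dict[str, set[str]] = {}
        for op in self.log:
            result.setdefault(op.key, set()).add(op.value)
        return {key: frozenset(values) for key, values in result.items()}


ops = [
    SetOp(1, 1, "color", "red"),
    SetOp(2, 1, "color", "blue"),
    SetOp(1, 2, "size", "large"),
]

a, b = Replica(), Replica()
for op in ops:
    a.receive(op)
for op in reversed(ops + ops):  # a different order, with every fact delivered twice
    b.receive(op)

print(a.log == b.log, a.view() == b.view())  # True True
```

Merging two logs is set union, which is commutative, associative, and idempotent. Replicas can therefore exchange facts in any order and any number of times without affecting the result.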

---

## Datalog Role

Datalog expresses derived facts as rules:

```text
overwritten(RepId, Ctr) :-
  pred(RepId, Ctr, _, _).

mvrStore(Key, Value) :-
  set(RepId, Ctr, Key, Value),
  not overwritten(RepId, Ctr).
```

Read this as:

- `overwritten` contains the operations that appear as predecessors of later operations, that is, in the first two positions (`from_replica_id`, `from_counter`) of some `pred` fact
- `mvrStore` contains the key-value pairs whose `set` operation has not been overwritten

Together, these rules describe a multi-value register key-value store. Concurrent writes remain visible as multiple values instead of being collapsed by a last-writer-wins policy.
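
The two rules can be evaluated directly over Python sets. This sketch assumes that facts are plain tuples in the argument order shown above. `overwritten` becomes a projection of `pred`, and `not overwritten` becomes a set-membership test. The membership test is sound only because `overwritten` is fully computed first, which is what stratified negation guarantees. Function and variable names are hypothetical.

```python
def overwritten(pred_facts):
    # overwritten(RepId, Ctr) :- pred(RepId, Ctr, _, _).
    return {(rep, ctr) for (rep, ctr, _to_rep, _to_ctr) in pred_facts}


def mvr_store(set_facts, pred_facts):
    # mvrStore(Key, Value) :- set(RepId, Ctr, Key, Value), not overwritten(RepId, Ctr).
    done = overwritten(pred_facts)  # lower stratum, computed before the negation
    return {(key, value)
            for (rep, ctr, key, value) in set_facts
            if (rep, ctr) not in done}


# Replica 1 writes x = "a"; replicas 1 and 2 then both overwrite it concurrently.
sets = {
    (1, 1, "x", "a"),
    (1, 2, "x", "b"),
    (2, 1, "x", "c"),
}
preds = {
    (1, 1, 1, 2),  # (1, 1) precedes (1, 2)
    (1, 1, 2, 1),  # (1, 1) precedes (2, 1)
}
print(sorted(mvr_store(sets, preds)))  # [('x', 'b'), ('x', 'c')]
```

The overwritten value `"a"` disappears. The two concurrent successors remain side by side because neither appears as a predecessor of the other.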

---

## Incremental Query Role

Incremental view maintenance means the engine consumes changes to its inputs and produces the corresponding changes to its outputs.

Instead of:

```text
all operations -> full query recomputation -> full current state
```

the engine aims for:

```text
new operations -> query delta computation -> state delta
```

This matters for CRDTs because the source relation is an ever-growing history. Without incremental execution, a long-lived document or key-value store pays a per-update cost that keeps growing with the length of that history.
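
The delta-based shape can be sketched for the `mvrStore` rules. This is a hand-written delta rule, not the output of any particular engine. It assumes that facts are only ever added and that each batch is applied once. A new `set` fact becomes visible unless a `pred` fact has already marked it overwritten. A new `pred` fact can hide a value that was visible before. Per-row support counts handle the case where several operations produce the same `(key, value)` pair. The class and method names are hypothetical.

```python
from collections import Counter


class IncrementalMvr:
    """Maintains mvrStore(Key, Value) as insert-only batches of set/pred facts arrive."""

    def __init__(self):
        self.by_id = {}            # (rep, ctr) -> (key, value) for every set fact seen
        self.overwritten = set()   # (rep, ctr) ids seen on the "from" side of a pred fact
        self.support = Counter()   # (key, value) -> number of visible set ops producing it

    def apply(self, new_sets, new_preds):
        """Apply one batch of new facts; return the (added, removed) mvrStore rows."""
        before = {}  # support count of each touched row before this batch

        def bump(row, delta):
            before.setdefault(row, self.support[row])
            self.support[row] += delta

        for rep, ctr, key, value in new_sets:
            op_id = (rep, ctr)
            if op_id in self.by_id:
                continue                       # duplicate delivery
            self.by_id[op_id] = (key, value)
            if op_id not in self.overwritten:  # a pred may have arrived first
                bump((key, value), +1)

        for from_rep, from_ctr, _to_rep, _to_ctr in new_preds:
            op_id = (from_rep, from_ctr)
            if op_id in self.overwritten:
                continue                       # already hidden
            self.overwritten.add(op_id)
            if op_id in self.by_id:
                bump(self.by_id[op_id], -1)

        added = {row for row, old in before.items() if old == 0 and self.support[row] > 0}
        removed = {row for row, old in before.items() if old > 0 and self.support[row] == 0}
        return added, removed

    def current(self):
        return {row for row, n in self.support.items() if n > 0}


mvr = IncrementalMvr()
mvr.apply({(1, 1, "x", "a")}, set())
added, removed = mvr.apply({(1, 2, "x", "b"), (2, 1, "x", "c")},
                           {(1, 1, 1, 2), (1, 1, 2, 1)})
print(sorted(added), sorted(removed))  # [('x', 'b'), ('x', 'c')] [('x', 'a')]
```

Each call to `apply` uses hash lookups only for the rows named in the batch, so its cost follows the batch size rather than the history length. The output delta still contains a removal even though no base fact is ever deleted, because a new `pred` fact hides an older value.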

DBSP is one framework for expressing this kind of incremental computation. It models relations as streams of changes and maintains query results through operators such as joins, projections, differences, antijoins, and fixed-point iteration.
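
DBSP represents a change to a relation as a Z-set: a map from rows to integer weights, where positive weights insert rows and negative weights retract them. The plain-Python sketch below illustrates that idea. It is not the DBSP library's API; `plus`, `project`, and `distinct` are hypothetical names. It checks the property that makes linear operators such as projection cheap to maintain: applying the operator to a delta yields exactly the delta of its output.

```python
from collections import defaultdict


def plus(a, b):
    """Add two Z-sets (dicts from row to integer weight), dropping zero weights."""
    out = defaultdict(int)
    for z in (a, b):
        for row, weight in z.items():
            out[row] += weight
    return {row: w for row, w in out.items() if w != 0}


def project(z, cols):
    """Projection on a Z-set: rows that collapse together have their weights summed."""
    out = defaultdict(int)
    for row, weight in z.items():
        out[tuple(row[i] for i in cols)] += weight
    return {row: w for row, w in out.items() if w != 0}


def distinct(z):
    """Return to set semantics: every row with positive weight gets weight 1."""
    return {row: 1 for row, w in z.items() if w > 0}


# set(replica_id, counter, key, value) facts: a snapshot and an insert-only delta.
snapshot = {(1, 1, "x", "a"): 1, (2, 1, "y", "b"): 1}
delta = {(1, 2, "x", "c"): 1}
keys = (2,)  # project onto the key column

new_state = plus(snapshot, delta)
# Linearity: the projection of the delta is exactly the change in the projected output.
assert project(new_state, keys) == plus(project(snapshot, keys), project(delta, keys))

# Negative weights cancel positive ones; this is how a derived row is retracted.
assert plus({("x", "a"): 1}, {("x", "a"): -1}) == {}

print(project(new_state, keys))            # {('x',): 2, ('y',): 1}
print(distinct(project(new_state, keys)))  # {('x',): 1, ('y',): 1}
```

Projection can be applied to deltas directly. `distinct` cannot, because it is not linear. In the DBSP approach, operators like `distinct`, as well as joins and recursion, need dedicated incremental forms.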

---

## Example: Multi-Value Register Key-Value Store

A multi-value register keeps all concurrent values for a key. If one write causally overwrites another, the older value disappears. If two writes are concurrent, both remain visible.

The operation facts are:

- `set`: an assignment of a value to a key
- `pred`: a causal dependency edge from an earlier operation to a later one

The query computes the visible key-value pairs by selecting `set` facts that are not known to be overwritten.

This makes conflict handling explicit: the application sees the concurrent values and decides how to resolve them.
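
Because `mvrStore` is a set of `(key, value)` pairs, a key is in conflict exactly when it has more than one visible value. This small sketch assumes that the visible pairs have already been computed. The helpers `group_by_key`, `conflicts`, and `resolve` are hypothetical, and the "pick the largest value" policy is only an example of a rule the application might choose.

```python
from collections import defaultdict


def group_by_key(visible_pairs):
    """Turn mvrStore(Key, Value) rows into key -> set of concurrent values."""
    grouped = defaultdict(set)
    for key, value in visible_pairs:
        grouped[key].add(value)
    return dict(grouped)


def conflicts(grouped):
    """Keys whose register currently exposes more than one concurrent value."""
    return {key for key, values in grouped.items() if len(values) > 1}


def resolve(grouped, policy=max):
    """Collapse each register to one value using an application-chosen policy."""
    return {key: policy(values) for key, values in grouped.items()}


visible = {("title", "Draft"), ("title", "Final"), ("owner", "ana")}
grouped = group_by_key(visible)
print(sorted(conflicts(grouped)))  # ['title']
print(resolve(grouped)["title"])   # Final
```

Resolving at read time leaves the stored facts untouched. If the resolution should become shared state, one option is to write it as a new `set` whose `pred` edges point back to all of the conflicting writes.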

---

## Example: Causal Readiness

The simple key-value query assumes that operations are processed in causal order.

If operations can arrive out of order, the query needs an additional causal-readiness check. This is a rule that derives which operations can be exposed safely because their causal predecessors are present.

This usually requires a recursive traversal of the causal dependency graph, because an operation is ready only if its predecessors are themselves ready. The result is more semantically complete, but it can be more expensive: recursive fixed-point evaluation may need a number of iterations that grows with the depth of the causal history.
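
One way to write the readiness check is through its complement. The sketch below is in Python, with the Datalog shape given in a comment. It assumes that `present` holds the ids of operations this replica has received and that each `pred` edge runs from a predecessor to a later operation. For brevity, operation ids are collapsed to single values and `pred` is reduced to `(dep, op)` pairs. Under this scheme, an operation is *blocked* if some predecessor is missing or blocked, and it is *ready* if it is present and not blocked. The predicate and function names are hypothetical.

```python
def causally_ready(present, pred):
    """Return (ready ids, fixed-point passes) for the received operations.

    present: set of operation ids this replica has received
    pred:    set of (dep, op) pairs meaning dep causally precedes op

    Datalog shape (hypothetical predicate names; negation is stratified):
      blocked(Op) :- pred(Dep, Op), not present(Dep).
      blocked(Op) :- pred(Dep, Op), blocked(Dep).
      ready(Op)   :- present(Op), not blocked(Op).
    """
    # Base rule: a direct predecessor is missing.
    blocked = {op for (dep, op) in pred if dep not in present}
    passes = 0
    # Recursive rule, iterated to a fixed point: blocking propagates along pred edges.
    while True:
        passes += 1
        new = {op for (dep, op) in pred if dep in blocked} - blocked
        if not new:
            break
        blocked |= new
    return present - blocked, passes


chain = {("a", "b"), ("b", "c"), ("c", "d")}             # a -> b -> c -> d
ready, passes = causally_ready({"a", "c", "d"}, chain)   # "b" has not arrived yet
print(sorted(ready), passes)                             # ['a'] 2
ready, passes = causally_ready({"a", "b", "c", "d"}, chain)
print(sorted(ready), passes)                             # ['a', 'b', 'c', 'd'] 1
```

While `b` is missing, `c` and `d` stay hidden even though they have arrived. The number of passes grows with the length of the dependency chain behind the gap, which is the depth-dependent cost described above.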

---

## Example: List CRDT

A list CRDT must converge not only on which elements are present but also on their order.

The query-based formulation uses insert operations shaped like:

```text
insert(replica_id, counter, parent_replica_id, parent_counter, value)
```

Each inserted element points to the element after which it was inserted. These references form a tree:

- the root is a sentinel element
- the children of a node are the elements inserted directly after it, so concurrent insertions at the same position become siblings
- siblings are ordered deterministically by operation identifiers
- the visible list comes from a depth-first traversal of the tree

Deletes use tombstones. A deleted element remains in the tree as a structural anchor, but the final visible list skips it.
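
The tree-and-traversal reading can be shown in a minimal sketch. It assumes that counters behave like Lamport timestamps and that siblings are visited in descending `(counter, replica_id)` order, so a later insertion at the same position appears first. This is the RGA convention. The text above only requires *some* deterministic sibling order, so this particular choice is an assumption. The sentinel id `ROOT = (0, 0)` and the function name `visible_list` are hypothetical.

```python
from collections import defaultdict

ROOT = (0, 0)  # hypothetical sentinel id: (replica_id, counter)


def visible_list(inserts, removes):
    """Order list elements by a depth-first walk of the insertion tree.

    inserts: iterable of (replica_id, counter, parent_replica_id, parent_counter, value)
    removes: set of (replica_id, counter) ids that have been tombstoned
    """
    children = defaultdict(list)
    value_of = {}
    for rep, ctr, parent_rep, parent_ctr, value in inserts:
        node = (rep, ctr)
        value_of[node] = value
        children[(parent_rep, parent_ctr)].append(node)

    # Siblings: newest first by (counter, replica_id); equal counters are broken
    # by replica id, so every replica picks the same order.
    for siblings in children.values():
        siblings.sort(key=lambda n: (n[1], n[0]), reverse=True)

    out = []
    stack = list(reversed(children[ROOT]))
    while stack:
        node = stack.pop()
        if node not in removes:  # tombstones stay in the tree but are not shown
            out.append(value_of[node])
        stack.extend(reversed(children[node]))  # pre-order: visit children next
    return out


inserts = [
    (1, 1, 0, 0, "a"),  # replica 1 types "a" at the start
    (1, 2, 1, 1, "b"),  # ... then "b" after "a"
    (2, 1, 0, 0, "X"),  # replica 2 concurrently types "X" at the start
]
print(visible_list(inserts, set()))                           # ['X', 'a', 'b']
print(visible_list(inserts + [(1, 3, 1, 2, "c")], {(1, 2)}))  # ['X', 'a', 'c']
```

In the second call, `"b"` is deleted but still anchors `"c"`, which was inserted after it. This is the structural role of a tombstone.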

---

## Query Engine Shape

A prototype engine for this approach usually needs these pieces:

- a Datalog parser
- dependency analysis between predicates
- stratification checks for negation
- translation from Datalog rules to relational algebra
- relational operators such as projection, selection, join, antijoin, union, difference, and distinct
- fixed-point execution for recursion
- incremental maintenance that turns input changes into output changes
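
Dependency analysis and the stratification check can be sketched together. This sketch assumes that each rule is reduced to its head predicate and a list of `(body_predicate, negated)` pairs. It uses the classic iterative algorithm, which assigns each predicate a stratum. A positive dependency requires a stratum at least as high as the body predicate's, and a negated dependency requires a strictly higher one. If a stratum would have to exceed the number of predicates, the program contains a cycle through negation and is not stratifiable. The rule encoding and the name `stratify` are hypothetical.

```python
def stratify(rules):
    """Assign a stratum to each predicate, or raise if negation is cyclic.

    rules: list of (head, [(body_pred, negated), ...])
    """
    preds = {head for head, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for p, negated in body:
                # Negated dependencies must be computed in an earlier stratum.
                need = stratum[p] + 1 if negated else stratum[p]
                if stratum[head] < need:
                    stratum[head] = need
                    changed = True
                    if need > len(preds):
                        raise ValueError(f"negation cycle through {head!r}")
    return stratum


mvr_rules = [
    ("overwritten", [("pred", False)]),
    ("mvrStore", [("set", False), ("overwritten", True)]),
]
strata = stratify(mvr_rules)
print(strata["overwritten"], strata["mvrStore"])  # 0 1

try:
    stratify([("p", [("q", False), ("p", True)])])  # p :- q, not p.
except ValueError as err:
    print(err)  # negation cycle through 'p'
```

The resulting strata give the execution order. Each stratum is evaluated to its fixed point before any later stratum negates one of its predicates.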

The implementation path is:

```text
Datalog program
  -> abstract syntax tree
  -> predicate dependency graph
  -> execution order
  -> relational intermediate representation
  -> incremental circuit or runtime plan
```
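
The "relational intermediate representation" step can be illustrated on the `mvrStore` rules. `overwritten` becomes a projection of `pred`, and the negated body atom becomes an antijoin on the operation id. The plan encoding below (nested tuples interpreted by `evaluate`) is hypothetical and deliberately tiny. A real engine would also need join, union, difference, distinct, and a fixed-point operator for recursive strata.

```python
def evaluate(plan, db):
    """Evaluate a tiny relational plan over a dict of relation name -> set of tuples."""
    op = plan[0]
    if op == "scan":
        _, name = plan
        return set(db.get(name, ()))
    if op == "project":
        _, cols, child = plan
        return {tuple(row[i] for i in cols) for row in evaluate(child, db)}
    if op == "antijoin":
        # Keep left rows whose key columns match no right row's key columns.
        _, left_cols, right_cols, left, right = plan
        right_keys = {tuple(r[i] for i in right_cols) for r in evaluate(right, db)}
        return {row for row in evaluate(left, db)
                if tuple(row[i] for i in left_cols) not in right_keys}
    raise ValueError(f"unknown operator {op!r}")


# overwritten(RepId, Ctr) :- pred(RepId, Ctr, _, _).
overwritten_plan = ("project", (0, 1), ("scan", "pred"))

# mvrStore(Key, Value) :- set(RepId, Ctr, Key, Value), not overwritten(RepId, Ctr).
mvr_plan = ("project", (2, 3),
            ("antijoin", (0, 1), (0, 1), ("scan", "set"), overwritten_plan))

db = {
    "set": {(1, 1, "x", "a"), (1, 2, "x", "b")},
    "pred": {(1, 1, 1, 2)},
}
print(evaluate(mvr_plan, db))  # {('x', 'b')}
```

The plan is plain data, so a later stage can rewrite it, for example by replacing each operator with an incremental counterpart, before running it.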

---

## Glossary

**CRDT**: A replicated data structure whose replicas converge once they have received the same updates, even if writes happened concurrently.

**Replica**: One copy of the data structure. A replica can accept local writes and later exchange operations with other replicas.

**Convergence**: The property that replicas compute the same state once they have received the same information.

**Strong Eventual Consistency**: The combination of eventual delivery (every update eventually reaches every replica) and strong convergence (replicas that have received the same updates are in the same state).

**Operation-Based CRDT**: A CRDT represented by operations that are generated at replicas and disseminated to other replicas.

**Immutable Operation**: An update fact that is never changed after creation. The operation set grows monotonically.

**Operation Identifier**: A unique identifier for an operation, often a pair of replica id and counter.

**Lamport Clock**: A logical counter used to order events without relying on wall-clock time.

**Causal Dependency**: A relationship stating that one operation was already known at the replica where another operation was created.

**Causal History**: The graph or set of causal dependencies among operations.

**Causal Broadcast**: A delivery discipline in which an operation is delivered only after all of its causal predecessors.

**Causal Readiness**: The condition that all of an operation's causal predecessors are available.

**Concurrent Operations**: Operations where neither causally depends on the other.

**Multi-Value Register**: A register that exposes concurrent values instead of choosing one automatically.

**Last-Writer-Wins Register**: A register that picks a single winner, usually by timestamp or operation identifier.

**Tombstone**: A retained marker for a deleted element. It preserves references needed by later or concurrent operations.

**Datalog**: A declarative logic programming language based on facts and rules.

**Fact**: A ground tuple in a predicate, such as `set(1, 2, 10, 99)`.

**Rule**: A statement that derives new facts whenever its body is satisfied by existing facts.

**Predicate**: A named relation in Datalog.

**Extensional Database Predicate**: An input predicate whose facts are provided directly.

**Intensional Database Predicate**: A derived predicate whose facts come from rules.

**Stratified Negation**: A restriction that forbids recursion through negation, so that every negated predicate can be fully computed before any rule that negates it is evaluated.

**Fixed Point**: The stable result reached when repeated rule application produces no new facts.

**Relational Algebra**: A set of operators for transforming relations, such as projection, selection, join, union, and difference.

**Antijoin**: An operator that keeps rows from the left input only when they have no matching row in the right input.

**Incremental View Maintenance**: Maintaining a derived result by applying input changes instead of recomputing from scratch.

**Delta**: A change to a relation or query result, often represented as inserted and removed tuples.

**DBSP**: A framework for incremental computation on changing relations and streams.

**Hydration**: Rebuilding the internal operator state from an existing operation history, commonly during application startup.

**Near-Real-Time Processing**: Applying small new batches after the query plan is already initialized.
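
Two glossary entries, operation identifiers and Lamport clocks, fit together in a few lines. Each replica tags a new operation with `(replica_id, counter)`, and the counter follows a Lamport rule. The class below is a hypothetical sketch of one common variant, in which receiving an operation only raises the local counter and the increment happens when the next local operation is created. It is not taken from any specific system.

```python
class LamportReplica:
    """Generates (replica_id, counter) operation ids using a Lamport clock."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0

    def new_op_id(self):
        # Local event: advance the clock before stamping the new operation.
        self.counter += 1
        return (self.replica_id, self.counter)

    def observe(self, op_id):
        # Receiving an operation: never fall behind any counter already seen.
        _, counter = op_id
        self.counter = max(self.counter, counter)


r1, r2 = LamportReplica(1), LamportReplica(2)
a = r1.new_op_id()   # created first
r2.observe(a)
b = r2.new_op_id()   # created after seeing a, so its counter is larger
c = r1.new_op_id()   # concurrent with b: same counter, different replica
print(a, b, c)       # (1, 1) (2, 2) (1, 2)
```

If one operation causally precedes another, the earlier one always has the smaller counter. Comparing ids as `(counter, replica_id)` therefore gives a total order consistent with causality, with ties between concurrent operations broken by replica id.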

---

## Design Questions

- Can the CRDT be expressed using monotonic inputs and deterministic rules?
- Does the query need negation, recursion, or both?
- Is negation stratified?
- Does causal readiness require graph traversal?
- Can the expensive parts be maintained incrementally?
- Does the system need the full operation history forever?
- Can old operations be compacted without changing future query results?
- Are tombstones required for structural references?
- Are operation identifiers enough to define a deterministic order?
- Does the application need conflicts exposed or automatically resolved?

---

## Practical Mental Model

The CRDT operation log is the base table.

The CRDT state is a materialized view.

Datalog defines the view.

DBSP or another incremental engine maintains the view as operations arrive.

The convergence argument is simple: the same facts plus the same deterministic query yield the same derived state.

---

## Changelog

* **May 6, 2026** -- First version created.