useful-notes/dbsp/002-why-crdts-as-queries.md

318 lines
16 KiB
Markdown
Raw Permalink Normal View History

# Why CRDTs as Queries
A coherent reading note on the idea of defining replicated data structures as deterministic queries over immutable operations.
---
## Starting Point
The basic problem behind CRDTs is easy to state and hard to implement well. Several replicas hold copies of the same logical data. Each replica should
be able to accept writes locally, including while offline or disconnected from the other replicas. Later, when replicas exchange information, they
should converge to the same state.
This is not the same problem as ordinary database replication. In a conventional primary-replica database, a single authority can decide the order of
writes. If two users write at the same time, the system can serialize those writes through a leader, a lock, or a consensus protocol. That gives the
database a single history. CRDTs are designed for environments where that coordination is unavailable, undesirable, or too expensive. A user should
still be able to write locally, even if there is no reachable leader.
The price is that the system must handle concurrency after the fact. Two replicas may both accept writes that neither knew about at the time. When
those writes later meet, the system must define what the merged state means. A CRDT is a data structure whose merge behavior is designed so that all
replicas eventually compute the same state.
---
## The Traditional Burden
In a hand-written CRDT, the implementer writes an algorithm whose operations are safe under concurrency. For an operation-based CRDT, that often means
concurrent operations must commute: applying operation `a` and then operation `b` must lead to the same logical state as applying `b` and then `a`, at
least when `a` and `b` are concurrent.
This is manageable for simple structures such as grow-only sets. It becomes more subtle for registers, maps, lists, trees, undo and redo, and nested
documents. Ordered lists are a good example. A list CRDT must not only decide whether an element exists, but also where it appears. Concurrent
insertions at the same position must be ordered deterministically. Deletions must not destroy information that later or concurrent insertions might
reference. These details produce the familiar machinery of operation identifiers, tombstones, causal dependencies, and tie-breaking rules.
The programmer therefore has two jobs. First, they must design the logical behavior of the data type. Second, they must implement it in a way that
preserves convergence under every possible delivery order. The second job is where many mistakes hide.
---
## Query-Based Turn
The query-based approach reframes the problem. Instead of treating the CRDT as mutable state plus a merge algorithm, it treats the CRDT as a derived
view over an immutable operation log.
The replica stores operations as facts:
```text
set(replica_id, counter, key, value)
pred(from_replica_id, from_counter, to_replica_id, to_counter)
insert(replica_id, counter, parent_replica_id, parent_counter, value)
remove(replica_id, counter)
```
The visible state is not the primary stored object. The visible state is the result of a query over those facts.
This gives a clean convergence story. If two replicas have the same operation facts, and they evaluate the same deterministic query, they must compute
the same result. The query does not depend on the order in which operations arrived. Arrival order is an implementation detail. The logical input is a
set or multiset of immutable facts.
That shift is powerful because it moves convergence reasoning into the structure of the computation. The developer still has to define the intended
semantics, but they are no longer hand-coding every step of the merge algorithm.
---
## Why Datalog Fits
Datalog is a natural language for this style because it is built around facts and derived facts. A Datalog program says which new facts follow from
existing facts. The execution model is close to the mental model of derived views.
For example, a multi-value register key-value store can be described by storing `set` operations and causal predecessor edges. A value is visible if
its set operation has not been overwritten by a causally later operation. In Datalog-like notation:
```text
overwritten(RepId, Ctr) :-
pred(RepId, Ctr, _, _).
mvrStore(Key, Value) :-
set(RepId, Ctr, Key, Value),
not overwritten(RepId, Ctr).
```
This is small, but it captures an important semantic choice. The query does not pick one winner among concurrent values. It filters out values that
have been causally superseded. If two values are concurrent, neither overwrites the other, so both remain visible.
Datalog also handles recursion directly. That matters because causal histories and list structures are graph-shaped. Asking whether an operation is
reachable from a root, whether a dependency chain is complete, or what the next visible list element is can require recursive rules.
The query language is restricted enough to make evaluation well-defined, but expressive enough to describe useful replicated structures.
---
## Causality as Data
Causality is the difference between "this write came after that write" and "these writes were independent." CRDTs need this distinction because
overwriting should usually remove only causally prior values, not concurrent values.
The operation identifier is usually a pair:
```text
(replica_id, counter)
```
The replica id identifies where the operation came from. The counter is a local logical clock. Together, they make operation identifiers unique. They
also provide a deterministic tie-breaker when the data type needs a total order among concurrent operations.
Causal dependencies can be represented as edges:
```text
pred(from_replica_id, from_counter, to_replica_id, to_counter)
```
This says the `to` operation depends on the `from` operation. The dependency graph is then ordinary relational data. A query can derive roots, leaves,
overwritten operations, causally ready operations, and visible values.
There is an important design choice here. If the network or runtime guarantees causal delivery, the query can be simpler. If operations may arrive out
of order, the query must avoid exposing operations whose dependencies have not arrived. That means causal readiness becomes part of the query.
---
## Out-of-Order Delivery
Out-of-order delivery is common in distributed systems. A replica might receive an operation before receiving the operation it depends on. If the
system exposes the later operation too early, it can show a state that is not valid under the intended causal semantics.
A causal-readiness query guards against this. It derives which operations can be safely considered visible because their dependency chain is present.
Conceptually, the query walks the causal graph from roots toward leaves and marks operations as ready only when the necessary predecessors are
available.
This improves correctness in less controlled networks, but it adds cost. Graph traversal is recursive. Recursive incremental computation is possible,
but it is not free. If the query repeatedly walks long causal chains, performance may grow with the depth of the history.
This is one of the central engineering lessons. Declarative correctness is not the same as automatic efficiency. The query may be compact and
semantically clear, while still requiring careful optimization.
---
## Lists Are the Hard Case
A key-value register demonstrates the idea, but an ordered list shows why the idea is interesting.
In a collaborative text editor, users insert and delete characters. If two users insert at the same position concurrently, the final document must
place both insertions somewhere, and all replicas must choose the same order. The data type cannot rely on local array indexes because indexes shift
as edits arrive.
A common CRDT solution is to give every inserted element a stable identifier. An insertion does not say "put this character at index 12." It says "put
this character after element `(r, c)`." The insert operations form a tree:
```text
insert(replica_id, counter, parent_replica_id, parent_counter, value)
```
The sentinel root represents the beginning of the list. Children of a node are insertions that targeted that node as their parent. Concurrent
insertions after the same parent become siblings. Siblings are ordered deterministically by operation identifier. The visible list is then obtained by
traversing the tree in a deterministic order.
Deletion is also subtle. If an element is deleted, it often cannot be physically removed from the structural history, because later or concurrent
operations may refer to it as a parent. The system keeps a tombstone: the element remains as a reference point, but the visible list skips its value.
In query terms, list behavior becomes a set of derived relations: first child, next sibling, next element, visible element, and next visible element.
Datalog can express those relations directly. The query is longer than the register example, but it is still a declarative description of the list
semantics.
---
## Incremental View Maintenance
If CRDT state is a query over all operations, the obvious worry is cost. The operation set only grows. A naive implementation would recompute the
entire query result every time a new operation arrives.
Incremental view maintenance is the response. The engine maintains the result of a query as inputs change. When a new operation arrives, the engine
computes the change to the output rather than recomputing the whole output.
For a replicated application, the desired runtime shape is:
```text
new operation facts
-> incremental query update
-> visible state changes
-> application update
```
This shape is especially attractive for user interfaces. If the query engine emits deltas, the application can update only the affected views. It is
also attractive for local-first systems because the same machinery can process local writes, remote writes, and startup replay.
DBSP is relevant here because it provides a formal and practical model for incremental computation over changing relations. Relational operators are
lifted into a streaming setting where inputs and outputs evolve over time.
---
## Hydration and Warm Updates
There are two different performance situations to keep separate.
Hydration is startup. The application already has a stored operation history, but the query engine must rebuild its internal operator state. It may
need to parse the query, build the execution plan, feed in the existing facts, and produce the current state. Hydration measures how long it takes
before the application can show the document or database contents after opening.
Warm update processing is the normal running mode after hydration. The query engine already has internal state. A small batch of new operations
arrives. The engine only needs to update the maintained result.
A design can perform acceptably in warm updates but poorly during hydration, or the other way around. For CRDT-backed applications, both matter. A
collaborative editor must feel responsive while editing, but it must also open large documents without a long pause.
This distinction also suggests possible hybrid strategies. A system might use a batch-oriented computation for startup and then switch to incremental
maintenance. Or it might persist internal operator state so startup does not require replaying the entire operation history.
---
## Relational Intermediate Representation
A Datalog program is convenient for users, but the execution engine usually wants a lower-level representation. A common design is to translate
Datalog into a relational intermediate representation.
The relational IR can include operators such as:
- projection
- selection
- join
- antijoin
- union
- difference
- distinct
- fixed-point iteration
This is useful for two reasons. First, relational algebra is a good target for optimization. The engine can push down filters, remove unused fields,
combine projections, and choose join strategies. Second, the IR separates the frontend language from the execution backend. Datalog is one possible
frontend. DBSP is one possible incremental backend.
That separation matters for research and engineering. If the backend changes from DBSP to another incremental framework, the Datalog frontend does not
have to be redesigned. If a SQL-like frontend is added later, it can target the same IR.
---
## Where the Costs Hide
The query-based approach simplifies some reasoning, but it does not erase hard systems problems.
Negation is one source of care. Datalog with arbitrary negation can have unclear or unstable semantics. Stratified negation restricts programs so
negative dependencies do not form problematic cycles. This keeps evaluation understandable, but it limits what can be expressed directly.
Recursion is another source of cost. Recursive rules are needed for graph reachability, transitive closure, causal readiness, and list traversal.
Incremental recursion can still be expensive when each update affects a long chain or a large region of the dependency graph.
Join planning matters as well. Datalog rules often translate into joins. Bad join order can create large intermediate relations. In a continuously
maintained query, changing the plan later may be harder than changing it for a one-shot query because the operators hold state.
Storage growth is also unresolved. The clean convergence story assumes a monotonically growing operation set. Real applications cannot always keep
every operation forever, especially on small devices. Compaction must preserve enough information for future queries and future synchronization.
These are not arguments against the approach. They are the places where the approach becomes a database systems problem rather than only a
programming-language idea.
---
## What This Approach Buys
The first benefit is conceptual. The CRDT is specified as a query. The implementation has a clearer boundary between logical behavior and physical
execution. That is the same separation that made relational databases powerful: the user specifies what result should exist, and the engine decides
how to maintain it.
The second benefit is extensibility. A fixed CRDT library exposes a fixed set of data types. A query-based system could let application developers
define custom replicated structures, provided they stay within the safe fragment of the language.
The third benefit is a shared interface. Ordinary application state, derived views, and replicated state can all look like queries. This could reduce
the number of special-purpose layers in local-first applications.
The fourth benefit is optimization headroom. If CRDTs are expressed through a query plan, improvements to the query engine can improve many CRDT
definitions without changing their logical definitions.
---
## What Remains Open
Several questions remain open before this style can be treated as a production design.
Can enough useful CRDTs be expressed in Datalog with stratified negation? Registers and lists are promising examples, but nested documents, moves in
trees, undo and redo, and rich JSON-like structures are harder.
Can incremental evaluation make the performance competitive with hand-written CRDTs? A hand-written CRDT can exploit structure-specific shortcuts. A
query engine needs optimization to avoid paying too much for generality.
Can operation histories be compacted safely? Append-only facts are clean, but unbounded growth is not acceptable for every application.
Can the system provide good error messages and type checking? A query language for application developers needs more than a working parser and
runtime.
Can causal readiness be optimized around the common case? Most new operations in a live application are likely close to current causal heads, but a
naive recursive query may still traverse from roots.
These questions define the practical research agenda.
---
## Reading Frame
The best way to read this line of work is as a bridge between three areas.
From CRDTs, it takes the goal of coordination-free replicated state.
From Datalog, it takes declarative rules, recursion, and deterministic derivation over facts.
From incremental query engines, it takes the ability to maintain derived state as input changes.
The slogan is:
```text
replicated data structure = materialized query over immutable operations
```
That slogan is not the whole system, but it is the core mental model. The data structure is no longer only an object with methods. It is a maintained
view. The operation log is the base data. The query is the semantics. The incremental engine is the execution strategy.
---
## Changelog
* **May 7, 2026** -- First version created.