useful-notes/dbsp/002-why-crdts-as-queries.md
2026-05-07 10:22:45 +02:00

16 KiB

Why CRDTs as Queries

A coherent reading note on the idea of defining replicated data structures as deterministic queries over immutable operations.


Starting Point

The basic problem behind CRDTs is easy to state and hard to implement well. Several replicas hold copies of the same logical data. Each replica should be able to accept writes locally, including while offline or disconnected from the other replicas. Later, when replicas exchange information, they should converge to the same state.

This is not the same problem as ordinary database replication. In a conventional primary-replica database, a single authority can decide the order of writes. If two users write at the same time, the system can serialize those writes through a leader, a lock, or a consensus protocol. That gives the database a single history. CRDTs are designed for environments where that coordination is unavailable, undesirable, or too expensive. A user should still be able to write locally, even if there is no reachable leader.

The price is that the system must handle concurrency after the fact. Two replicas may both accept writes that neither knew about at the time. When those writes later meet, the system must define what the merged state means. A CRDT is a data structure whose merge behavior is designed so that all replicas eventually compute the same state.


The Traditional Burden

In a hand-written CRDT, the implementer writes an algorithm whose operations are safe under concurrency. For an operation-based CRDT, that often means concurrent operations must commute: applying operation a and then operation b must lead to the same logical state as applying b and then a, at least when a and b are concurrent.

This is manageable for simple structures such as grow-only sets. It becomes more subtle for registers, maps, lists, trees, undo and redo, and nested documents. Ordered lists are a good example. A list CRDT must not only decide whether an element exists, but also where it appears. Concurrent insertions at the same position must be ordered deterministically. Deletions must not destroy information that later or concurrent insertions might reference. These details produce the familiar machinery of operation identifiers, tombstones, causal dependencies, and tie-breaking rules.

The programmer therefore has two jobs. First, they must design the logical behavior of the data type. Second, they must implement it in a way that preserves convergence under every possible delivery order. The second job is where many mistakes hide.


Query-Based Turn

The query-based approach reframes the problem. Instead of treating the CRDT as mutable state plus a merge algorithm, it treats the CRDT as a derived view over an immutable operation log.

The replica stores operations as facts:

set(replica_id, counter, key, value)
pred(from_replica_id, from_counter, to_replica_id, to_counter)
insert(replica_id, counter, parent_replica_id, parent_counter, value)
remove(replica_id, counter)

The visible state is not the primary stored object. The visible state is the result of a query over those facts.

This gives a clean convergence story. If two replicas have the same operation facts, and they evaluate the same deterministic query, they must compute the same result. The query does not depend on the order in which operations arrived. Arrival order is an implementation detail. The logical input is a set or multiset of immutable facts.

That shift is powerful because it moves convergence reasoning into the structure of the computation. The developer still has to define the intended semantics, but they are no longer hand-coding every step of the merge algorithm.


Why Datalog Fits

Datalog is a natural language for this style because it is built around facts and derived facts. A Datalog program says which new facts follow from existing facts. The execution model is close to the mental model of derived views.

For example, a multi-value register key-value store can be described by storing set operations and causal predecessor edges. A value is visible if its set operation has not been overwritten by a causally later operation. In Datalog-like notation:

overwritten(RepId, Ctr) :-
    pred(RepId, Ctr, _, _).

mvrStore(Key, Value) :-
    set(RepId, Ctr, Key, Value),
    not overwritten(RepId, Ctr).

This is small, but it captures an important semantic choice. The query does not pick one winner among concurrent values. It filters out values that have been causally superseded. If two values are concurrent, neither overwrites the other, so both remain visible.

Datalog also handles recursion directly. That matters because causal histories and list structures are graph-shaped. Asking whether an operation is reachable from a root, whether a dependency chain is complete, or what the next visible list element is can require recursive rules.

The query language is restricted enough to make evaluation well-defined, but expressive enough to describe useful replicated structures.


Causality as Data

Causality is the difference between "this write came after that write" and "these writes were independent." CRDTs need this distinction because overwriting should usually remove only causally prior values, not concurrent values.

The operation identifier is usually a pair:

(replica_id, counter)

The replica id identifies where the operation came from. The counter is a local logical clock. Together, they make operation identifiers unique. They also provide a deterministic tie-breaker when the data type needs a total order among concurrent operations.

Causal dependencies can be represented as edges:

pred(from_replica_id, from_counter, to_replica_id, to_counter)

This says the to operation depends on the from operation. The dependency graph is then ordinary relational data. A query can derive roots, leaves, overwritten operations, causally ready operations, and visible values.

There is an important design choice here. If the network or runtime guarantees causal delivery, the query can be simpler. If operations may arrive out of order, the query must avoid exposing operations whose dependencies have not arrived. That means causal readiness becomes part of the query.


Out-of-Order Delivery

Out-of-order delivery is common in distributed systems. A replica might receive an operation before receiving the operation it depends on. If the system exposes the later operation too early, it can show a state that is not valid under the intended causal semantics.

A causal-readiness query guards against this. It derives which operations can be safely considered visible because their dependency chain is present. Conceptually, the query walks the causal graph from roots toward leaves and marks operations as ready only when the necessary predecessors are available.

This improves correctness in less controlled networks, but it adds cost. Graph traversal is recursive. Recursive incremental computation is possible, but it is not free. If the query repeatedly walks long causal chains, performance may grow with the depth of the history.

This is one of the central engineering lessons. Declarative correctness is not the same as automatic efficiency. The query may be compact and semantically clear, while still requiring careful optimization.


Lists Are the Hard Case

A key-value register demonstrates the idea, but an ordered list shows why the idea is interesting.

In a collaborative text editor, users insert and delete characters. If two users insert at the same position concurrently, the final document must place both insertions somewhere, and all replicas must choose the same order. The data type cannot rely on local array indexes because indexes shift as edits arrive.

A common CRDT solution is to give every inserted element a stable identifier. An insertion does not say "put this character at index 12." It says "put this character after element (r, c)." The insert operations form a tree:

insert(replica_id, counter, parent_replica_id, parent_counter, value)

The sentinel root represents the beginning of the list. Children of a node are insertions that targeted that node as their parent. Concurrent insertions after the same parent become siblings. Siblings are ordered deterministically by operation identifier. The visible list is then obtained by traversing the tree in a deterministic order.

Deletion is also subtle. If an element is deleted, it often cannot be physically removed from the structural history, because later or concurrent operations may refer to it as a parent. The system keeps a tombstone: the element remains as a reference point, but the visible list skips its value.

In query terms, list behavior becomes a set of derived relations: first child, next sibling, next element, visible element, and next visible element. Datalog can express those relations directly. The query is longer than the register example, but it is still a declarative description of the list semantics.


Incremental View Maintenance

If CRDT state is a query over all operations, the obvious worry is cost. The operation set only grows. A naive implementation would recompute the entire query result every time a new operation arrives.

Incremental view maintenance is the response. The engine maintains the result of a query as inputs change. When a new operation arrives, the engine computes the change to the output rather than recomputing the whole output.

For a replicated application, the desired runtime shape is:

new operation facts
-> incremental query update
-> visible state changes
-> application update

This shape is especially attractive for user interfaces. If the query engine emits deltas, the application can update only the affected views. It is also attractive for local-first systems because the same machinery can process local writes, remote writes, and startup replay.

DBSP is relevant here because it provides a formal and practical model for incremental computation over changing relations. Relational operators are lifted into a streaming setting where inputs and outputs evolve over time.


Hydration and Warm Updates

There are two different performance situations to keep separate.

Hydration is startup. The application already has a stored operation history, but the query engine must rebuild its internal operator state. It may need to parse the query, build the execution plan, feed in the existing facts, and produce the current state. Hydration measures how long it takes before the application can show the document or database contents after opening.

Warm update processing is the normal running mode after hydration. The query engine already has internal state. A small batch of new operations arrives. The engine only needs to update the maintained result.

A design can perform acceptably in warm updates but poorly during hydration, or the other way around. For CRDT-backed applications, both matter. A collaborative editor must feel responsive while editing, but it must also open large documents without a long pause.

This distinction also suggests possible hybrid strategies. A system might use a batch-oriented computation for startup and then switch to incremental maintenance. Or it might persist internal operator state so startup does not require replaying the entire operation history.


Relational Intermediate Representation

A Datalog program is convenient for users, but the execution engine usually wants a lower-level representation. A common design is to translate Datalog into a relational intermediate representation.

The relational IR can include operators such as:

  • projection
  • selection
  • join
  • antijoin
  • union
  • difference
  • distinct
  • fixed-point iteration

This is useful for two reasons. First, relational algebra is a good target for optimization. The engine can push down filters, remove unused fields, combine projections, and choose join strategies. Second, the IR separates the frontend language from the execution backend. Datalog is one possible frontend. DBSP is one possible incremental backend.

That separation matters for research and engineering. If the backend changes from DBSP to another incremental framework, the Datalog frontend does not have to be redesigned. If a SQL-like frontend is added later, it can target the same IR.


Where the Costs Hide

The query-based approach simplifies some reasoning, but it does not erase hard systems problems.

Negation is one source of care. Datalog with arbitrary negation can have unclear or unstable semantics. Stratified negation restricts programs so negative dependencies do not form problematic cycles. This keeps evaluation understandable, but it limits what can be expressed directly.

Recursion is another source of cost. Recursive rules are needed for graph reachability, transitive closure, causal readiness, and list traversal. Incremental recursion can still be expensive when each update affects a long chain or a large region of the dependency graph.

Join planning matters as well. Datalog rules often translate into joins. Bad join order can create large intermediate relations. In a continuously maintained query, changing the plan later may be harder than changing it for a one-shot query because the operators hold state.

Storage growth is also unresolved. The clean convergence story assumes a monotonically growing operation set. Real applications cannot always keep every operation forever, especially on small devices. Compaction must preserve enough information for future queries and future synchronization.

These are not arguments against the approach. They are the places where the approach becomes a database systems problem rather than only a programming-language idea.


What This Approach Buys

The first benefit is conceptual. The CRDT is specified as a query. The implementation has a clearer boundary between logical behavior and physical execution. That is the same separation that made relational databases powerful: the user specifies what result should exist, and the engine decides how to maintain it.

The second benefit is extensibility. A fixed CRDT library exposes a fixed set of data types. A query-based system could let application developers define custom replicated structures, provided they stay within the safe fragment of the language.

The third benefit is a shared interface. Ordinary application state, derived views, and replicated state can all look like queries. This could reduce the number of special-purpose layers in local-first applications.

The fourth benefit is optimization headroom. If CRDTs are expressed through a query plan, improvements to the query engine can improve many CRDT definitions without changing their logical definitions.


What Remains Open

Several questions remain open before this style can be treated as a production design.

Can enough useful CRDTs be expressed in Datalog with stratified negation? Registers and lists are promising examples, but nested documents, moves in trees, undo and redo, and rich JSON-like structures are harder.

Can incremental evaluation make the performance competitive with hand-written CRDTs? A hand-written CRDT can exploit structure-specific shortcuts. A query engine needs optimization to avoid paying too much for generality.

Can operation histories be compacted safely? Append-only facts are clean, but unbounded growth is not acceptable for every application.

Can the system provide good error messages and type checking? A query language for application developers needs more than a working parser and runtime.

Can causal readiness be optimized around the common case? Most new operations in a live application are likely close to current causal heads, but a naive recursive query may still traverse from roots.

These questions define the practical research agenda.


Reading Frame

The best way to read this line of work is as a bridge between three areas.

From CRDTs, it takes the goal of coordination-free replicated state.

From Datalog, it takes declarative rules, recursion, and deterministic derivation over facts.

From incremental query engines, it takes the ability to maintain derived state as input changes.

The slogan is:

replicated data structure = materialized query over immutable operations

That slogan is not the whole system, but it is the core mental model. The data structure is no longer only an object with methods. It is a maintained view. The operation log is the base data. The query is the semantics. The incremental engine is the execution strategy.


Changelog

  • May 7, 2026 -- First version created.