Hassan Abedi 5f8c9f11ad WIP
2026-06-04 14:59:54 +02:00

Storage

This crate is the storage layer of the workspace. It defines a backend-agnostic Storage trait, the row, value, and identifier types that travel through it, and adapter modules that implement the trait over different engines. Higher-level crates such as query-ops depend on this crate for both the types and the trait.

Public API

| Item | Kind | Description |
|------|------|-------------|
| Storage | trait | Backend-agnostic interface for storing and retrieving rows. Required methods: create_relation, arity, scan_iter, and transaction. The rest (scan, scan_where, insert, delete) have default implementations. |
| Transaction | trait | Atomic batch of inserts and deletes against a Storage. insert returns a pending RowId; commit consumes the boxed transaction and returns a CommittedTx; dropping without committing rolls back. |
| CommittedTx | struct | Result of a successful Transaction::commit. Resolves pending RowIds returned during the transaction to their post-commit form via resolve. Empty for KV adapters where pending equals real; populated for geomerge. |
| StorageError | enum | Error type returned by every fallible method. Variants: RelationNotFound, RelationExists, ArityMismatch, Validation, Decode, Unsupported, and Backend. |
| CodecError | enum | Wire-format failure reported as StorageError::Decode. Variants describe truncation, unknown tags, length overruns, and UTF-8 errors. |
| RowStream<'a> | type alias | Box<dyn Iterator<Item = Result<(RowId, Vec<Value>), StorageError>> + 'a>. The value yielded by Storage::scan_iter and Storage::scan_where. |
| RowId | struct | Opaque, backend-assigned row identifier. Bytes are inline up to 36 bytes (covers every encoding the workspace produces today) and spill to the heap otherwise. Construct with RowId::new(bytes) or RowId::from(u64). |
| Value | enum | Single cell value. Variants: Int(i64), Str(String), and Id(RowId). Value::Id is the foreign-key reference used by geomerge and any future referencing backend. |
| Table | struct | Positional input relation with fixed arity. Produced from a backend scan by scan_as_table. Consumed by query-ops operators. |
| scan_as_table(&dyn Storage, &str) -> Result<Table, StorageError> | function | Materialize a relation from a Storage backend into a Table for query-language operators. Row IDs are dropped; only cell values remain. |
| MemoryStorage | struct | In-process backend kept in HashMaps. Always available; useful for tests and snapshot oracles. |
| adapters::sqlite::SqliteStorage | struct (feat) | SQLite-backed Storage, behind the sqlite feature. Uses rusqlite with bundled libsqlite3; supports a single connection with native write transactions. |
| adapters::redb::RedbStorage | struct (feat) | Single-file B-tree backed Storage, behind the redb feature. Wraps redb::WriteTransaction for native atomic commits. |
| adapters::fjall::FjallStorage | struct (feat) | LSM-tree backed Storage, behind the fjall feature. Each relation gets a partition; transactions buffer inserts and apply them on commit. |
| adapters::lmdb::LmdbStorage | struct (feat) | mmap'd B-tree backed Storage, behind the lmdb feature. Wraps heed's RwTxn for native atomic commits. |
| adapters::geomerge::GeomergeStorage | struct (feat) | CRDT-backed Storage over the workspace's geomerge crate, behind the geomerge feature. Wraps geomerge::Transaction and resolves pending row IDs via CommittedTx. Deletion is not supported (append-only log). |
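
The RowId entry describes bytes that stay inline up to 36 bytes and spill to the heap beyond that. The following is a minimal sketch of that idea, not the crate's actual definition: SketchRowId, INLINE_CAP, as_bytes, and is_inline are hypothetical names, and the big-endian encoding in From<u64> is assumed from the KV-adapter convention described in the Notes.

```rust
/// Hypothetical inline capacity, taken from the 36-byte figure in the table.
const INLINE_CAP: usize = 36;

/// Hypothetical stand-in for RowId; the real type's layout may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SketchRowId {
    /// Up to INLINE_CAP bytes stored in place, plus the number of bytes used.
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    /// Longer encodings are stored in a heap allocation.
    Heap(Vec<u8>),
}

impl SketchRowId {
    /// Build an ID from raw bytes, choosing inline or heap storage by length.
    fn new(bytes: &[u8]) -> Self {
        if bytes.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..bytes.len()].copy_from_slice(bytes);
            SketchRowId::Inline { len: bytes.len() as u8, buf }
        } else {
            SketchRowId::Heap(bytes.to_vec())
        }
    }

    /// View the ID as the exact bytes it was built from.
    fn as_bytes(&self) -> &[u8] {
        match self {
            SketchRowId::Inline { len, buf } => &buf[..*len as usize],
            SketchRowId::Heap(v) => v.as_slice(),
        }
    }

    /// True when the bytes fit in the inline buffer.
    fn is_inline(&self) -> bool {
        matches!(self, SketchRowId::Inline { .. })
    }
}

impl From<u64> for SketchRowId {
    /// Assumption: u64 IDs are encoded big-endian, so byte order matches numeric order.
    fn from(n: u64) -> Self {
        SketchRowId::new(&n.to_be_bytes())
    }
}
```

With a big-endian encoding, comparing two IDs byte by byte gives the same order as comparing the original integers, which is what ordered key-value engines rely on for range scans.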

Data types and their relationships:

[Figure: Types]

Example

The example below opens an in-memory backend, declares a relation, inserts two rows inside a single transaction, then scans the result.

use storage::value::Value;
use storage::{MemoryStorage, Storage, StorageError};

fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() -> Result<(), StorageError> {
    let mut storage = MemoryStorage::new();
    storage.create_relation("edge", 2)?;

    let (a, b) = {
        let mut tx = storage.transaction()?;
        let a = tx.insert("edge", vec![i(1), i(2)])?;
        let b = tx.insert("edge", vec![i(2), i(3)])?;
        let committed = tx.commit()?;
        // For KV backends pending IDs equal real IDs, so resolve is the identity.
        (committed.resolve(&a), committed.resolve(&b))
    };

    let rows = storage.scan("edge")?;
    assert_eq!(rows, vec![(a, vec![i(1), i(2)]), (b, vec![i(2), i(3)])]);
    Ok(())
}

Swapping MemoryStorage for any other adapter (for example adapters::sqlite::SqliteStorage::open(":memory:")?) requires no other code changes.
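
Backends are interchangeable because callers program against the trait rather than a concrete type. The sketch below illustrates that pattern with a deliberately trimmed-down trait. MiniStorage, VecBackend, MapBackend, count_rows, and the simplified Row and String error types are all hypothetical; the real Storage trait has more methods and uses RowId, Value, and StorageError.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified row: (id, cells). The real crate uses RowId and Value.
type Row = (u64, Vec<i64>);
type Stream<'a> = Box<dyn Iterator<Item = Result<Row, String>> + 'a>;

trait MiniStorage {
    /// Required: stream the rows of a relation.
    fn scan_iter<'a>(&'a self, relation: &str) -> Stream<'a>;

    /// Provided: collect the stream, stopping at the first error.
    fn scan(&self, relation: &str) -> Result<Vec<Row>, String> {
        self.scan_iter(relation).collect()
    }
}

/// Toy backend that streams directly from its own storage.
struct VecBackend {
    name: String,
    rows: Vec<Row>,
}

impl MiniStorage for VecBackend {
    fn scan_iter<'a>(&'a self, relation: &str) -> Stream<'a> {
        if relation != self.name {
            return Box::new(std::iter::once(Err(format!("relation not found: {relation}"))));
        }
        Box::new(self.rows.iter().cloned().map(Ok::<Row, String>))
    }
}

/// Toy backend that materializes a Vec first and then yields from it.
struct MapBackend {
    name: String,
    rows: BTreeMap<u64, Vec<i64>>,
}

impl MiniStorage for MapBackend {
    fn scan_iter<'a>(&'a self, relation: &str) -> Stream<'a> {
        if relation != self.name {
            return Box::new(std::iter::once(Err(format!("relation not found: {relation}"))));
        }
        let rows: Vec<Row> = self
            .rows
            .iter()
            .map(|(id, cells)| (*id, cells.clone()))
            .collect();
        Box::new(rows.into_iter().map(Ok::<Row, String>))
    }
}

/// Caller code that only knows about the trait, so any backend can be passed in.
fn count_rows(storage: &dyn MiniStorage, relation: &str) -> Result<usize, String> {
    Ok(storage.scan(relation)?.len())
}
```

Here count_rows accepts either backend unchanged, and each backend only has to supply scan_iter to get scan for free.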

How a backend is used (logically):

[Figure: Workflow]

Run the Tests

cargo test -p storage --all-features

Notes

  • Opaque row IDs. A RowId is a backend-assigned byte sequence; callers do not interpret the bytes. KV adapters use big-endian u64; the geomerge adapter encodes a (CommitHash, counter) pair. Hand a RowId back to the same backend to reference an existing row.
  • Pending row IDs. Transaction::insert may return a pending RowId that the backend cannot stabilize until commit; this is the case for geomerge, where the final ID depends on the resulting CommitHash. Resolve such IDs through the CommittedTx returned by commit. For all KV backends the pending ID is already the real one and CommittedTx::resolve is the identity.
  • Streaming first. scan_iter is the primary scan operation; scan defaults to collecting it. In-memory and LSM backends stream natively; B-tree and SQL backends materialize a Vec internally and yield from it to avoid self-referential iterators.
  • Atomic transactions. Adapters with native write transactions (LMDB, redb, SQLite, geomerge) wrap the engine's transaction directly. Adapters without one (memory, fjall) buffer pending operations and apply them on commit. Dropping a transaction without calling commit rolls back any pending operations.
  • Deletion support. Most adapters implement delete. The geomerge adapter does not: its commit log is append-only, so delete returns StorageError::Unsupported("row deletion").
  • Geomerge is alpha. The upstream geomerge crate is prototype-status and its API may change without notice; treat breakage in adapters::geomerge as expected churn rather than regression.
  • Feature gates. MemoryStorage is always available. Every other adapter is feature-gated (lmdb, redb, fjall, sqlite, geomerge) so callers only pay for what they need.
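
The notes on pending row IDs and atomic transactions can be illustrated with a small buffered transaction. This is a sketch under stated assumptions, not the crate's implementation: MiniStore, MiniTx, and MiniCommitted are hypothetical, IDs are plain u64 values, and the remap table stands in for whatever CommittedTx holds internally.

```rust
use std::collections::HashMap;

/// Hypothetical single-relation store with counter-assigned IDs.
#[derive(Default)]
struct MiniStore {
    rows: HashMap<u64, Vec<i64>>,
    next_id: u64,
}

/// Hypothetical commit result: maps pending IDs to final IDs.
/// An empty map means every pending ID is already the real one.
struct MiniCommitted {
    remap: HashMap<u64, u64>,
}

impl MiniCommitted {
    /// Return the final ID, falling back to the pending ID when no remap exists.
    fn resolve(&self, pending: u64) -> u64 {
        self.remap.get(&pending).copied().unwrap_or(pending)
    }
}

/// Buffers operations; the store is only touched in commit.
struct MiniTx<'a> {
    store: &'a mut MiniStore,
    pending_inserts: Vec<(u64, Vec<i64>)>,
    pending_deletes: Vec<u64>,
    next_id: u64,
}

impl MiniStore {
    fn transaction(&mut self) -> MiniTx<'_> {
        let next_id = self.next_id;
        MiniTx {
            store: self,
            pending_inserts: Vec::new(),
            pending_deletes: Vec::new(),
            next_id,
        }
    }
}

impl<'a> MiniTx<'a> {
    /// Reserve an ID and buffer the row; nothing is visible in the store yet.
    fn insert(&mut self, cells: Vec<i64>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending_inserts.push((id, cells));
        id
    }

    /// Buffer a deletion.
    fn delete(&mut self, id: u64) {
        self.pending_deletes.push(id);
    }

    /// Apply buffered inserts, then deletes, and consume the transaction.
    fn commit(self) -> MiniCommitted {
        let MiniTx { store, pending_inserts, pending_deletes, next_id } = self;
        for (id, cells) in pending_inserts {
            store.rows.insert(id, cells);
        }
        for id in pending_deletes {
            store.rows.remove(&id);
        }
        store.next_id = next_id;
        // KV-style behaviour: pending IDs are final, so the remap stays empty.
        MiniCommitted { remap: HashMap::new() }
    }
}
```

Because the store is modified only inside commit, dropping a MiniTx discards its buffers and leaves the store untouched, which is the rollback behaviour described above. A backend like geomerge would instead fill the remap table at commit time, making resolve a real lookup.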