Hassan Abedi 4b866067a4 WIP
2026-06-05 12:50:43 +02:00

Storage Abstraction Layer

This crate implements a storage abstraction layer. It defines a generic interface for storing data in, and retrieving data from, a storage backend. Higher-level crates such as query-ops should access storage through this crate, which decouples query execution logic from the underlying storage implementation.

Public API

  • Storage (trait): Backend-agnostic interface for storing and retrieving rows. Required methods: create_relation, arity, scan_iter, and transaction. The remaining methods (scan, scan_where, insert, and delete) have default implementations.
  • Transaction (trait): Atomic batch of inserts and deletes against a Storage. insert returns a pending RowId; commit consumes the boxed transaction and returns a CommittedTx; dropping a transaction without committing rolls it back.
  • CommittedTx (struct): Result of a successful Transaction::commit. Its resolve method maps the pending RowIds returned during the transaction to their post-commit form. Empty for KV adapters, where pending and real IDs coincide; populated for geomerge.
  • StorageError (enum): Error type returned by every fallible method. Variants: RelationNotFound, RelationExists, ArityMismatch, Validation, Decode, Unsupported, and Backend.
  • CodecError (enum): Wire-format failure, reported as StorageError::Decode. Variants describe truncation, unknown tags, length overruns, and UTF-8 errors.
  • RowStream<'a> (type alias): Box<dyn Iterator<Item = Result<(RowId, Vec<Value>), StorageError>> + 'a>. The type returned by Storage::scan_iter and Storage::scan_where.
  • RowId (struct): Opaque, backend-assigned row identifier. Up to 36 bytes are stored inline (enough for every encoding the workspace produces today); longer IDs spill to the heap. Construct with RowId::new(bytes) or RowId::from(u64).
  • Value (enum): Single cell value. Variants: Int(i64), Str(String), and Id(RowId). Value::Id is the foreign-key reference used by geomerge and by any future backend that supports references.
  • Table (struct): Positional input relation with fixed arity. Produced from a backend scan by scan_as_table and consumed by query-ops operators.
  • scan_as_table(&dyn Storage, &str) -> Result<Table, StorageError> (function): Materializes a relation from a Storage backend into a Table for query-language operators. Row IDs are dropped; only cell values remain.
  • MemoryStorage (struct): In-process backend that keeps its data in a HashMap. Always available; useful for tests and snapshot oracles.
  • adapters::sqlite::SqliteStorage (struct, sqlite feature): SQLite-backed Storage. Uses rusqlite with a bundled libsqlite3; holds a single connection and uses native write transactions.
  • adapters::redb::RedbStorage (struct, redb feature): Single-file, B-tree-backed Storage. Wraps redb::WriteTransaction for native atomic commits.
  • adapters::fjall::FjallStorage (struct, fjall feature): LSM-tree-backed Storage. Each relation gets its own partition; transactions buffer inserts and apply them on commit.
  • adapters::lmdb::LmdbStorage (struct, lmdb feature): Memory-mapped, B-tree-backed Storage. Wraps heed's RwTxn for native atomic commits.
  • adapters::geomerge::GeomergeStorage (struct, geomerge feature): CRDT-backed Storage over the workspace's geomerge crate. Wraps geomerge::Transaction and resolves pending row IDs via CommittedTx. Deletion is not supported (append-only log). Construct with from_theory, from_store, or with_relations (the last synthesizes a theory from (name, Vec<ColumnKind>) pairs for callers that lack a typed schema).
  • adapters::geomerge::ColumnKind (enum, geomerge feature): Primitive column type passed to GeomergeStorage::with_relations. Int maps to geomerge's PrimInt and String maps to PrimString. It exists so callers can synthesize a theory without depending on geolog-lang::ir directly.
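The split between required and default Storage methods can be sketched as follows. This is a simplified, hypothetical model rather than the crate's code: RowId is reduced to a u64, StorageError has a single variant, create_relation and transaction are left out, the predicate passed to scan_where is an assumed Box<dyn Fn(&[Value]) -> bool>, and ToyStorage is an illustrative backend. It shows scan and scan_where built on top of scan_iter, plus a scan_as_table-style helper that drops row IDs.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the crate's types; real definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

pub type RowId = u64; // assumption: the real RowId is an opaque byte string

#[derive(Debug, PartialEq)]
pub enum StorageError {
    RelationNotFound(String),
}

pub type RowStream<'a> =
    Box<dyn Iterator<Item = Result<(RowId, Vec<Value>), StorageError>> + 'a>;

pub trait Storage {
    // Required (create_relation and transaction omitted from this sketch).
    fn arity(&self, relation: &str) -> Result<usize, StorageError>;
    fn scan_iter(&self, relation: &str) -> Result<RowStream<'_>, StorageError>;

    // Default: drain the stream, stopping at the first error.
    fn scan(&self, relation: &str) -> Result<Vec<(RowId, Vec<Value>)>, StorageError> {
        self.scan_iter(relation)?.collect()
    }

    // Default: filter the stream lazily (the predicate shape is assumed).
    fn scan_where<'a>(
        &'a self,
        relation: &str,
        pred: Box<dyn Fn(&[Value]) -> bool + 'a>,
    ) -> Result<RowStream<'a>, StorageError> {
        let rows = self.scan_iter(relation)?;
        Ok(Box::new(rows.filter(move |row| match row {
            Ok((_, values)) => pred(values.as_slice()),
            Err(_) => true, // let errors reach the caller
        })))
    }
}

// Hypothetical backend: relation name -> (arity, rows); IDs are positions.
#[derive(Default)]
pub struct ToyStorage {
    relations: HashMap<String, (usize, Vec<Vec<Value>>)>,
}

impl ToyStorage {
    pub fn with_relation(mut self, name: &str, arity: usize, rows: Vec<Vec<Value>>) -> Self {
        self.relations.insert(name.to_string(), (arity, rows));
        self
    }

    fn lookup(&self, relation: &str) -> Result<&(usize, Vec<Vec<Value>>), StorageError> {
        self.relations
            .get(relation)
            .ok_or_else(|| StorageError::RelationNotFound(relation.to_string()))
    }
}

impl Storage for ToyStorage {
    fn arity(&self, relation: &str) -> Result<usize, StorageError> {
        Ok(self.lookup(relation)?.0)
    }

    fn scan_iter(&self, relation: &str) -> Result<RowStream<'_>, StorageError> {
        let (_, rows) = self.lookup(relation)?;
        // In-memory data can be streamed by borrowing it directly.
        let iter = rows.iter().enumerate();
        Ok(Box::new(iter.map(|(i, r)| Ok::<_, StorageError>((i as RowId, r.clone())))))
    }
}

#[derive(Debug, PartialEq)]
pub struct Table {
    pub arity: usize,
    pub rows: Vec<Vec<Value>>,
}

// Like the documented scan_as_table: row IDs are dropped, values are kept.
pub fn scan_as_table(storage: &dyn Storage, relation: &str) -> Result<Table, StorageError> {
    let arity = storage.arity(relation)?;
    let rows: Vec<Vec<Value>> = storage
        .scan(relation)?
        .into_iter()
        .map(|(_, values)| values)
        .collect();
    Ok(Table { arity, rows })
}

fn main() {
    let s = ToyStorage::default().with_relation("edge", 2, vec![vec![Value::Int(2), Value::Int(3)]]);
    println!("{:?}", scan_as_table(&s, "edge"));
}
```

Putting the defaults on the trait means a backend implements streaming once, and any caller holding a &dyn Storage gets scan and scan_where without extra work.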

[Figure: "Types", a diagram of the data types and their relationships]
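The RowId entry in the Public API list describes an opaque byte string that is stored inline up to 36 bytes and spills to the heap beyond that; Value::Id embeds such an ID. A minimal sketch of that layout follows. The Repr enum, the &[u8] argument to new, and the is_inline helper are assumptions for illustration, not the crate's definitions. From<u64> writes big-endian bytes, which keeps byte-wise ordering consistent with numeric ordering.

```rust
use std::cmp::Ordering;

const INLINE_CAP: usize = 36;

// Hypothetical representation: short IDs live in a fixed buffer, long ones on the heap.
#[derive(Clone, Debug)]
enum Repr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<[u8]>),
}

#[derive(Clone, Debug)]
pub struct RowId(Repr);

impl RowId {
    pub fn new(bytes: &[u8]) -> Self {
        if bytes.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..bytes.len()].copy_from_slice(bytes);
            RowId(Repr::Inline { len: bytes.len() as u8, buf })
        } else {
            RowId(Repr::Heap(bytes.into()))
        }
    }

    pub fn as_bytes(&self) -> &[u8] {
        match &self.0 {
            Repr::Inline { len, buf } => &buf[..*len as usize],
            Repr::Heap(b) => &b[..],
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self.0, Repr::Inline { .. })
    }
}

// Equality and ordering look at the logical bytes, not the representation.
impl PartialEq for RowId {
    fn eq(&self, other: &Self) -> bool {
        self.as_bytes() == other.as_bytes()
    }
}

impl Eq for RowId {}

impl PartialOrd for RowId {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for RowId {
    fn cmp(&self, other: &Self) -> Ordering {
        self.as_bytes().cmp(other.as_bytes())
    }
}

// Big-endian encoding, as the notes describe for KV adapters.
impl From<u64> for RowId {
    fn from(n: u64) -> Self {
        RowId::new(&n.to_be_bytes())
    }
}

#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Id(RowId), // foreign-key reference to another row
}

fn main() {
    let small = RowId::from(1u64);
    let big = RowId::from(256u64);
    // Big-endian bytes compare in the same order as the numbers they encode.
    println!("{}", small < big);
    let spilled = RowId::new(&[7u8; 40]);
    println!("inline: {} / {}", small.is_inline(), spilled.is_inline());
    println!("{:?}", Value::Id(big));
}
```

With little-endian bytes, 1 would encode as [1, 0, ...] and 256 as [0, 1, ...], so byte comparison would put 256 first; big-endian avoids that for ordered key-value stores.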

Example

The example below opens an in-memory backend, declares a relation, inserts two rows inside a single transaction, then reads the result.

use storage::value::Value;
use storage::{MemoryStorage, Storage, StorageError};

fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() -> Result<(), StorageError> {
    let mut storage = MemoryStorage::new();
    storage.create_relation("edge", 2)?;

    let (a, b) = {
        let mut tx = storage.transaction()?;
        let a = tx.insert("edge", vec![i(1), i(2)])?;
        let b = tx.insert("edge", vec![i(2), i(3)])?;
        let committed = tx.commit()?;
        // For KV backends pending IDs equal real IDs, so resolve is the identity.
        (committed.resolve(&a), committed.resolve(&b))
    };

    let rows = storage.scan("edge")?;
    assert_eq!(rows, vec![(a, vec![i(1), i(2)]), (b, vec![i(2), i(3)])]);
    Ok(())
}

To run this example on another adapter, only the constructor line needs to change (for example adapters::sqlite::SqliteStorage::open(":memory:")?); the rest of the code is backend-agnostic.

[Figure: "Workflow", a diagram of how a backend is used (logically)]

Run the Tests

cargo test -p storage --all-features

Notes

  • Opaque row IDs. A RowId is a backend-assigned byte sequence; callers should not interpret the bytes. KV adapters encode it as a big-endian u64, while the geomerge adapter encodes a (CommitHash, counter) pair. To reference an existing row, hand its RowId back to the same backend.
  • Pending row IDs. Transaction::insert may return a pending RowId that the backend cannot stabilize until commit; this is the case for geomerge, where the final ID depends on the resulting CommitHash. Resolve such IDs through the CommittedTx returned by commit. For all KV backends the pending ID is already the real one and CommittedTx::resolve is the identity.
  • Streaming first. scan_iter is the primary scan operation; the default scan simply collects it. In-memory and LSM backends stream natively. B-tree and SQL backends materialize a Vec internally and yield from it: their row iterators borrow the read transaction or prepared statement that produced them, so returning both together would require a self-referential iterator.
  • Atomic transactions. Backends with native write-transaction support (LMDB, redb, SQLite, and geomerge) use their own transaction API directly. Adapters without native transactions (MemoryStorage and Fjall) implement Transaction with an internal buffer of pending operations that is applied on commit. In both cases, dropping a transaction without calling commit rolls back any pending operations.
  • Deletion support. Every adapter except geomerge implements delete. Because geomerge is backed by an append-only commit log, its delete returns StorageError::Unsupported("row deletion").
  • Geomerge is alpha. The upstream geomerge crate is prototype-status and its API can change without notice; treat breakage in adapters::geomerge as expected churn rather than regression.
  • Feature gates. MemoryStorage is always available. Every other adapter is feature-gated (lmdb, redb, fjall, sqlite, and geomerge) so callers only pay for what they need.
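The Pending row IDs note can be illustrated with a toy content-addressed transaction in which a row's final ID is only known once the whole batch has been hashed. Everything here is hypothetical: RowId is a plain Vec<u8>, HashedTx stands in for geomerge::Transaction, and an 8-byte DefaultHasher digest stands in for a real CommitHash. An empty CommittedTx behaves like the KV case, where resolve is the identity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type RowId = Vec<u8>; // assumption: plain bytes instead of the crate's RowId

// Maps pending IDs to final IDs; IDs missing from the map resolve to themselves.
#[derive(Default)]
pub struct CommittedTx {
    resolved: HashMap<RowId, RowId>,
}

impl CommittedTx {
    pub fn resolve(&self, id: &RowId) -> RowId {
        self.resolved.get(id).cloned().unwrap_or_else(|| id.clone())
    }
}

// Toy transaction whose final IDs embed a hash of the whole commit.
#[derive(Default)]
pub struct HashedTx {
    rows: Vec<Vec<i64>>,
}

impl HashedTx {
    // The pending ID is just the row's position in this transaction (4 bytes, BE).
    pub fn insert(&mut self, row: Vec<i64>) -> RowId {
        let counter = self.rows.len() as u32;
        self.rows.push(row);
        counter.to_be_bytes().to_vec()
    }

    pub fn commit(self) -> CommittedTx {
        // Stand-in commit hash, computable only after every insert is known.
        let mut h = DefaultHasher::new();
        self.rows.hash(&mut h);
        let commit = h.finish().to_be_bytes();
        // Final ID = commit hash followed by the per-transaction counter.
        let resolved = (0..self.rows.len() as u32)
            .map(|c| {
                let pending = c.to_be_bytes().to_vec();
                let mut real = commit.to_vec();
                real.extend_from_slice(&pending);
                (pending, real)
            })
            .collect();
        CommittedTx { resolved }
    }
}

fn main() {
    let mut tx = HashedTx::default();
    let pending = tx.insert(vec![1, 2]);
    let committed = tx.commit();
    println!("{:?} -> {:?}", pending, committed.resolve(&pending));
}
```

Because the caller always goes through resolve, the same code works whether the map is empty (KV-style) or populated (commit-dependent IDs).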
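The Streaming first note says B-tree and SQL adapters materialize a Vec to avoid self-referential iterators. The sketch below uses a RefCell borrow guard as a stand-in for a read transaction or prepared statement; PlainStore and GuardedStore are illustrative names, not adapters from this crate.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

pub type Row = (u64, Vec<i64>);
pub type RowStream<'a> = Box<dyn Iterator<Item = Row> + 'a>;

// Streams natively: the iterator borrows `self` and nothing else.
pub struct PlainStore {
    pub rows: BTreeMap<u64, Vec<i64>>,
}

impl PlainStore {
    pub fn scan_iter(&self) -> RowStream<'_> {
        Box::new(self.rows.iter().map(|(k, v)| (*k, v.clone())))
    }
}

// Reads must go through a guard; RefCell's Ref stands in for a read
// transaction or prepared statement.
pub struct GuardedStore {
    rows: RefCell<BTreeMap<u64, Vec<i64>>>,
}

impl GuardedStore {
    pub fn new(rows: BTreeMap<u64, Vec<i64>>) -> Self {
        GuardedStore { rows: RefCell::new(rows) }
    }

    pub fn insert(&self, id: u64, row: Vec<i64>) {
        self.rows.borrow_mut().insert(id, row);
    }

    pub fn scan_iter(&self) -> RowStream<'_> {
        let guard = self.rows.borrow();
        // `guard.iter()` borrows `guard`, a local dropped when this function
        // returns, so it cannot be returned; keeping the guard next to that
        // iterator would need a self-referential struct. Collect instead.
        let rows: Vec<Row> = guard.iter().map(|(k, v)| (*k, v.clone())).collect();
        Box::new(rows.into_iter())
    }
}

fn main() {
    let store = GuardedStore::new(BTreeMap::from([(1, vec![10]), (2, vec![20])]));
    let mut rows = store.scan_iter();
    // The guard is already released, so this write does not panic.
    store.insert(3, vec![30]);
    println!("{:?}", rows.next());
}
```

The trade-off is that the scan holds the whole relation in memory and reflects a snapshot taken when scan_iter was called.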
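For adapters without native transactions (the Atomic transactions note mentions MemoryStorage and Fjall), a buffered transaction with rollback-on-drop can be sketched as follows. BufferedStore, BufferedTx, and Op are hypothetical; rows are Vec<i64> and IDs are u64 counters, assumptions chosen to keep the example short.

```rust
use std::collections::BTreeMap;

// A buffered operation; nothing reaches the store until commit.
enum Op {
    Insert(u64, Vec<i64>),
    Delete(u64),
}

#[derive(Default)]
pub struct BufferedStore {
    rows: BTreeMap<u64, Vec<i64>>,
    next_id: u64,
}

impl BufferedStore {
    pub fn transaction(&mut self) -> BufferedTx<'_> {
        let next_id = self.next_id;
        BufferedTx { store: self, ops: Vec::new(), next_id }
    }

    pub fn scan(&self) -> Vec<(u64, Vec<i64>)> {
        self.rows.iter().map(|(k, v)| (*k, v.clone())).collect()
    }
}

// Holds `&mut` to the store, so nothing can observe a half-applied batch.
pub struct BufferedTx<'a> {
    store: &'a mut BufferedStore,
    ops: Vec<Op>,
    next_id: u64,
}

impl BufferedTx<'_> {
    // KV-style IDs are known immediately: the pending ID is the real one.
    pub fn insert(&mut self, row: Vec<i64>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.ops.push(Op::Insert(id, row));
        id
    }

    pub fn delete(&mut self, id: u64) {
        self.ops.push(Op::Delete(id));
    }

    // Replays the buffer in order; taking `self` by value means a
    // transaction can be committed at most once.
    pub fn commit(self) {
        for op in self.ops {
            match op {
                Op::Insert(id, row) => {
                    self.store.rows.insert(id, row);
                }
                Op::Delete(id) => {
                    self.store.rows.remove(&id);
                }
            }
        }
        self.store.next_id = self.next_id;
    }
}
// No Drop impl: dropping an uncommitted BufferedTx discards `ops`,
// which is the rollback.

fn main() {
    let mut store = BufferedStore::default();
    let mut tx = store.transaction();
    let id = tx.insert(vec![1, 2]);
    tx.commit();
    println!("{id}: {:?}", store.scan());
}
```

Rollback costs nothing here because uncommitted work never leaves the buffer; the price is that all pending operations are held in memory until commit.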