## Storage
This crate is the storage layer of the workspace.
It defines a backend-agnostic `Storage` trait, the row, value, and identifier types that travel through it, and adapter modules that implement the
trait over different engines.
Higher-level crates such as `query-ops` depend on this crate for both the types and the trait.
### Public API
| Item | Kind | Description |
|--------------------------------------------------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `Storage` | trait | Backend-agnostic interface for storing and retrieving rows. Required methods: `create_relation`, `arity`, `scan_iter`, and `transaction`. The rest (`scan`, `scan_where`, `insert`, `delete`) have default implementations. |
| `Transaction` | trait | Atomic batch of inserts and deletes against a `Storage`. `insert` returns a pending `RowId`; `commit` consumes the boxed transaction and returns a `CommittedTx`; dropping without committing rolls back. |
| `CommittedTx` | struct | Result of a successful `Transaction::commit`. Resolves pending `RowId`s returned during the transaction to their post-commit form via `resolve`. Empty for KV adapters where pending equals real; populated for `geomerge`. |
| `StorageError` | enum | Error type returned by every fallible method. Variants: `RelationNotFound`, `RelationExists`, `ArityMismatch`, `Validation`, `Decode`, `Unsupported`, and `Backend`. |
| `CodecError` | enum | Wire-format failure reported as `StorageError::Decode`. Variants describe truncation, unknown tags, length overruns, and UTF-8 errors. |
| `RowStream<'a>` | type alias | `Box<dyn Iterator<Item = Result<(RowId, Vec<Value>), StorageError>> + 'a>`. The value yielded by `Storage::scan_iter` and `Storage::scan_where`. |
| `RowId` | struct | Opaque, backend-assigned row identifier. Bytes are inline up to 36 bytes (covers every encoding the workspace produces today) and spill to the heap otherwise. Construct with `RowId::new(bytes)` or `RowId::from(u64)`. |
| `Value` | enum | Single cell value. Variants: `Int(i64)`, `Str(String)`, and `Id(RowId)`. `Value::Id` is the foreign-key reference used by `geomerge` and any future referencing backend. |
| `Table` | struct | Positional input relation with fixed arity. Produced from a backend scan by `scan_as_table`. Consumed by `query-ops` operators. |
| `scan_as_table(&dyn Storage, &str) -> Result<Table, StorageError>` | function | Materialize a relation from a `Storage` backend into a `Table` for query-language operators. Row IDs are dropped; only cell values remain. |
| `MemoryStorage` | struct | In-process backend kept in `HashMap`s. Always available; useful for tests and snapshot oracles. |
| `adapters::sqlite::SqliteStorage` | struct (feat) | `SQLite`-backed `Storage`, behind the `sqlite` feature. Uses `rusqlite` with bundled libsqlite3; supports a single connection with native write transactions. |
| `adapters::redb::RedbStorage` | struct (feat) | Single-file B-tree backed `Storage`, behind the `redb` feature. Wraps `redb::WriteTransaction` for native atomic commits. |
| `adapters::fjall::FjallStorage` | struct (feat) | LSM-tree backed `Storage`, behind the `fjall` feature. Each relation gets a partition; transactions buffer inserts and apply them on commit. |
| `adapters::lmdb::LmdbStorage` | struct (feat) | mmap'd B-tree backed `Storage`, behind the `lmdb` feature. Wraps `heed`'s `RwTxn` for native atomic commits. |
| `adapters::geomerge::GeomergeStorage` | struct (feat) | CRDT-backed `Storage` over the workspace's `geomerge` crate, behind the `geomerge` feature. Wraps `geomerge::Transaction` and resolves pending row IDs via `CommittedTx`. Deletion is not supported (append-only log). |
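
The split between required and provided methods can be pictured with a simplified trait. The sketch below is not the crate's definition: `MiniStorage`, `Mem`, and the `u64`/`i64` stand-ins for `RowId` and `Value` are assumptions made for illustration. It shows only how a provided `scan` can be written once in terms of a required `scan_iter`.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins; the real `RowId` and `Value` are richer types.
type RowId = u64;
type Value = i64;
type Row = (RowId, Vec<Value>);

#[derive(Debug, PartialEq)]
enum StorageError {
    RelationNotFound(String),
}

type RowStream<'a> = Box<dyn Iterator<Item = Result<Row, StorageError>> + 'a>;

// Hypothetical miniature of a storage trait (not the crate's `Storage`).
trait MiniStorage {
    // Required: every backend must be able to stream rows.
    fn scan_iter<'a>(&'a self, relation: &str) -> Result<RowStream<'a>, StorageError>;

    // Provided: collect the stream, stopping at the first error.
    fn scan(&self, relation: &str) -> Result<Vec<Row>, StorageError> {
        self.scan_iter(relation)?.collect()
    }
}

struct Mem {
    relations: BTreeMap<String, Vec<Row>>,
}

impl MiniStorage for Mem {
    fn scan_iter<'a>(&'a self, relation: &str) -> Result<RowStream<'a>, StorageError> {
        let rows = self
            .relations
            .get(relation)
            .ok_or_else(|| StorageError::RelationNotFound(relation.to_string()))?;
        Ok(Box::new(rows.iter().cloned().map(Ok::<Row, StorageError>)))
    }
}

fn main() {
    let mut relations = BTreeMap::new();
    relations.insert("edge".to_string(), vec![(0, vec![1, 2]), (1, vec![2, 3])]);
    let mem = Mem { relations };

    // `Mem` never implements `scan`; the call uses the trait's provided method.
    assert_eq!(mem.scan("edge").map(|rows| rows.len()), Ok(2));
    assert!(mem.scan("missing").is_err());
}
```

With this shape a new backend only has to supply the streaming method, and the collecting variant comes for free.
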
Data types and their relationships:
<div align="center">
<picture>
<img alt="Types" src="docs/diagrams/types.svg" height="70%" width="70%">
</picture>
</div>
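
The Notes section states that KV adapters encode row IDs as big-endian `u64`. One well-known property of that encoding is that byte-wise comparison matches numeric comparison, which suits ordered key-value stores. Whether the adapters rely on this is not stated here, and `encode_row_id` and `decode_row_id` are hypothetical helpers rather than part of the crate. Callers should still treat `RowId` bytes as opaque.

```rust
// Hypothetical helpers for illustration; the crate exposes `RowId::from(u64)`.
fn encode_row_id(n: u64) -> [u8; 8] {
    n.to_be_bytes()
}

fn decode_row_id(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

fn main() {
    let (small, large) = (encode_row_id(255), encode_row_id(256));

    // Big-endian bytes compare in the same order as the integers they encode.
    assert!(small < large);

    // Little-endian bytes do not: 255 -> [255, 0, ..] and 256 -> [0, 1, ..].
    assert!(255u64.to_le_bytes() > 256u64.to_le_bytes());

    // The encoding round-trips.
    assert_eq!(decode_row_id(small), 255);
}
```
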
### Example
The example below opens an in-memory backend, declares a relation, inserts two rows inside a single transaction, then scans the result.
```rust
use storage::value::Value;
use storage::{MemoryStorage, Storage, StorageError};

fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() -> Result<(), StorageError> {
    let mut storage = MemoryStorage::new();
    storage.create_relation("edge", 2)?;

    let (a, b) = {
        let mut tx = storage.transaction()?;
        let a = tx.insert("edge", vec![i(1), i(2)])?;
        let b = tx.insert("edge", vec![i(2), i(3)])?;
        let committed = tx.commit()?;
        // For KV backends pending IDs equal real IDs, so resolve is the identity.
        (committed.resolve(&a), committed.resolve(&b))
    };

    let rows = storage.scan("edge")?;
    assert_eq!(rows, vec![(a, vec![i(1), i(2)]), (b, vec![i(2), i(3)])]);
    Ok(())
}
```
Swapping `MemoryStorage` for any other adapter (for example `adapters::sqlite::SqliteStorage::open(":memory:")?`) requires no other code changes.
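
The `resolve` step in the example can be pictured as a lookup that falls back to the identity. The sketch below is a mental model only, not the crate's implementation: the `RowId` and `CommittedTx` definitions and the `resolved` field are assumptions. It illustrates why an empty mapping (the KV case) returns each ID unchanged while a populated one (the `geomerge` case) rewrites it.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: an opaque byte sequence.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RowId(Vec<u8>);

// Hypothetical model of a commit result: pending IDs mapped to final IDs.
#[derive(Default)]
struct CommittedTx {
    resolved: HashMap<RowId, RowId>,
}

impl CommittedTx {
    // IDs without an entry resolve to themselves, so an empty map is the identity.
    fn resolve(&self, pending: &RowId) -> RowId {
        self.resolved
            .get(pending)
            .cloned()
            .unwrap_or_else(|| pending.clone())
    }
}

fn main() {
    let pending = RowId(vec![0, 0, 0, 1]);

    // KV-style commit: nothing to remap.
    let kv = CommittedTx::default();
    assert_eq!(kv.resolve(&pending), pending);

    // CRDT-style commit: the pending ID is replaced once the final ID is known.
    let final_id = RowId(vec![0xAB, 0xCD, 0, 1]);
    let mut crdt = CommittedTx::default();
    crdt.resolved.insert(pending.clone(), final_id.clone());
    assert_eq!(crdt.resolve(&pending), final_id);
}
```

Calling `resolve` unconditionally, as the example does, keeps caller code identical across both kinds of backend.
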
How a backend is used (logically):
<div align="center">
<picture>
<img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
</picture>
</div>
### Run the Tests
```sh
cargo test -p storage --all-features
```
### Notes
- **Opaque row IDs.**
A `RowId` is a backend-assigned byte sequence; callers do not interpret the bytes.
KV adapters use big-endian `u64`; the `geomerge` adapter encodes a `(CommitHash, counter)` pair.
Hand a `RowId` back to the same backend to reference an existing row.
- **Pending row IDs.**
`Transaction::insert` may return a pending `RowId` that the backend cannot stabilize until commit; this is the case for `geomerge`, where the final
ID depends on the resulting `CommitHash`.
Resolve such IDs through the `CommittedTx` returned by `commit`.
For all KV backends the pending ID is already the real one and `CommittedTx::resolve` is the identity.
- **Streaming first.**
`scan_iter` is the primary scan operation; `scan` defaults to collecting it.
In-memory and LSM backends stream natively; B-tree and SQL backends materialize a `Vec` internally and yield from it to avoid self-referential
iterators.
- **Atomic transactions.**
Adapters with native write transactions (LMDB, redb, `SQLite`, `geomerge`) wrap the engine's transaction directly.
Adapters without one (memory, fjall) buffer pending operations and apply them on commit.
Dropping a transaction without calling `commit` rolls back any pending operations.
- **Deletion support.**
Most adapters implement `delete`.
The `geomerge` adapter does not: its commit log is append-only, so `delete` returns `StorageError::Unsupported("row deletion")`.
- **Geomerge is alpha.**
The upstream `geomerge` crate is prototype-status and its API may change without notice; treat breakage in `adapters::geomerge` as expected churn
rather than regression.
- **Feature gates.**
`MemoryStorage` is always available.
Every other adapter is feature-gated (`lmdb`, `redb`, `fjall`, `sqlite`, `geomerge`) so callers only pay for what they need.
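
The buffering strategy described under "Atomic transactions" can be sketched as follows. This is a minimal model, assuming a hypothetical `Store` and `BufferedTx` that handle inserts only; it is not the code of the memory or fjall adapters. Because nothing touches the store until `commit`, dropping the transaction is enough to roll back.

```rust
use std::collections::HashMap;

type Row = Vec<i64>;

// Hypothetical buffered backend: one list of rows per relation.
#[derive(Default)]
struct Store {
    relations: HashMap<String, Vec<Row>>,
}

// Pending inserts live in the transaction until `commit` is called.
struct BufferedTx<'a> {
    store: &'a mut Store,
    pending: Vec<(String, Row)>,
}

impl Store {
    fn transaction(&mut self) -> BufferedTx<'_> {
        BufferedTx { store: self, pending: Vec::new() }
    }

    fn row_count(&self, relation: &str) -> usize {
        self.relations.get(relation).map_or(0, |rows| rows.len())
    }
}

impl<'a> BufferedTx<'a> {
    fn insert(&mut self, relation: &str, row: Row) {
        self.pending.push((relation.to_string(), row));
    }

    // Consumes the transaction and applies every buffered insert.
    fn commit(self) {
        let BufferedTx { store, pending } = self;
        for (relation, row) in pending {
            store.relations.entry(relation).or_default().push(row);
        }
    }
}

fn main() {
    let mut store = Store::default();

    // Dropped without commit: the buffer is discarded and the store is untouched.
    {
        let mut tx = store.transaction();
        tx.insert("edge", vec![1, 2]);
    }
    assert_eq!(store.row_count("edge"), 0);

    // Committed: both rows become visible together.
    let mut tx = store.transaction();
    tx.insert("edge", vec![1, 2]);
    tx.insert("edge", vec![2, 3]);
    tx.commit();
    assert_eq!(store.row_count("edge"), 2);
}
```

Taking `self` by value in `commit` mirrors the README's statement that committing consumes the transaction, so a committed transaction cannot be reused by mistake.
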