## Storage

This crate implements a storage access layer.
It defines a generic interface for storing and retrieving data from a storage backend.
Higher-level crates such as `query-ops` should use this crate to access storage.
This crate helps decouple the query execution logic from the underlying storage implementation.

### Public API

| Item | Kind | Description |
|------|------|-------------|
| `Storage` | trait | Backend-agnostic interface for storing and retrieving rows. Required methods: `create_relation`, `arity`, `scan_iter`, and `transaction`. The rest (`scan`, `scan_where`, `insert`, `delete`) have default implementations. |
| `Transaction` | trait | Atomic batch of inserts and deletes against a `Storage`. `insert` returns a pending `RowId`; `commit` consumes the boxed transaction and returns a `CommittedTx`; dropping without committing rolls back. |
| `CommittedTx` | struct | Result of a successful `Transaction::commit`. Resolves pending `RowId`s returned during the transaction to their post-commit form via `resolve`. Empty for KV adapters where pending equals real; populated for `geomerge`. |
| `StorageError` | enum | Error type returned by every fallible method. Variants: `RelationNotFound`, `RelationExists`, `ArityMismatch`, `Validation`, `Decode`, `Unsupported`, and `Backend`. |
| `CodecError` | enum | Wire-format failure reported as `StorageError::Decode`. Variants describe truncation, unknown tags, length overruns, and UTF-8 errors. |
| `RowStream<'a>` | type alias | `Box<dyn Iterator<Item = Result<(RowId, Vec<Value>), StorageError>> + 'a>`. The value yielded by `Storage::scan_iter` and `Storage::scan_where`. |
| `RowId` | struct | Opaque, backend-assigned row identifier. Bytes are inline up to 36 bytes (covers every encoding the workspace produces today) and spill to the heap otherwise. Construct with `RowId::new(bytes)` or `RowId::from(u64)`. |
| `Value` | enum | Single cell value. Variants: `Int(i64)`, `Str(String)`, and `Id(RowId)`. `Value::Id` is the foreign-key reference used by `geomerge` and any future referencing backend. |
| `Table` | struct | Positional input relation with fixed arity. Produced from a backend scan by `scan_as_table`. Consumed by `query-ops` operators. |
| `scan_as_table(&dyn Storage, &str) -> Result<Table, StorageError>` | function | Materialize a relation from a `Storage` backend into a `Table` for query-language operators. Row IDs are dropped; only cell values remain. |
| `MemoryStorage` | struct | In-process backend kept in `HashMap`s. Always available; useful for tests and snapshot oracles. |
| `adapters::sqlite::SqliteStorage` | struct (feat) | `SQLite`-backed `Storage`, behind the `sqlite` feature. Uses `rusqlite` with bundled libsqlite3; supports a single connection with native write transactions. |
| `adapters::redb::RedbStorage` | struct (feat) | Single-file B-tree backed `Storage`, behind the `redb` feature. Wraps `redb::WriteTransaction` for native atomic commits. |
| `adapters::fjall::FjallStorage` | struct (feat) | LSM-tree backed `Storage`, behind the `fjall` feature. Each relation gets a partition; transactions buffer inserts and apply them on commit. |
| `adapters::lmdb::LmdbStorage` | struct (feat) | mmap'd B-tree backed `Storage`, behind the `lmdb` feature. Wraps `heed`'s `RwTxn` for native atomic commits. |
| `adapters::geomerge::GeomergeStorage` | struct (feat) | CRDT-backed `Storage` over the workspace's `geomerge` crate, behind the `geomerge` feature. Wraps `geomerge::Transaction` and resolves pending row IDs via `CommittedTx`. Deletion is not supported (append-only log). |

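The table says `scan_iter` is a required method while `scan` has a default implementation. The sketch below shows one way such a pair can fit together, using only the standard library. Everything in it is a hypothetical stand-in (`Scan`, `ToyStore`, `ScanError`, and the simplified `RowId` and `Row` aliases); it is not the crate's actual trait definition, only an illustration of a provided method that collects a boxed row stream.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the crate's `RowId`, `Vec<Value>`, and `StorageError`.
type RowId = u64;
type Row = Vec<i64>;

#[derive(Debug, PartialEq)]
enum ScanError {
    RelationNotFound(String),
}

// Same shape as the `RowStream<'a>` alias in the table, over the stand-in types.
type RowStream<'a> = Box<dyn Iterator<Item = Result<(RowId, Row), ScanError>> + 'a>;

trait Scan {
    // Required: yield rows lazily, one at a time.
    fn scan_iter<'a>(&'a self, relation: &str) -> Result<RowStream<'a>, ScanError>;

    // Provided: collect the stream into a Vec, stopping at the first error.
    fn scan(&self, relation: &str) -> Result<Vec<(RowId, Row)>, ScanError> {
        self.scan_iter(relation)?.collect()
    }
}

struct ToyStore {
    relations: BTreeMap<String, BTreeMap<RowId, Row>>,
}

impl Scan for ToyStore {
    fn scan_iter<'a>(&'a self, relation: &str) -> Result<RowStream<'a>, ScanError> {
        let rows = self
            .relations
            .get(relation)
            .ok_or_else(|| ScanError::RelationNotFound(relation.to_string()))?;
        // The iterator borrows the map, so no rows are copied until they are pulled.
        Ok(Box::new(
            rows.iter()
                .map(|(id, row)| Ok::<_, ScanError>((*id, row.clone()))),
        ))
    }
}

// Builds a store with one two-row relation and scans it via the provided method.
fn demo() -> Result<Vec<(RowId, Row)>, ScanError> {
    let mut edge = BTreeMap::new();
    edge.insert(1, vec![1, 2]);
    edge.insert(2, vec![2, 3]);
    let mut relations = BTreeMap::new();
    relations.insert("edge".to_string(), edge);
    let store = ToyStore { relations };
    store.scan("edge")
}
```

With this split, a backend only has to implement the streaming method, and callers that want a `Vec` get it without extra backend code.
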
Data types and their relationships:

<div align="center">
  <picture>
    <img alt="Types" src="docs/diagrams/types.svg" height="70%" width="70%">
  </picture>
</div>

### Example

The example below opens an in-memory backend, declares a relation, inserts two rows inside a single transaction, then reads the result.

```rust
use storage::value::Value;
use storage::{MemoryStorage, Storage, StorageError};

// Shorthand for building integer cells.
fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() -> Result<(), StorageError> {
    let mut storage = MemoryStorage::new();
    // Declare a relation named "edge" with arity 2.
    storage.create_relation("edge", 2)?;

    let (a, b) = {
        let mut tx = storage.transaction()?;
        let a = tx.insert("edge", vec![i(1), i(2)])?;
        let b = tx.insert("edge", vec![i(2), i(3)])?;
        let committed = tx.commit()?;
        // For KV backends pending IDs equal real IDs, so resolve is the identity.
        (committed.resolve(&a), committed.resolve(&b))
    };

    // Read the committed rows back, each paired with its row ID.
    let rows = storage.scan("edge")?;
    assert_eq!(rows, vec![(a, vec![i(1), i(2)]), (b, vec![i(2), i(3)])]);
    Ok(())
}
```

After construction the example only calls `Storage` trait methods, so `MemoryStorage` can be swapped for any other adapter (for example `adapters::sqlite::SqliteStorage::open(":memory:")?`) by changing only the line that constructs `storage`.

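The swap works because caller code depends on a trait rather than on a concrete backend. The following self-contained sketch illustrates that idea with a deliberately tiny, hypothetical `Backend` trait and two toy implementations (`HashBackend`, `TreeBackend`); none of these names exist in this crate, and the real `Storage` trait is considerably richer.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical, much smaller stand-in for the crate's `Storage` trait.
trait Backend {
    fn put(&mut self, key: u64, row: Vec<i64>);
    fn rows(&self) -> Vec<(u64, Vec<i64>)>;
}

#[derive(Default)]
struct HashBackend(HashMap<u64, Vec<i64>>);

#[derive(Default)]
struct TreeBackend(BTreeMap<u64, Vec<i64>>);

impl Backend for HashBackend {
    fn put(&mut self, key: u64, row: Vec<i64>) {
        self.0.insert(key, row);
    }

    fn rows(&self) -> Vec<(u64, Vec<i64>)> {
        let mut rows: Vec<_> = self.0.iter().map(|(k, v)| (*k, v.clone())).collect();
        // HashMap iteration order is unspecified, so sort for a stable result.
        rows.sort();
        rows
    }
}

impl Backend for TreeBackend {
    fn put(&mut self, key: u64, row: Vec<i64>) {
        self.0.insert(key, row);
    }

    fn rows(&self) -> Vec<(u64, Vec<i64>)> {
        // BTreeMap already iterates in key order.
        self.0.iter().map(|(k, v)| (*k, v.clone())).collect()
    }
}

// Caller code sees only the trait, so it is identical for both backends.
fn load_edges(backend: &mut dyn Backend) -> Vec<(u64, Vec<i64>)> {
    backend.put(1, vec![1, 2]);
    backend.put(2, vec![2, 3]);
    backend.rows()
}
```

Calling `load_edges(&mut HashBackend::default())` and `load_edges(&mut TreeBackend::default())` runs the same caller code against both toy backends; only the constructor differs.
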
How a backend is used (logically):

<div align="center">
  <picture>
    <img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
  </picture>
</div>

### Run the Tests

```sh
cargo test -p storage --all-features
```

### Notes

- **Opaque row IDs.**
  A `RowId` is a backend-assigned byte sequence; callers do not interpret the bytes.
  KV adapters use big-endian `u64`; the `geomerge` adapter encodes a `(CommitHash, counter)` pair.
  Hand a `RowId` back to the same backend to reference an existing row.
- **Pending row IDs.**
  `Transaction::insert` may return a pending `RowId` that the backend cannot stabilize until commit; this is the case for `geomerge`, where the final ID depends on the resulting `CommitHash`.
  Resolve such IDs through the `CommittedTx` returned by `commit`.
  For all KV backends the pending ID is already the real one and `CommittedTx::resolve` is the identity.
- **Streaming first.**
  `scan_iter` is the primary scan operation; `scan` defaults to collecting it.
  In-memory and LSM backends stream natively; B-tree and SQL backends materialize a `Vec` internally and yield from it to avoid self-referential iterators.
- **Atomic transactions.**
  Adapters with native write transactions (LMDB, redb, `SQLite`, `geomerge`) wrap the engine's transaction directly.
  Adapters without one (memory, fjall) buffer pending operations and apply them on commit.
  Dropping a transaction without calling `commit` rolls back any pending operations.
- **Deletion support.**
  Most adapters implement `delete`.
  The `geomerge` adapter does not: because its commit log is append-only, `delete` returns `StorageError::Unsupported("row deletion")`.
- **Geomerge is alpha.**
  The upstream `geomerge` crate is prototype-status and its API may change without notice; treat breakage in `adapters::geomerge` as expected churn rather than a regression.
- **Feature gates.**
  `MemoryStorage` is always available.
  Every other adapter is feature-gated (`lmdb`, `redb`, `fjall`, `sqlite`, and `geomerge`) so callers only pay for what they need.
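
The "buffer pending operations and apply them on commit" strategy from the atomic-transactions note can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the memory or fjall adapter's actual code: `ToyStore`, `BufferedTx`, and `Op` are hypothetical names, row IDs are plain `u64`s, and rows are `Vec<i64>`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; the crate's real `RowId` and `Value` are richer.
type RowId = u64;
type Row = Vec<i64>;

#[derive(Default)]
struct ToyStore {
    rows: BTreeMap<RowId, Row>,
    next_id: RowId,
}

enum Op {
    Insert(RowId, Row),
    Delete(RowId),
}

// Holds a mutable borrow of the store plus a buffer of pending operations.
struct BufferedTx<'s> {
    store: &'s mut ToyStore,
    next_id: RowId,
    pending: Vec<Op>,
}

impl ToyStore {
    fn transaction(&mut self) -> BufferedTx<'_> {
        let next_id = self.next_id;
        BufferedTx { store: self, next_id, pending: Vec::new() }
    }
}

impl BufferedTx<'_> {
    // Reserve an ID and buffer the row; the store is untouched until commit.
    fn insert(&mut self, row: Row) -> RowId {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.push(Op::Insert(id, row));
        id
    }

    fn delete(&mut self, id: RowId) {
        self.pending.push(Op::Delete(id));
    }

    // Consuming `self` means a committed transaction cannot be reused.
    fn commit(self) {
        let BufferedTx { store, next_id, pending } = self;
        for op in pending {
            match op {
                Op::Insert(id, row) => {
                    store.rows.insert(id, row);
                }
                Op::Delete(id) => {
                    store.rows.remove(&id);
                }
            }
        }
        store.next_id = next_id;
    }
}

// Returns the row IDs visible after a dropped transaction and after a committed one.
fn demo() -> (Vec<RowId>, Vec<RowId>) {
    let mut store = ToyStore::default();

    // Dropped without commit: the buffered insert is simply discarded.
    {
        let mut tx = store.transaction();
        tx.insert(vec![1, 2]);
    }
    let after_rollback: Vec<RowId> = store.rows.keys().copied().collect();

    let mut tx = store.transaction();
    let a = tx.insert(vec![1, 2]);
    let _b = tx.insert(vec![2, 3]);
    tx.delete(a);
    tx.commit();
    let after_commit: Vec<RowId> = store.rows.keys().copied().collect();

    (after_rollback, after_commit)
}
```

In this design rollback needs no explicit code: nothing reaches the store before `commit`, so dropping the transaction discards the buffer. The sketch does not make the apply step itself crash-safe, which a real adapter would have to consider.
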