Add note file for API and features of storage engine
This commit is contained in:
parent
08ee0992e6
commit
6bf9d38d51
607
storage/001-geomerge-public-api-and-features.md
Normal file
607
storage/001-geomerge-public-api-and-features.md
Normal file
@ -0,0 +1,607 @@
|
||||
# Geomerge Public API and Features
|
||||
|
||||
A reading note on Geomerge's current library API, REPL surface, and implemented storage behavior.
|
||||
|
||||
---
|
||||
|
||||
## Short Answer
|
||||
|
||||
Geomerge is currently a Rust storage-engine prototype for compiled Geolog theories.
|
||||
|
||||
The implemented core is:
|
||||
|
||||
```text
|
||||
compiled FlatTheory
|
||||
-> Store with empty tables and compiled laws
|
||||
-> Add operations or transactions
|
||||
-> schema validation
|
||||
-> law validation
|
||||
-> committed table state
|
||||
```
|
||||
|
||||
It is not yet a full database engine.
|
||||
Native version control, large-scale concurrency, conflict resolution, general query execution, deletes, and updates are not implemented in the current API.
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
**Geomerge**: The storage-engine prototype in this repository. It loads a compiled Geolog theory, creates tables, accepts row additions, checks laws,
|
||||
and can persist and reload store state.
|
||||
|
||||
**Geolog**: The larger database and logic system that Geomerge is intended to support. In this repo, Geolog mainly appears through compiled IR data
|
||||
that Geomerge can load.
|
||||
|
||||
**Compiled Theory**: A ready-to-load description of tables and laws. In code, this is `FlatTheory`.
|
||||
|
||||
**FlatTheory**: The top-level IR object Geomerge consumes. It contains a list of table definitions and a list of law definitions.
|
||||
|
||||
**IR**: Intermediate representation. It is the structured data format shared by `geolog-lang` and `geomerge`, rather than the source syntax a user
|
||||
might write it by hand.
|
||||
|
||||
**Path**: A stable name for a table or law, such as `Graphs`, `G.V`, or `Hom.E.foreignKeys`. Geomerge uses paths in public operations because paths
|
||||
are more portable than internal table ids.
|
||||
|
||||
**Schema**: The column layout of a table. It says how many columns a row has, what type each column expects, and whether the table has a primary key.
|
||||
|
||||
**Table**: A stored relation. Geomerge tables are columnar internally, but users usually think of them as collections of rows.
|
||||
|
||||
**RowId**: A generated identifier for a row. Entity references are represented with row ids.
|
||||
|
||||
**CellValue**: A value stored in one table cell. The current supported values are entity ids, integers, and strings.
|
||||
|
||||
**Op**: A store mutation operation. Currently the only operation is `Op::Add`, which inserts one row into one table.
|
||||
|
||||
**Batch**: A group of operations submitted together with `Store::apply_batch`. The whole batch commits only if schema and law validation succeed.
|
||||
|
||||
**Transaction**: A preview-store mutation that commits only after validation. Transactions are useful when later inserted rows need row ids from
|
||||
earlier inserted rows.
|
||||
|
||||
**Law**: A constraint from the compiled theory. Geomerge checks laws after proposed changes and rejects changes that make a law fail.
|
||||
|
||||
**Binding**: A set of variable assignments found while matching a law antecedent against current table rows.
|
||||
|
||||
**Violation**: A failed law check. Current violations are either missing consequent atoms or unsatisfied equalities.
|
||||
|
||||
**Persistence**: Encoding a store to bytes and decoding it later. Geomerge persists table data, schemas, row ids, and source law entries.
|
||||
|
||||
---
|
||||
|
||||
## Crate Layout
|
||||
|
||||
The workspace has two main crates:
|
||||
|
||||
- `geolog-lang`: shared Geolog IR definitions
|
||||
- `geomerge`: storage engine, table layer, validation, persistence, and REPL
|
||||
|
||||
The `geomerge` crate exports:
|
||||
|
||||
- `ir`: re-exported Geolog IR types from `geolog-lang`
|
||||
- `ops`: mutation operation types
|
||||
- `persist`: binary store persistence
|
||||
- `repl`: command parsing and REPL execution helpers
|
||||
- `solver`: law compilation, binding, matching, and validation
|
||||
- `store`: the main `Store` API
|
||||
- `table`: table storage, cell values, row ids, and validation errors
|
||||
- `transaction`: owned transaction wrapper
|
||||
|
||||
The most important public modules for library users are `ir`, `store`, `table`, `ops`, and `persist`.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
A[Compiled FlatTheory JSON] --> B[geolog-lang IR]
|
||||
B --> C[Store::try_from_theory]
|
||||
|
||||
C --> D[Table Registry]
|
||||
C --> E[Law Compiler]
|
||||
|
||||
D --> F[Table]
|
||||
F --> G[Columnar Cell Storage]
|
||||
F --> H[Generated Row IDs]
|
||||
F --> I[Schema and Primary-Key Validation]
|
||||
|
||||
E --> J[Compiled Laws]
|
||||
J --> K[Law Validator]
|
||||
|
||||
L[Op::Add Batch] --> M[Store::apply_batch]
|
||||
N[Closure Transaction] --> O[Store::transact]
|
||||
P[Owned Transaction] --> Q[OwnedTransaction::commit]
|
||||
|
||||
M --> R[Preview Store Clone]
|
||||
O --> R
|
||||
Q --> R
|
||||
|
||||
R --> I
|
||||
R --> K
|
||||
K --> S{Validation Result}
|
||||
I --> S
|
||||
|
||||
S -->|ok| T[Committed Store]
|
||||
S -->|error| U[Original Store Unchanged]
|
||||
|
||||
T --> V[Store and Table Accessors]
|
||||
T --> W[persist::pst::encode_store]
|
||||
W --> X[Store Bytes]
|
||||
X --> Y[persist::pst::decode_store]
|
||||
Y --> T
|
||||
|
||||
Z[REPL Commands] --> L
|
||||
Z --> N
|
||||
Z --> W
|
||||
```
|
||||
|
||||
The key architectural point is the preview-store boundary.
|
||||
Mutations are applied to a clone, schema and law checks run on that clone, and the original store is replaced only after validation succeeds.
|
||||
|
||||
---
|
||||
|
||||
## IR API
|
||||
|
||||
The IR crate defines the data shape Geomerge consumes.
|
||||
|
||||
Important types:
|
||||
|
||||
- `FlatTheory`: a compiled theory containing table entries and law entries
|
||||
- `TableEntry`: a table path plus schema
|
||||
- `LawEntry`: a law path plus law definition
|
||||
- `Schema`: column types and an optional primary key
|
||||
- `ColType`: entity, primitive, or tuple column type
|
||||
- `PrimType`: `int` or `string`
|
||||
- `Path`: dotted table and law paths such as `Graphs`, `G.V`, or `Hom.E.foreignKeys`
|
||||
- `Law`: variables, antecedent proposition, and consequent proposition
|
||||
- `Prop`: atom, equality, conjunction, or disjunction at the IR level
|
||||
- `Term`: literal, variable, projection, or constructor at the IR level
|
||||
|
||||
The IR is broader than the current store and solver implementation.
|
||||
For example, tuple columns, disjunction, projections, and constructors exist in the IR, but they are not fully supported by the current storage and validation path.
|
||||
|
||||
---
|
||||
|
||||
## Store API
|
||||
|
||||
`Store` is the central API.
|
||||
|
||||
The usual construction path is:
|
||||
|
||||
```rust
|
||||
let theory: geomerge::ir::FlatTheory = serde_json::from_str(input)?;
|
||||
let mut store = geomerge::store::Store::try_from_theory(theory)?;
|
||||
```
|
||||
|
||||
`Store::try_from_theory` builds one empty table for each table in the theory and compiles the theory laws into solver form.
|
||||
|
||||
Main accessors:
|
||||
|
||||
- `Store::new`: empty store construction
|
||||
- `Store::try_from_theory`: store construction from `FlatTheory`
|
||||
- `Store::tables`: table iterator
|
||||
- `Store::table_count`: table count
|
||||
- `Store::resolve_table`: table path to table oid lookup
|
||||
- `Store::table`: table lookup by oid
|
||||
- `Store::table_mut`: mutable table lookup by oid
|
||||
- `Store::table_at`: table lookup by path
|
||||
- `Store::table_at_mut`: mutable table lookup by path
|
||||
- `Store::laws`: compiled law access
|
||||
- `Store::dump`: debug rendering of all tables
|
||||
- `Store::check_laws`: full law validation over current store contents
|
||||
|
||||
`Store::insert_table` is public, but the project notes indicate schema-driven table creation is the intended normal path.
|
||||
|
||||
---
|
||||
|
||||
## Public API Examples
|
||||
|
||||
These examples show the API shape.
|
||||
A concrete theory can require related rows to be inserted together so that laws hold at commit time.
|
||||
|
||||
### Loading a Theory
|
||||
|
||||
```rust
|
||||
use geomerge::{
|
||||
ir::FlatTheory,
|
||||
store::Store,
|
||||
};
|
||||
|
||||
fn load_theory(input: &str) -> Result<Store, Box<dyn std::error::Error>> {
|
||||
let theory: FlatTheory = serde_json::from_str(input)?;
|
||||
let store = Store::try_from_theory(theory)?;
|
||||
Ok(store)
|
||||
}
|
||||
```
|
||||
|
||||
This is the normal entry point for library use.
|
||||
The theory supplies table schemas and laws; the resulting store starts with empty tables.
|
||||
|
||||
### Adding Rows with `apply_batch`
|
||||
|
||||
```rust
|
||||
use geomerge::{
|
||||
ir::Path,
|
||||
store::Store,
|
||||
table::CellValue,
|
||||
};
|
||||
|
||||
fn add_graph(store: &mut Store) -> Result<u64, Box<geomerge::store::StoreIntError>> {
|
||||
let graphs = store
|
||||
.table_at(&Path::from("Graphs"))
|
||||
.expect("Graphs table exists");
|
||||
|
||||
let op = graphs.add(vec![]);
|
||||
let row_ids = store.apply_batch(vec![op])?;
|
||||
|
||||
Ok(row_ids[0])
|
||||
}
|
||||
|
||||
fn add_vertex(
|
||||
store: &mut Store,
|
||||
graph_id: u64,
|
||||
) -> Result<u64, Box<geomerge::store::StoreIntError>> {
|
||||
let vertices = store
|
||||
.table_at(&Path::from("G.V"))
|
||||
.expect("G.V table exists");
|
||||
|
||||
let op = vertices.add(vec![CellValue::Id(graph_id)]);
|
||||
let row_ids = store.apply_batch(vec![op])?;
|
||||
|
||||
Ok(row_ids[0])
|
||||
}
|
||||
```
|
||||
|
||||
`Table::add` constructs an operation. `Store::apply_batch` is the call that validates and commits it.
|
||||
|
||||
### Adding Related Rows in One Transaction
|
||||
|
||||
```rust
|
||||
use geomerge::{
|
||||
ir::Path,
|
||||
store::{Store, StoreIntError},
|
||||
table::CellValue,
|
||||
};
|
||||
|
||||
fn add_graph_with_vertices(store: &mut Store) -> Result<(u64, u64, u64), Box<StoreIntError>> {
|
||||
store.transact(|preview| {
|
||||
let graph_id = preview
|
||||
.table_at_mut(&Path::from("Graphs"))
|
||||
.expect("Graphs table exists")
|
||||
.append_row_validated(vec![])?;
|
||||
|
||||
let v0 = preview
|
||||
.table_at_mut(&Path::from("G.V"))
|
||||
.expect("G.V table exists")
|
||||
.append_row_validated(vec![CellValue::Id(graph_id)])?;
|
||||
|
||||
let v1 = preview
|
||||
.table_at_mut(&Path::from("G.V"))
|
||||
.expect("G.V table exists")
|
||||
.append_row_validated(vec![CellValue::Id(graph_id)])?;
|
||||
|
||||
Ok((graph_id, v0, v1))
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
This style is useful when later rows need row ids produced earlier in the same transaction.
|
||||
|
||||
### Reading Table Contents
|
||||
|
||||
```rust
|
||||
use geomerge::{
|
||||
ir::Path,
|
||||
store::Store,
|
||||
};
|
||||
|
||||
fn print_table_cells(store: &Store) {
|
||||
let table = store
|
||||
.table_at(&Path::from("G.V"))
|
||||
.expect("G.V table exists");
|
||||
|
||||
for row_idx in 0..table.row_count() {
|
||||
let row_id = table.row_id_at(row_idx).expect("row id exists");
|
||||
let graph = table.cell_at(row_idx, 0).expect("graph column exists");
|
||||
println!("row #{row_id}: graph={graph}");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The table API exposes physical row indexes, generated row ids, and per-column cell access. It does not expose a general query language.
|
||||
|
||||
### Explicit Law Checking
|
||||
|
||||
```rust
|
||||
use geomerge::store::{Store, StoreIntError};
|
||||
|
||||
fn validate_store(store: &Store) -> Result<(), Box<StoreIntError>> {
|
||||
store.check_laws()
|
||||
}
|
||||
```
|
||||
|
||||
`Store::apply_batch` and `Store::transact` already call `check_laws` before committing. Calling it directly is useful after manual store construction
|
||||
or debugging.
|
||||
|
||||
### Persistence Round Trip
|
||||
|
||||
```rust
|
||||
use geomerge::{
|
||||
persist::pst::{decode_store, encode_store},
|
||||
store::Store,
|
||||
};
|
||||
|
||||
fn round_trip(store: &Store) -> Result<Store, geomerge::persist::error::PersisError> {
|
||||
let bytes = encode_store(store)?;
|
||||
decode_store(&bytes)
|
||||
}
|
||||
```
|
||||
|
||||
Decoded stores reconstruct table state and recompile laws from the persisted law entries.
|
||||
|
||||
---
|
||||
|
||||
## Mutation API
|
||||
|
||||
The mutation operation type currently has one variant:
|
||||
|
||||
```rust
|
||||
Op::Add {
|
||||
table: ir::Path,
|
||||
values: Vec<CellValue>,
|
||||
}
|
||||
```
|
||||
|
||||
The table path is used instead of an internal table id, which keeps operations stable across stores.
|
||||
|
||||
The most direct insertion flow is:
|
||||
|
||||
```rust
|
||||
let table = store.table_at(&geomerge::ir::Path::from("G.V")).unwrap();
|
||||
let op = table.add(vec![geomerge::table::CellValue::Id(0)]);
|
||||
let row_ids = store.apply_batch(vec![op])?;
|
||||
```
|
||||
|
||||
`Store::apply_batch` validates the entire batch first. It then applies operations to a cloned preview store, checks all laws, and commits the preview
|
||||
only if validation succeeds. On validation failure, the original store is unchanged.
|
||||
|
||||
The returned `Vec<RowId>` is in the same order as the input operations.
|
||||
|
||||
---
|
||||
|
||||
## Table API
|
||||
|
||||
Tables are columnar stores. A table owns:
|
||||
|
||||
- its `Path`
|
||||
- its `Schema`
|
||||
- generated row ids
|
||||
- one column vector per schema column
|
||||
|
||||
Rows are identified by generated `RowId` values. Row ids are managed by the store layer and are exposed as entity references.
|
||||
|
||||
Main table methods:
|
||||
|
||||
- `Table::new`: table construction from path and schema
|
||||
- `Table::schema`: schema access
|
||||
- `Table::path`: path access
|
||||
- `Table::row_count`: row count
|
||||
- `Table::row_id_at`: row id at a physical row index
|
||||
- `Table::cell_at`: cell lookup by physical row index and column index
|
||||
- `Table::dump`: debug rendering
|
||||
- `Table::validate`: schema validation for a candidate row
|
||||
- `Table::primary_key_values`: primary-key extraction from candidate values
|
||||
- `Table::validate_new_row`: schema and primary-key validation
|
||||
- `Table::append_row_validated`: direct validated append
|
||||
- `Table::add`: operation construction for `Store::apply_batch`
|
||||
|
||||
Library code should prefer `Store::apply_batch` or `Store::transact` for durable changes, because those paths also check laws.
|
||||
|
||||
---
|
||||
|
||||
## Cell Values and Validation
|
||||
|
||||
The supported cell values are:
|
||||
|
||||
```rust
|
||||
CellValue::Id(RowId)
|
||||
CellValue::Int(i64)
|
||||
CellValue::Str(String)
|
||||
```
|
||||
|
||||
Validation checks:
|
||||
|
||||
- column count
|
||||
- entity column values as `CellValue::Id`
|
||||
- integer columns as `CellValue::Int`
|
||||
- string columns as `CellValue::Str`
|
||||
- tuple column rejection
|
||||
- primary-key duplication
|
||||
- unknown table paths in batch application
|
||||
|
||||
Primary-key behavior:
|
||||
|
||||
- `None`: no primary-key uniqueness check
|
||||
- `Some(columns)`: uniqueness over listed columns
|
||||
- `Some([])`: singleton table, meaning at most one row
|
||||
|
||||
---
|
||||
|
||||
## Transactions
|
||||
|
||||
Geomerge has two transaction styles.
|
||||
|
||||
`Store::transact` takes a closure over a preview store:
|
||||
|
||||
```rust
|
||||
store.transact(|preview| {
|
||||
let table = preview.table_at_mut(&Path::from("Graphs")).unwrap();
|
||||
let row_id = table.append_row_validated(vec![])?;
|
||||
Ok(row_id)
|
||||
})?;
|
||||
```
|
||||
|
||||
The closure can perform multiple direct table appends.
|
||||
After the closure succeeds, Geomerge checks laws on the preview and commits only if they hold.
|
||||
|
||||
`Store::into_transaction` consumes a store and returns an `OwnedTransaction`.
|
||||
Its `commit` method returns either:
|
||||
|
||||
```text
|
||||
Ok((output, committed_store))
|
||||
Err((error, original_store))
|
||||
```
|
||||
|
||||
That shape is useful when the caller needs ownership of the original store after a failed transaction.
|
||||
|
||||
One implementation detail to watch: `Store::apply_batch` checks primary-key conflicts within the whole batch before applying. The closure-based
|
||||
transaction path relies on table-level validated appends and has a TODO around primary-key checking at the store transaction layer.
|
||||
|
||||
---
|
||||
|
||||
## Law Solver
|
||||
|
||||
The solver compiles IR laws into a restricted execution form.
|
||||
|
||||
Supported compiled proposition forms:
|
||||
|
||||
- atom
|
||||
- equality
|
||||
- conjunction
|
||||
|
||||
Unsupported or incomplete forms include:
|
||||
|
||||
- disjunction
|
||||
- projected terms
|
||||
- constructed terms
|
||||
- tuple values
|
||||
|
||||
Runtime law checking has two phases:
|
||||
|
||||
1. Antecedent binding: scan relevant tables and produce variable bindings.
|
||||
2. Consequent validation: check that required atoms and equalities hold for each binding.
|
||||
|
||||
Law violations identify:
|
||||
|
||||
- missing consequent atoms
|
||||
- unsatisfied consequent equalities
|
||||
- the law involved
|
||||
- the binding that triggered the failure
|
||||
|
||||
The solver is direct and full-store oriented. It is not incremental.
|
||||
|
||||
---
|
||||
|
||||
## Persistence API
|
||||
|
||||
The persistence API is under `persist::pst`.
|
||||
|
||||
Main functions:
|
||||
|
||||
- `encode_store(&Store) -> Result<Vec<u8>, PersisError>`
|
||||
- `decode_store(&[u8]) -> Result<Store, PersisError>`
|
||||
|
||||
The persisted data includes:
|
||||
|
||||
- format magic and version
|
||||
- next table oid
|
||||
- table metadata
|
||||
- table schemas
|
||||
- row ids
|
||||
- column payloads
|
||||
- source law entries
|
||||
|
||||
On decode, Geomerge reconstructs tables and recompiles laws from the persisted law entries.
|
||||
|
||||
---
|
||||
|
||||
## REPL API
|
||||
|
||||
The binary starts the REPL. The REPL commands are:
|
||||
|
||||
```text
|
||||
/help
|
||||
/exit
|
||||
/quit
|
||||
load-schema <path>;
|
||||
load-store <path>;
|
||||
list-schema;
|
||||
add <table> values (...), (...);
|
||||
dump-table <table>;
|
||||
dump-store;
|
||||
persist <path>;
|
||||
begin transact; name = add <table> values (...); ... commit;
|
||||
```
|
||||
|
||||
REPL transaction bindings let one inserted row refer to another row inserted earlier in the same transaction:
|
||||
|
||||
```text
|
||||
begin transact;
|
||||
g = add Graphs values ();
|
||||
v = add G.V values (g);
|
||||
commit;
|
||||
```
|
||||
|
||||
For non-transactional `add`, entity values use `#id` syntax. Inside transaction blocks, entity columns can use either `#id` or a previous binding
|
||||
name.
|
||||
|
||||
---
|
||||
|
||||
## Key Features
|
||||
|
||||
Implemented features:
|
||||
|
||||
- compiled theory loading from JSON-compatible IR
|
||||
- schema-defined table creation
|
||||
- columnar row storage
|
||||
- generated row ids
|
||||
- entity references through row ids
|
||||
- primitive int and string cells
|
||||
- table path lookup
|
||||
- validated row insertion
|
||||
- batch add operations
|
||||
- atomic batch commit after validation
|
||||
- closure-based preview transactions
|
||||
- owned transactions with original-store recovery on failure
|
||||
- primary-key validation for batch adds and validated table appends
|
||||
- law compilation for atoms, equality, and conjunction
|
||||
- full-store law checking
|
||||
- binary persistence and reload
|
||||
- REPL schema loading, insertion, dumping, transactions, persistence, and reload
|
||||
|
||||
Not implemented or limited:
|
||||
|
||||
- deletes
|
||||
- updates
|
||||
- general query execution
|
||||
- incremental validation
|
||||
- native version-control semantics
|
||||
- merge and conflict-resolution semantics
|
||||
- concurrency control
|
||||
- tuple cell insertion
|
||||
- disjunction in compiled laws
|
||||
- projected or constructed terms in compiled laws
|
||||
- ad hoc indexes for efficient law checking
|
||||
|
||||
---
|
||||
|
||||
## Practical Mental Model
|
||||
|
||||
Geomerge currently behaves like:
|
||||
|
||||
```text
|
||||
compiled Geolog theory
|
||||
-> typed table registry
|
||||
-> validated row additions
|
||||
-> full-store law checks
|
||||
-> persisted store snapshots
|
||||
```
|
||||
|
||||
The most relevant integration boundary for an incremental engine such as DBSP is law checking.
|
||||
Today, Geomerge validates laws by scanning the current store.
|
||||
A future incremental layer could maintain violation queries and let the store ask whether any violations exist after a proposed transaction.
|
||||
|
||||
---
|
||||
|
||||
## Changelog
|
||||
|
||||
* **May 13, 2026** -- First version of this document.
|
||||
Loading…
x
Reference in New Issue
Block a user