From 6bf9d38d51780156bfd4532f2f9a07b0c0a85b96 Mon Sep 17 00:00:00 2001
From: Hassan Abedi
Date: Wed, 13 May 2026 10:17:07 +0200
Subject: [PATCH] Add note file for API and features of storage engine

---
 .../001-geomerge-public-api-and-features.md | 607 ++++++++++++++++++
 1 file changed, 607 insertions(+)
 create mode 100644 storage/001-geomerge-public-api-and-features.md

# Geomerge Public API and Features

A reading note on Geomerge's current library API, REPL surface, and implemented storage behavior.

---

## Short Answer

Geomerge is currently a Rust storage-engine prototype for compiled Geolog theories.

The implemented core is:

```text
compiled FlatTheory
-> Store with empty tables and compiled laws
-> Add operations or transactions
-> schema validation
-> law validation
-> committed table state
```

It is not yet a full database engine.
Native version control, concurrency control, merge and conflict resolution, general query execution, deletes, and updates are not implemented in the current API.

---

## Terminology

**Geomerge**: The storage-engine prototype in this repository. It loads a compiled Geolog theory, creates tables, accepts row additions, checks laws,
and can persist and reload store state.

**Geolog**: The larger database and logic system that Geomerge is intended to support. In this repo, Geolog mainly appears through compiled IR data
that Geomerge can load.

**Compiled Theory**: A ready-to-load description of tables and laws. In code, this is `FlatTheory`.

**FlatTheory**: The top-level IR object Geomerge consumes. It contains a list of table definitions and a list of law definitions.

**IR**: Intermediate representation.
It is the structured data format shared by `geolog-lang` and `geomerge`, rather than the source syntax a user
would write by hand.

**Path**: A stable name for a table or law, such as `Graphs`, `G.V`, or `Hom.E.foreignKeys`. Geomerge uses paths in public operations because paths
are more portable than internal table ids.

**Schema**: The column layout of a table. It says how many columns a row has, what type each column expects, and whether the table has a primary key.

**Table**: A stored relation. Geomerge tables are columnar internally, but users usually think of them as collections of rows.

**RowId**: A generated identifier for a row. Entity references are represented with row ids.

**CellValue**: A value stored in one table cell. The currently supported values are entity ids, integers, and strings.

**Op**: A store mutation operation. Currently the only operation is `Op::Add`, which inserts one row into one table.

**Batch**: A group of operations submitted together with `Store::apply_batch`. The whole batch commits only if schema and law validation succeed.

**Transaction**: A group of mutations applied to a preview store and committed only after validation. Transactions are useful when later inserted rows need row ids from
earlier inserted rows.

**Law**: A constraint from the compiled theory. Geomerge checks laws against each proposed change and rejects changes that would make a law fail.

**Binding**: A set of variable assignments found while matching a law antecedent against current table rows.

**Violation**: A failed law check. Current violations are either missing consequent atoms or unsatisfied equalities.

**Persistence**: Encoding a store to bytes and decoding it later. Geomerge persists table data, schemas, row ids, and source law entries.

---

## Crate Layout

The workspace has two main crates:

- `geolog-lang`: shared Geolog IR definitions
- `geomerge`: storage engine, table layer, validation, persistence, and REPL

The `geomerge` crate exports:

- `ir`: re-exported Geolog IR types from `geolog-lang`
- `ops`: mutation operation types
- `persist`: binary store persistence
- `repl`: command parsing and REPL execution helpers
- `solver`: law compilation, binding, matching, and validation
- `store`: the main `Store` API
- `table`: table storage, cell values, row ids, and validation errors
- `transaction`: owned transaction wrapper

The most important public modules for library users are `ir`, `store`, `table`, `ops`, and `persist`.

---

## Architecture Diagram

```mermaid
flowchart TD
    A[Compiled FlatTheory JSON] --> B[geolog-lang IR]
    B --> C[Store::try_from_theory]

    C --> D[Table Registry]
    C --> E[Law Compiler]

    D --> F[Table]
    F --> G[Columnar Cell Storage]
    F --> H[Generated Row IDs]
    F --> I[Schema and Primary-Key Validation]

    E --> J[Compiled Laws]
    J --> K[Law Validator]

    L[Op::Add Batch] --> M[Store::apply_batch]
    N[Closure Transaction] --> O[Store::transact]
    P[Owned Transaction] --> Q[OwnedTransaction::commit]

    M --> R[Preview Store Clone]
    O --> R
    Q --> R

    R --> I
    R --> K
    K --> S{Validation Result}
    I --> S

    S -->|ok| T[Committed Store]
    S -->|error| U[Original Store Unchanged]

    T --> V[Store and Table Accessors]
    T --> W[persist::pst::encode_store]
    W --> X[Store Bytes]
    X --> Y[persist::pst::decode_store]
    Y --> T

    Z[REPL Commands] --> L
    Z --> N
    Z --> W
```

The key architectural point is the preview-store boundary.
Mutations are applied to a clone, schema and law checks run on that clone, and the original store is replaced only after validation succeeds.

---

## IR API

The IR crate defines the data shape Geomerge consumes.
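
As a rough picture of that data shape, here is a hypothetical and heavily simplified Rust model of the table side of the IR, with laws omitted. The type names mirror the important types listed next. The fields, the target path carried by `ColType::Entity`, the index-based `Option<Vec<usize>>` primary key, and the `arity` helper are all assumptions for illustration. They are not the `geolog-lang` definitions.

```rust
// Hypothetical, simplified model of the table side of the IR.
// Field layouts are assumptions, not the real `geolog-lang` types.

#[derive(Debug, Clone, PartialEq)]
pub enum PrimType {
    Int,
    String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ColType {
    /// Reference to a row of another table (assumed: named by its path).
    Entity(String),
    Prim(PrimType),
    /// Exists in the IR, but the current store rejects tuple columns.
    Tuple(Vec<ColType>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<ColType>,
    /// None: no key; Some(cols): unique over cols; Some(vec![]): singleton.
    pub primary_key: Option<Vec<usize>>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TableEntry {
    pub path: String,
    pub schema: Schema,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct FlatTheory {
    pub tables: Vec<TableEntry>,
    // Law entries are omitted in this sketch.
}

impl FlatTheory {
    /// Number of columns declared for the table at `path`, if it exists.
    pub fn arity(&self, path: &str) -> Option<usize> {
        self.tables
            .iter()
            .find(|t| t.path == path)
            .map(|t| t.schema.columns.len())
    }
}

/// Two tables shaped like the examples in this note:
/// `Graphs` has no columns, and `G.V` has one entity column.
pub fn example_theory() -> FlatTheory {
    FlatTheory {
        tables: vec![
            TableEntry {
                path: "Graphs".to_string(),
                schema: Schema { columns: vec![], primary_key: None },
            },
            TableEntry {
                path: "G.V".to_string(),
                schema: Schema {
                    columns: vec![ColType::Entity("Graphs".to_string())],
                    primary_key: None,
                },
            },
        ],
    }
}
```

With this shape, `example_theory().arity("G.V")` returns `Some(1)`, which matches the single `CellValue::Id` passed when adding a vertex in the examples below.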

Important types:

- `FlatTheory`: a compiled theory containing table entries and law entries
- `TableEntry`: a table path plus schema
- `LawEntry`: a law path plus law definition
- `Schema`: column types and an optional primary key
- `ColType`: entity, primitive, or tuple column type
- `PrimType`: `int` or `string`
- `Path`: dotted table and law paths such as `Graphs`, `G.V`, or `Hom.E.foreignKeys`
- `Law`: variables, antecedent proposition, and consequent proposition
- `Prop`: atom, equality, conjunction, or disjunction at the IR level
- `Term`: literal, variable, projection, or constructor at the IR level

The IR is broader than the current store and solver implementation.
For example, tuple columns, disjunction, projections, and constructors exist in the IR, but they are not fully supported by the current storage and validation path.

---

## Store API

`Store` is the central API.

The usual construction path is:

```rust
// `input` holds a compiled theory serialized as JSON.
let theory: geomerge::ir::FlatTheory = serde_json::from_str(input)?;
let mut store = geomerge::store::Store::try_from_theory(theory)?;
```

`Store::try_from_theory` builds one empty table for each table in the theory and compiles the theory laws into solver form.

Main methods:

- `Store::new`: empty store construction
- `Store::try_from_theory`: store construction from `FlatTheory`
- `Store::tables`: table iterator
- `Store::table_count`: table count
- `Store::resolve_table`: table path to table oid lookup
- `Store::table`: table lookup by oid
- `Store::table_mut`: mutable table lookup by oid
- `Store::table_at`: table lookup by path
- `Store::table_at_mut`: mutable table lookup by path
- `Store::laws`: compiled law access
- `Store::dump`: debug rendering of all tables
- `Store::check_laws`: full law validation over current store contents

`Store::insert_table` is public, but the project notes indicate schema-driven table creation is the intended normal path.
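
The split between paths and oids in these lookups can be illustrated with a minimal, hypothetical registry. `Registry`, `TableStub`, the `u32` oid type, and sequential oid allocation are assumptions. The real `Store` also holds compiled laws and full tables. The sketch shows why public operations key on paths: two registries that create the same tables in a different order assign different oids, yet a path lookup reaches the same table in both.

```rust
use std::collections::HashMap;

/// Hypothetical internal table id (not Geomerge's actual oid type).
pub type Oid = u32;

/// Stand-in for a table: only its path and a row counter matter here.
#[derive(Debug)]
pub struct TableStub {
    pub path: String,
    pub rows: usize,
}

#[derive(Debug, Default)]
pub struct Registry {
    next_oid: Oid,
    by_oid: HashMap<Oid, TableStub>,
    by_path: HashMap<String, Oid>,
}

impl Registry {
    /// Registers a table under a fresh oid.
    /// Duplicate paths are not handled in this sketch.
    pub fn insert_table(&mut self, path: &str) -> Oid {
        let oid = self.next_oid;
        self.next_oid += 1;
        self.by_oid.insert(oid, TableStub { path: path.to_string(), rows: 0 });
        self.by_path.insert(path.to_string(), oid);
        oid
    }

    /// Path -> oid, in the spirit of `Store::resolve_table`.
    pub fn resolve_table(&self, path: &str) -> Option<Oid> {
        self.by_path.get(path).copied()
    }

    /// Lookup by oid, in the spirit of `Store::table`.
    pub fn table(&self, oid: Oid) -> Option<&TableStub> {
        self.by_oid.get(&oid)
    }

    /// Lookup by path: resolve first, then fetch (like `Store::table_at`).
    pub fn table_at(&self, path: &str) -> Option<&TableStub> {
        self.resolve_table(path).and_then(|oid| self.table(oid))
    }

    /// Mutable lookup by path (like `Store::table_at_mut`).
    pub fn table_at_mut(&mut self, path: &str) -> Option<&mut TableStub> {
        let oid = self.resolve_table(path)?;
        self.by_oid.get_mut(&oid)
    }
}
```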

---

## Public API Examples

These examples show the API shape rather than the rules of a specific theory.
A concrete theory can require related rows to be inserted together so that laws hold at commit time.

### Loading a Theory

```rust
use geomerge::{
    ir::FlatTheory,
    store::Store,
};

fn load_theory(input: &str) -> Result<Store, Box<dyn std::error::Error>> {
    let theory: FlatTheory = serde_json::from_str(input)?;
    let store = Store::try_from_theory(theory)?;
    Ok(store)
}
```

This is the normal entry point for library use.
The theory supplies table schemas and laws; the resulting store starts with empty tables.

### Adding Rows with `apply_batch`

```rust
use geomerge::{
    ir::Path,
    store::Store,
    table::CellValue,
};

fn add_graph(store: &mut Store) -> Result<u64, Box<dyn std::error::Error>> {
    let graphs = store
        .table_at(&Path::from("Graphs"))
        .expect("Graphs table exists");

    // `Graphs` rows have no columns in this example theory.
    let op = graphs.add(vec![]);
    let row_ids = store.apply_batch(vec![op])?;

    Ok(row_ids[0])
}

fn add_vertex(
    store: &mut Store,
    graph_id: u64,
) -> Result<u64, Box<dyn std::error::Error>> {
    let vertices = store
        .table_at(&Path::from("G.V"))
        .expect("G.V table exists");

    // The single `G.V` column is an entity reference to a `Graphs` row.
    let op = vertices.add(vec![CellValue::Id(graph_id)]);
    let row_ids = store.apply_batch(vec![op])?;

    Ok(row_ids[0])
}
```

`Table::add` only constructs an operation. `Store::apply_batch` is the call that validates and commits it.
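
The commit behavior described for `Store::apply_batch` can be sketched in a self-contained way. The steps are: validate the batch, apply it to a cloned preview, check laws on the preview, then either swap the preview in or drop it. Everything here is a hypothetical stand-in: `MiniStore`, this `Op`, `BatchError`, rows whose cells are all entity ids, a fixed per-table arity as the only schema check, and a caller-supplied `law` closure in place of compiled laws.

```rust
use std::collections::HashMap;

/// Hypothetical row id; every cell in this sketch is an entity reference.
pub type RowId = u64;

/// Hypothetical add-only operation keyed by table path, like `Op::Add`.
pub struct Op {
    pub table: String,
    pub values: Vec<RowId>,
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    UnknownTable(String),
    WrongArity { table: String, expected: usize, got: usize },
    LawViolated,
}

#[derive(Clone, Debug, Default)]
pub struct MiniStore {
    next_row_id: RowId,
    /// path -> (arity, rows as (row id, cells))
    tables: HashMap<String, (usize, Vec<(RowId, Vec<RowId>)>)>,
}

impl MiniStore {
    pub fn add_table(&mut self, path: &str, arity: usize) {
        self.tables.insert(path.to_string(), (arity, Vec::new()));
    }

    pub fn rows(&self, path: &str) -> usize {
        self.tables.get(path).map_or(0, |(_, rows)| rows.len())
    }

    pub fn row_ids(&self, path: &str) -> Vec<RowId> {
        self.tables
            .get(path)
            .map_or(Vec::new(), |(_, rows)| rows.iter().map(|(id, _)| *id).collect())
    }

    pub fn column(&self, path: &str, col: usize) -> Vec<RowId> {
        self.tables
            .get(path)
            .map_or(Vec::new(), |(_, rows)| rows.iter().map(|(_, cells)| cells[col]).collect())
    }

    /// Validate every op, apply all of them to a clone, check the law on the
    /// clone, and replace `self` only if everything succeeds.
    pub fn apply_batch(
        &mut self,
        ops: Vec<Op>,
        law: impl Fn(&MiniStore) -> bool,
    ) -> Result<Vec<RowId>, BatchError> {
        // 1. Check table paths and arity for the whole batch up front.
        for op in &ops {
            let (arity, _) = self
                .tables
                .get(op.table.as_str())
                .ok_or_else(|| BatchError::UnknownTable(op.table.clone()))?;
            if *arity != op.values.len() {
                return Err(BatchError::WrongArity {
                    table: op.table.clone(),
                    expected: *arity,
                    got: op.values.len(),
                });
            }
        }
        // 2. Apply to a preview clone, collecting row ids in op order.
        let mut preview = self.clone();
        let mut ids = Vec::with_capacity(ops.len());
        for op in ops {
            let id = preview.next_row_id;
            preview.next_row_id += 1;
            // Step 1 guarantees the table exists.
            preview.tables.get_mut(op.table.as_str()).unwrap().1.push((id, op.values));
            ids.push(id);
        }
        // 3. Check the law on the preview; on failure `self` is untouched.
        if !law(&preview) {
            return Err(BatchError::LawViolated);
        }
        // 4. Commit by swapping in the preview.
        *self = preview;
        Ok(ids)
    }
}
```

Cloning the whole store on every batch keeps the rollback logic trivial, at the cost of copying all tables. That trade-off is consistent with the full-store, non-incremental checking described in this note.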

### Adding Related Rows in One Transaction

```rust
use geomerge::{
    ir::Path,
    store::{Store, StoreIntError},
    table::CellValue,
};

fn add_graph_with_vertices(store: &mut Store) -> Result<(u64, u64, u64), Box<StoreIntError>> {
    store.transact(|preview| {
        // Each append is validated against its table immediately;
        // laws are checked once, on the whole preview, after the closure returns.
        let graph_id = preview
            .table_at_mut(&Path::from("Graphs"))
            .expect("Graphs table exists")
            .append_row_validated(vec![])?;

        let v0 = preview
            .table_at_mut(&Path::from("G.V"))
            .expect("G.V table exists")
            .append_row_validated(vec![CellValue::Id(graph_id)])?;

        let v1 = preview
            .table_at_mut(&Path::from("G.V"))
            .expect("G.V table exists")
            .append_row_validated(vec![CellValue::Id(graph_id)])?;

        Ok((graph_id, v0, v1))
    })
}
```

This style is useful when later rows need row ids produced earlier in the same transaction.

### Reading Table Contents

```rust
use geomerge::{
    ir::Path,
    store::Store,
};

fn print_table_cells(store: &Store) {
    let table = store
        .table_at(&Path::from("G.V"))
        .expect("G.V table exists");

    // `row_idx` is a physical position; `row_id` is the generated identifier.
    for row_idx in 0..table.row_count() {
        let row_id = table.row_id_at(row_idx).expect("row id exists");
        let graph = table.cell_at(row_idx, 0).expect("graph column exists");
        println!("row #{row_id}: graph={graph}");
    }
}
```

The table API exposes physical row indexes, generated row ids, and per-column cell access. It does not expose a general query language.

### Explicit Law Checking

```rust
use geomerge::store::{Store, StoreIntError};

fn validate_store(store: &Store) -> Result<(), Box<StoreIntError>> {
    store.check_laws()
}
```

`Store::apply_batch` and `Store::transact` already call `check_laws` before committing. Calling it directly is useful after constructing or mutating a store
manually, or while debugging.
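
To make the full-store check concrete, here is a hypothetical sketch of checking one reference-style law, read as "every `G.V` row `v` pointing at `g` requires `Graphs(g)`". It follows the two phases listed under Law Solver. First, bind the antecedent by scanning the child table. Then check the consequent atom for each binding and report the law and binding that failed. The hard-coded law shape, `RefTable`, the `Binding` map, and the `Violation` struct are assumptions. Geomerge's solver compiles general atoms, equalities, and conjunctions rather than this one pattern.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical row id type.
pub type RowId = u64;

/// Variable name -> bound row id, produced by matching the antecedent.
pub type Binding = BTreeMap<&'static str, RowId>;

/// Hypothetical violation record: the law, the missing atom, and the binding.
#[derive(Debug, PartialEq)]
pub struct Violation {
    pub law: String,
    pub missing_atom: String,
    pub binding: Binding,
}

/// Minimal child table: each row has an id and one entity column.
pub struct RefTable {
    pub row_ids: Vec<RowId>,
    pub refs: Vec<RowId>,
}

/// Phase 1 (antecedent binding): scan the child table and bind
/// `v` to each row id and `g` to the id that row references.
fn bind_antecedent(child: &RefTable) -> Vec<Binding> {
    child
        .row_ids
        .iter()
        .zip(&child.refs)
        .map(|(&v, &g)| Binding::from([("v", v), ("g", g)]))
        .collect()
}

/// Phase 2 (consequent validation): each binding needs the atom `parent(g)`.
/// The whole child table is rescanned on every call; nothing is incremental.
pub fn check_ref_law(
    law: &str,
    child: &RefTable,
    parent: &str,
    parent_ids: &[RowId],
) -> Vec<Violation> {
    let present: HashSet<RowId> = parent_ids.iter().copied().collect();
    bind_antecedent(child)
        .into_iter()
        .filter(|b| !present.contains(&b["g"]))
        .map(|binding| Violation {
            law: law.to_string(),
            missing_atom: format!("{}(#{})", parent, binding["g"]),
            binding,
        })
        .collect()
}
```

For example, for a child table with rows `10 -> 0`, `11 -> 7`, and `12 -> 0` and a parent containing only row `0`, the check reports a single violation. That violation is the binding `v = 11, g = 7`.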

### Persistence Round Trip

```rust
use geomerge::{
    persist::pst::{decode_store, encode_store, PersisError},
    store::Store,
};

fn round_trip(store: &Store) -> Result<Store, PersisError> {
    let bytes = encode_store(store)?;
    decode_store(&bytes)
}
```

Decoded stores reconstruct table state and recompile laws from the persisted law entries.

---

## Mutation API

The mutation operation type currently has one variant:

```rust
Op::Add {
    table: ir::Path,
    values: Vec<CellValue>,
}
```

The table path is used instead of an internal table id, which keeps operations stable across stores.

The most direct insertion flow is:

```rust
// Assumes a `Graphs` row with id 0 already exists.
let table = store.table_at(&geomerge::ir::Path::from("G.V")).unwrap();
let op = table.add(vec![geomerge::table::CellValue::Id(0)]);
let row_ids = store.apply_batch(vec![op])?;
```

`Store::apply_batch` first validates every operation in the batch: the table path must be known, the row must match the schema, and primary keys must not conflict.
It then applies the operations to a cloned preview store, checks all laws, and commits the preview only if validation succeeds.
On validation failure, the original store is unchanged.

The returned `Vec<RowId>` lists the new row ids in the same order as the input operations.

---

## Table API

Tables are columnar stores. A table owns:

- its `Path`
- its `Schema`
- generated row ids
- one column vector per schema column

Rows are identified by generated `RowId` values. Row ids are managed by the store layer and are exposed as entity references.
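
A minimal, hypothetical columnar table shows how physical row indexes, generated row ids, and per-column cells relate. It also applies the schema and primary-key rules listed under Cell Values and Validation. `ColTable`, `Col`, `Cell`, and `TableError` are assumptions, as is the caller-supplied `new_id`, which stands in for store-managed row ids. The index-based `Option<Vec<usize>>` key is another assumption, and tuple columns are left out. None of this is Geomerge's actual `Table`.

```rust
use std::collections::HashSet;

/// Hypothetical row id type.
pub type RowId = u64;

/// Simplified cell values mirroring `CellValue` (tuple values left out).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Cell {
    Id(RowId),
    Int(i64),
    Str(String),
}

/// Simplified column types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Col {
    Entity,
    Int,
    Str,
}

#[derive(Debug, PartialEq)]
pub enum TableError {
    ColumnCount { expected: usize, got: usize },
    TypeMismatch { column: usize },
    DuplicateKey,
}

pub struct ColTable {
    schema: Vec<Col>,
    /// None: no key; Some(cols): unique over cols; Some(vec![]): at most one row.
    primary_key: Option<Vec<usize>>,
    /// Row id for each physical row index.
    row_ids: Vec<RowId>,
    /// Columnar layout: one vector of cells per schema column.
    columns: Vec<Vec<Cell>>,
}

impl ColTable {
    pub fn new(schema: Vec<Col>, primary_key: Option<Vec<usize>>) -> Self {
        let columns = vec![Vec::new(); schema.len()];
        ColTable { schema, primary_key, row_ids: Vec::new(), columns }
    }

    pub fn row_count(&self) -> usize {
        self.row_ids.len()
    }

    pub fn row_id_at(&self, idx: usize) -> Option<RowId> {
        self.row_ids.get(idx).copied()
    }

    pub fn cell_at(&self, idx: usize, col: usize) -> Option<&Cell> {
        self.columns.get(col)?.get(idx)
    }

    /// Column-count and per-column type checks for a candidate row.
    pub fn validate(&self, values: &[Cell]) -> Result<(), TableError> {
        if values.len() != self.schema.len() {
            return Err(TableError::ColumnCount { expected: self.schema.len(), got: values.len() });
        }
        for (column, (ty, v)) in self.schema.iter().zip(values).enumerate() {
            let ok = matches!(
                (ty, v),
                (Col::Entity, Cell::Id(_)) | (Col::Int, Cell::Int(_)) | (Col::Str, Cell::Str(_))
            );
            if !ok {
                return Err(TableError::TypeMismatch { column });
            }
        }
        Ok(())
    }

    /// Schema check, then primary-key uniqueness against existing rows, then append.
    pub fn append_row_validated(&mut self, values: Vec<Cell>, new_id: RowId) -> Result<RowId, TableError> {
        self.validate(&values)?;
        if let Some(key) = &self.primary_key {
            // With an empty key every row projects to [], so a second row always collides.
            let existing: HashSet<Vec<&Cell>> = (0..self.row_count())
                .map(|r| key.iter().map(|&c| &self.columns[c][r]).collect::<Vec<&Cell>>())
                .collect();
            let candidate: Vec<&Cell> = key.iter().map(|&c| &values[c]).collect();
            if existing.contains(&candidate) {
                return Err(TableError::DuplicateKey);
            }
        }
        for (col, v) in values.into_iter().enumerate() {
            self.columns[col].push(v);
        }
        self.row_ids.push(new_id);
        Ok(new_id)
    }
}
```

The key check scans all existing rows on each append. An indexed table would keep a set of key projections instead, but that is exactly the kind of index this note lists as not implemented.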

Main table methods:

- `Table::new`: table construction from path and schema
- `Table::schema`: schema access
- `Table::path`: path access
- `Table::row_count`: row count
- `Table::row_id_at`: row id at a physical row index
- `Table::cell_at`: cell lookup by physical row index and column index
- `Table::dump`: debug rendering
- `Table::validate`: schema validation for a candidate row
- `Table::primary_key_values`: primary-key extraction from candidate values
- `Table::validate_new_row`: schema and primary-key validation
- `Table::append_row_validated`: direct validated append
- `Table::add`: operation construction for `Store::apply_batch`

Library code should prefer `Store::apply_batch` or `Store::transact` for committed changes, because those paths also check laws.
A direct `Table::append_row_validated` call checks only the table schema and primary key.

---

## Cell Values and Validation

The supported cell values are:

```rust
CellValue::Id(RowId)
CellValue::Int(i64)
CellValue::Str(String)
```

Validation checks that:

- the column count matches the schema
- entity columns hold `CellValue::Id`
- integer columns hold `CellValue::Int`
- string columns hold `CellValue::Str`
- tuple columns are rejected
- primary keys are not duplicated
- batch operations name a known table path

Primary-key behavior:

- `None`: no primary-key uniqueness check
- `Some(columns)`: uniqueness over listed columns
- `Some([])`: singleton table, meaning at most one row

---

## Transactions

Geomerge has two transaction styles.

`Store::transact` takes a closure over a preview store:

```rust
let graph_id = store.transact(|preview| {
    let table = preview.table_at_mut(&Path::from("Graphs")).unwrap();
    let row_id = table.append_row_validated(vec![])?;
    Ok(row_id)
})?;
```

The closure can perform multiple direct table appends.
After the closure succeeds, Geomerge checks laws on the preview and commits only if they hold.

`Store::into_transaction` consumes a store and returns an `OwnedTransaction`.
Its `commit` method returns either:

```text
Ok((output, committed_store))
Err((error, original_store))
```

That shape is useful when the caller needs ownership of the original store after a failed transaction.

One implementation detail to watch: `Store::apply_batch` checks primary-key conflicts within the whole batch before applying. The closure-based
transaction path relies on table-level validated appends and has a TODO around primary-key checking at the store transaction layer.

---

## Law Solver

The solver compiles IR laws into a restricted execution form.

Supported compiled proposition forms:

- atom
- equality
- conjunction

Unsupported or incomplete forms include:

- disjunction
- projected terms
- constructed terms
- tuple values

Runtime law checking has two phases:

1. Antecedent binding: scan relevant tables and produce variable bindings.
2. Consequent validation: check that required atoms and equalities hold for each binding.

Each law violation identifies:

- the law involved
- the binding that triggered the failure
- what failed: a missing consequent atom or an unsatisfied consequent equality

The solver evaluates laws directly over the full store on every check. It is not incremental.

---

## Persistence API

The persistence API is under `persist::pst`.

Main functions:

- `encode_store(&Store) -> Result<Vec<u8>, PersisError>`
- `decode_store(&[u8]) -> Result<Store, PersisError>`

The persisted data includes:

- format magic and version
- next table oid
- table metadata
- table schemas
- row ids
- column payloads
- source law entries

On decode, Geomerge reconstructs tables and recompiles laws from the persisted law entries.

---

## REPL API

The binary starts the REPL. The REPL commands are:

```text
/help
/exit
/quit
load-schema <file>;
load-store <file>;
list-schema;
add <table> values (...), (...);
dump-table <table>;
dump-store;
persist <file>;
begin transact; name = add <table> values (...); ... commit;
```

REPL transaction bindings let one inserted row refer to another row inserted earlier in the same transaction:

```text
begin transact;
  g = add Graphs values ();
  v = add G.V values (g);
commit;
```

For non-transactional `add`, entity values use `#id` syntax, for example `add G.V values (#0);`. Inside transaction blocks, entity columns can use either `#id` or a previous binding
name.

---

## Key Features

Implemented features:

- compiled theory loading from JSON-compatible IR
- schema-defined table creation
- columnar row storage
- generated row ids
- entity references through row ids
- primitive int and string cells
- table path lookup
- validated row insertion
- batch add operations
- atomic batch commit after validation
- closure-based preview transactions
- owned transactions with original-store recovery on failure
- primary-key validation for batch adds and validated table appends
- law compilation for atoms, equality, and conjunction
- full-store law checking
- binary persistence and reload
- REPL schema loading, insertion, dumping, transactions, persistence, and reload

Not implemented or limited:

- deletes
- updates
- general query execution
- incremental validation
- native version-control semantics
- merge and conflict-resolution semantics
- concurrency control
- tuple cell insertion
- disjunction in compiled laws
- projected or constructed terms in compiled laws
- ad hoc indexes for efficient law checking

---

## Practical Mental Model

Geomerge currently behaves like this pipeline:

```text
compiled Geolog theory
-> typed table registry
-> validated row additions
-> full-store law checks
-> persisted store snapshots
```

The most relevant integration boundary for an incremental engine such as DBSP is law checking.
Today, Geomerge validates laws by rescanning the full store on every check.
A future incremental layer could maintain violation queries and let the store ask whether any violations exist after a proposed transaction.

---

## Changelog

* **May 13, 2026** -- First version of this document.