useful-notes/storage/001-geomerge-public-api-and-features.md
Hassan Abedi d6fec698fd WIP
2026-05-13 12:54:12 +02:00

Geomerge Public API and Features

A reading note on Geomerge's current library API, REPL surface, and implemented storage behavior.


Short Answer

Geomerge is currently a Rust storage-engine prototype for compiled Geolog theories.

The implemented core is:

compiled FlatTheory
-> Store with empty tables and compiled laws
-> Add operations or transactions
-> schema validation
-> law validation
-> committed table state

It is not yet a full database engine. Native version control, large-scale concurrency, conflict resolution, general query execution, deletes, and updates are not implemented in the current API.


Terminology

Geomerge: A storage-engine prototype for compiled Geolog theories. It loads a compiled Geolog theory, creates tables, accepts row additions, checks laws, and can persist and reload store state.

Geolog: The larger database and logic system that Geomerge is intended to support. Geomerge consumes compiled Geolog IR data.

Compiled Theory: A ready-to-load description of tables and laws. In code, this is FlatTheory.

FlatTheory: The top-level IR object Geomerge consumes. It contains a list of table definitions and a list of law definitions.

IR: Intermediate representation. It is the structured data format shared by geolog-lang and geomerge, rather than the source syntax a user might write by hand.

Path: A stable name for a table or law, such as Graphs, G.V, or Hom.E.foreignKeys. Geomerge uses paths in public operations because paths are more portable than internal table ids.

Schema: The column layout of a table. It says how many columns a row has, what type each column expects, and whether the table has a primary key.

Table: A stored relation. Geomerge tables are columnar internally, but users usually think of them as collections of rows.

RowId: A generated identifier for a row. Entity references are represented with row ids.

CellValue: A value stored in one table cell. The current supported values are entity ids, integers, and strings.

Op: A store mutation operation. Currently the only operation is Op::Add, which inserts one row into one table.

Batch: A group of operations submitted together with Store::apply_batch. The whole batch commits only if schema and law validation succeed.

Transaction: A preview-store mutation that commits only after validation. Transactions are useful when later inserted rows need row ids from earlier inserted rows.

Law: A constraint from the compiled theory. Geomerge checks laws after proposed changes and rejects changes that make a law fail.

Binding: A set of variable assignments found while matching a law antecedent against current table rows.

Violation: A failed law check. Current violations are either missing consequent atoms or unsatisfied equalities.

Persistence: Encoding a store to bytes and decoding it later. Geomerge persists table data, schemas, row ids, and source law entries.


Crate Layout

The workspace has two main crates:

  • geolog-lang: shared Geolog IR definitions
  • geomerge: storage engine, table layer, validation, persistence, and REPL

The geomerge crate exports:

  • ir: re-exported Geolog IR types from geolog-lang
  • ops: mutation operation types
  • persist: binary store persistence
  • repl: command parsing and REPL execution helpers
  • solver: law compilation, binding, matching, and validation
  • store: the main Store API
  • table: table storage, cell values, row ids, and validation errors
  • transaction: owned transaction wrapper

The most important public modules for library users are ir, store, table, ops, and persist.


Architecture Diagram

flowchart TD
    A[Compiled FlatTheory JSON] --> B[geolog-lang IR]
    B --> C[Store::try_from_theory]

    C --> D[Table Registry]
    C --> E[Law Compiler]

    D --> F[Table]
    F --> G[Columnar Cell Storage]
    F --> H[Generated Row IDs]
    F --> I[Schema and Primary-Key Validation]

    E --> J[Compiled Laws]
    J --> K[Law Validator]

    L[Op::Add Batch] --> M[Store::apply_batch]
    N[Closure Transaction] --> O[Store::transact]
    P[Owned Transaction] --> Q[OwnedTransaction::commit]

    M --> R[Preview Store Clone]
    O --> R
    Q --> R

    R --> I
    R --> K
    K --> S{Validation Result}
    I --> S

    S -->|ok| T[Committed Store]
    S -->|error| U[Original Store Unchanged]

    T --> V[Store and Table Accessors]
    T --> W[persist::pst::encode_store]
    W --> X[Store Bytes]
    X --> Y[persist::pst::decode_store]
    Y --> T

    Z[REPL Commands] --> L
    Z --> N
    Z --> W

The key architectural point is the preview-store boundary. Mutations are applied to a clone, schema and law checks run on that clone, and the original store is replaced only after validation succeeds.
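
The preview-store boundary can be illustrated with a small self-contained sketch. This is not Geomerge code: ToyStore, its single integer table, and the "no negative values" law are hypothetical stand-ins, chosen only to show the clone, validate, then replace sequence.

```rust
// Hypothetical toy store: one table of integer rows plus one "law".
#[derive(Clone, Debug, PartialEq)]
struct ToyStore {
    rows: Vec<i64>,
}

impl ToyStore {
    // Stand-in for law checking: every stored value must be non-negative.
    fn check_laws(&self) -> Result<(), String> {
        match self.rows.iter().find(|v| **v < 0) {
            Some(v) => Err(format!("law violated by value {v}")),
            None => Ok(()),
        }
    }

    // Run `f` against a clone, validate the clone, and only then replace `self`.
    fn transact<T>(
        &mut self,
        f: impl FnOnce(&mut ToyStore) -> Result<T, String>,
    ) -> Result<T, String> {
        let mut preview = self.clone();
        let output = f(&mut preview)?; // closure errors leave `self` untouched
        preview.check_laws()?; // law failures leave `self` untouched
        *self = preview; // commit point
        Ok(output)
    }
}
```

Because every early return happens before the assignment to `*self`, a failed closure or a failed law check cannot leave the original store partially modified. The cost is one full clone per mutation, which is consistent with the prototype framing of this note.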


IR API

The IR crate defines the data shape Geomerge consumes.

Important types:

  • FlatTheory: a compiled theory containing table entries and law entries
  • TableEntry: a table path plus schema
  • LawEntry: a law path plus law definition
  • Schema: column types and an optional primary key
  • ColType: entity, primitive, or tuple column type
  • PrimType: int or string
  • Path: dotted table and law paths such as Graphs, G.V, or Hom.E.foreignKeys
  • Law: variables, antecedent proposition, and consequent proposition
  • Prop: atom, equality, conjunction, or disjunction at the IR level
  • Term: literal, variable, projection, or constructor at the IR level

The IR is broader than the current store and solver implementation. For example, tuple columns, disjunction, projections, and constructors exist in the IR, but they are not fully supported by the current storage and validation path.


Store API

Store is the central API.

The usual construction path is:

let theory: geomerge::ir::FlatTheory = serde_json::from_str(input)?;
let mut store = geomerge::store::Store::try_from_theory(theory)?;

Store::try_from_theory builds one empty table for each table in the theory and compiles the theory laws into solver form.

Main accessors:

  • Store::new: empty store construction
  • Store::try_from_theory: store construction from FlatTheory
  • Store::tables: table iterator
  • Store::table_count: table count
  • Store::resolve_table: lookup from a table path to its table oid (the internal table id)
  • Store::table: table lookup by oid
  • Store::table_mut: mutable table lookup by oid
  • Store::table_at: table lookup by path
  • Store::table_at_mut: mutable table lookup by path
  • Store::laws: compiled law access
  • Store::dump: debug rendering of all tables
  • Store::check_laws: full law validation over current store contents

Store::insert_table is public, but schema-driven table creation through Store::try_from_theory is the intended normal path.


Public API Examples

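These examples show the API shape. They assume a theory that defines a Graphs table with no columns and a G.V table with one entity column. A concrete theory can require related rows to be inserted together so that laws hold at commit time.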

Loading a Theory

use geomerge::{
    ir::FlatTheory,
    store::Store,
};

fn load_theory(input: &str) -> Result<Store, Box<dyn std::error::Error>> {
    let theory: FlatTheory = serde_json::from_str(input)?;
    let store = Store::try_from_theory(theory)?;
    Ok(store)
}

This is the normal entry point for library use. The theory supplies table schemas and laws; the resulting store starts with empty tables.

Adding Rows with apply_batch

use geomerge::{
    ir::Path,
    store::Store,
    table::CellValue,
};

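fn add_graph(store: &mut Store) -> Result<u64, Box<geomerge::store::StoreIntError>> {
    // Look up the table by its path rather than by an internal id.
    let graphs = store
        .table_at(&Path::from("Graphs"))
        .expect("Graphs table exists");

    // This only builds an operation; nothing is written yet.
    let op = graphs.add(vec![]);
    // Validate and commit. Row ids come back in the order of the operations.
    let row_ids = store.apply_batch(vec![op])?;

    Ok(row_ids[0])
}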

fn add_vertex(
    store: &mut Store,
    graph_id: u64,
) -> Result<u64, Box<geomerge::store::StoreIntError>> {
    let vertices = store
        .table_at(&Path::from("G.V"))
        .expect("G.V table exists");

    let op = vertices.add(vec![CellValue::Id(graph_id)]);
    let row_ids = store.apply_batch(vec![op])?;

    Ok(row_ids[0])
}

Table::add constructs an operation. Store::apply_batch is the call that validates and commits it.

Adding Rows with transact

use geomerge::{
    ir::Path,
    store::{Store, StoreIntError},
    table::CellValue,
};

fn add_graph_with_vertices(store: &mut Store) -> Result<(u64, u64, u64), Box<StoreIntError>> {
    store.transact(|preview| {
        let graph_id = preview
            .table_at_mut(&Path::from("Graphs"))
            .expect("Graphs table exists")
            .append_row_validated(vec![])?;

        let v0 = preview
            .table_at_mut(&Path::from("G.V"))
            .expect("G.V table exists")
            .append_row_validated(vec![CellValue::Id(graph_id)])?;

        let v1 = preview
            .table_at_mut(&Path::from("G.V"))
            .expect("G.V table exists")
            .append_row_validated(vec![CellValue::Id(graph_id)])?;

        Ok((graph_id, v0, v1))
    })
}

This style is useful when later rows need row ids produced earlier in the same transaction.

Reading Table Contents

use geomerge::{
    ir::Path,
    store::Store,
};

fn print_table_cells(store: &Store) {
    let table = store
        .table_at(&Path::from("G.V"))
        .expect("G.V table exists");

    for row_idx in 0..table.row_count() {
        let row_id = table.row_id_at(row_idx).expect("row id exists");
        let graph = table.cell_at(row_idx, 0).expect("graph column exists");
        println!("row #{row_id}: graph={graph}");
    }
}

The table API exposes physical row indexes, generated row ids, and per-column cell access. It does not expose a general query language.

Explicit Law Checking

use geomerge::store::{Store, StoreIntError};

fn validate_store(store: &Store) -> Result<(), Box<StoreIntError>> {
    store.check_laws()
}

Store::apply_batch and Store::transact already call check_laws before committing. Calling it directly is useful after manual store construction or when debugging.

Persistence Round Trip

use geomerge::{
    persist::pst::{decode_store, encode_store},
    store::Store,
};

fn round_trip(store: &Store) -> Result<Store, geomerge::persist::error::PersisError> {
    let bytes = encode_store(store)?;
    decode_store(&bytes)
}

Decoded stores reconstruct table state and recompile laws from the persisted law entries.


Mutation API

The mutation operation type currently has one variant:

Op::Add {
    table: ir::Path,
    values: Vec<CellValue>,
}

The table path is used instead of an internal table id, which keeps operations stable across stores.

The most direct insertion flow is:

let table = store.table_at(&geomerge::ir::Path::from("G.V")).unwrap();
let op = table.add(vec![geomerge::table::CellValue::Id(0)]);
let row_ids = store.apply_batch(vec![op])?;

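Store::apply_batch first validates the entire batch: table paths, row schemas, and primary keys, including primary-key conflicts between operations in the same batch. It then applies the operations to a cloned preview store, checks all laws, and commits the preview only if validation succeeds. On validation failure, the original store is unchanged.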

The returned Vec<RowId> is in the same order as the input operations.
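
The all-or-nothing batch behavior and the ordering of returned row ids can be sketched as follows. Everything here is a hypothetical simplification: the Op mirror uses a String path and integer cells, the store keeps a single global row-id counter starting at 0, and only the unknown-table check is modeled.

```rust
use std::collections::HashMap;

type RowId = u64;

// Hypothetical, simplified mirror of the single mutation variant.
#[derive(Clone, Debug)]
enum Op {
    Add { table: String, values: Vec<i64> },
}

#[derive(Clone, Debug, Default)]
struct ToyStore {
    tables: HashMap<String, Vec<(RowId, Vec<i64>)>>,
    next_row_id: RowId, // assumption: one global counter
}

impl ToyStore {
    // Apply every op to a clone; commit the clone only if all ops succeed.
    fn apply_batch(&mut self, ops: Vec<Op>) -> Result<Vec<RowId>, String> {
        let mut preview = self.clone();
        let mut row_ids = Vec::with_capacity(ops.len());
        for op in ops {
            let Op::Add { table, values } = op;
            let rows = preview
                .tables
                .get_mut(&table)
                .ok_or_else(|| format!("unknown table path: {table}"))?;
            let id = preview.next_row_id;
            preview.next_row_id += 1;
            rows.push((id, values));
            row_ids.push(id); // same position as the op that produced it
        }
        *self = preview;
        Ok(row_ids)
    }
}
```

A failing operation anywhere in the batch discards the whole preview, including row ids already allocated on it, so the caller never observes a partially applied batch.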


Table API

Tables are columnar stores. A table owns:

  • its Path
  • its Schema
  • generated row ids
  • one column vector per schema column

Rows are identified by generated RowId values. Row ids are managed by the store layer and are exposed as entity references.
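
The ownership list above suggests a layout along the following lines. This is a sketch under assumptions, not the actual Table type: ToyTable and its append method are hypothetical, and only the accessor names row_count, row_id_at, and cell_at are borrowed from this note.

```rust
type RowId = u64;

#[derive(Clone, Debug, PartialEq)]
enum CellValue {
    Id(RowId),
    Int(i64),
    Str(String),
}

// Hypothetical columnar layout: `columns[c][r]` is the cell at column c, row r.
struct ToyTable {
    row_ids: Vec<RowId>,
    columns: Vec<Vec<CellValue>>,
}

impl ToyTable {
    fn new(column_count: usize) -> Self {
        ToyTable {
            row_ids: Vec::new(),
            columns: vec![Vec::new(); column_count],
        }
    }

    // Split one logical row across the per-column vectors.
    fn append(&mut self, row_id: RowId, values: Vec<CellValue>) {
        assert_eq!(values.len(), self.columns.len(), "column count mismatch");
        self.row_ids.push(row_id);
        for (column, value) in self.columns.iter_mut().zip(values) {
            column.push(value);
        }
    }

    // Counting row ids keeps zero-column tables (such as Graphs) countable.
    fn row_count(&self) -> usize {
        self.row_ids.len()
    }

    fn row_id_at(&self, row_idx: usize) -> Option<RowId> {
        self.row_ids.get(row_idx).copied()
    }

    fn cell_at(&self, row_idx: usize, col_idx: usize) -> Option<&CellValue> {
        self.columns.get(col_idx)?.get(row_idx)
    }
}
```

Keeping the row-id vector separate from the column vectors is what lets a table with no columns still hold rows, which matters for entity-only tables.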

Main table methods:

  • Table::new: table construction from path and schema
  • Table::schema: schema access
  • Table::path: path access
  • Table::row_count: row count
  • Table::row_id_at: row id at a physical row index
  • Table::cell_at: cell lookup by physical row index and column index
  • Table::dump: debug rendering
  • Table::validate: schema validation for a candidate row
  • Table::primary_key_values: primary-key extraction from candidate values
  • Table::validate_new_row: schema and primary-key validation
  • Table::append_row_validated: direct validated append
  • Table::add: operation construction for Store::apply_batch

Library code should prefer Store::apply_batch or Store::transact for committed changes, because those paths also check laws. A direct Table::append_row_validated call checks only the schema and primary key.


Cell Values and Validation

The supported cell values are:

CellValue::Id(RowId)
CellValue::Int(i64)
CellValue::Str(String)

Validation checks:

  • column count
  • entity column values as CellValue::Id
  • integer columns as CellValue::Int
  • string columns as CellValue::Str
  • tuple column rejection
  • primary-key duplication
  • unknown table paths in batch application
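
The per-row checks in this list (column count, per-column type match, tuple rejection) can be sketched as a single function. The ColType and ValidationError shapes below are hypothetical simplifications written for this note; the real types live in the ir and table modules and may differ.

```rust
#[derive(Clone, Debug, PartialEq)]
enum CellValue {
    Id(u64),
    Int(i64),
    Str(String),
}

// Hypothetical flattened column type; the real IR nests primitive types.
#[derive(Clone, Debug, PartialEq)]
enum ColType {
    Entity,
    Int,
    Str,
    Tuple,
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    ColumnCount { expected: usize, found: usize },
    TypeMismatch { column: usize },
    TupleUnsupported { column: usize },
}

// Check a candidate row against a schema, column by column.
fn validate_row(schema: &[ColType], values: &[CellValue]) -> Result<(), ValidationError> {
    if schema.len() != values.len() {
        return Err(ValidationError::ColumnCount {
            expected: schema.len(),
            found: values.len(),
        });
    }
    for (column, (ty, value)) in schema.iter().zip(values).enumerate() {
        match (ty, value) {
            // Tuple columns are rejected regardless of the supplied value.
            (ColType::Tuple, _) => return Err(ValidationError::TupleUnsupported { column }),
            (ColType::Entity, CellValue::Id(_))
            | (ColType::Int, CellValue::Int(_))
            | (ColType::Str, CellValue::Str(_)) => {}
            _ => return Err(ValidationError::TypeMismatch { column }),
        }
    }
    Ok(())
}
```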

Primary-key behavior:

  • None: no primary-key uniqueness check
  • Some(columns): uniqueness over listed columns
  • Some([]): singleton table, meaning at most one row
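
The three primary-key cases fall out of one rule: project each row onto the key columns and require the projection to be unique. A minimal sketch, assuming integer cells and a hypothetical free function primary_key_allows (not part of the Geomerge API):

```rust
use std::collections::HashSet;

// Returns true when `candidate` may be appended under the given primary key.
fn primary_key_allows(
    primary_key: Option<&[usize]>,
    existing_rows: &[Vec<i64>],
    candidate: &[i64],
) -> bool {
    let Some(columns) = primary_key else {
        return true; // None: no uniqueness check
    };
    // Project a row onto the key columns. With no key columns, every row
    // projects to the same empty key, so a second row always collides.
    let key_of = |row: &[i64]| -> Vec<i64> { columns.iter().map(|&c| row[c]).collect() };
    let seen: HashSet<Vec<i64>> = existing_rows
        .iter()
        .map(|row| key_of(row.as_slice()))
        .collect();
    !seen.contains(&key_of(candidate))
}
```

Under this reading, the singleton behavior of Some([]) is not a special case: the empty projection is simply a key that every row shares.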

Transactions

Geomerge has two transaction styles.

Store::transact takes a closure over a preview store:

store.transact(|preview| {
    let table = preview.table_at_mut(&Path::from("Graphs")).unwrap();
    let row_id = table.append_row_validated(vec![])?;
    Ok(row_id)
})?;

The closure can perform multiple direct table appends. After the closure succeeds, Geomerge checks laws on the preview and commits only if they hold.

Store::into_transaction consumes a store and returns an OwnedTransaction. Its commit method returns either:

Ok((output, committed_store))
Err((error, original_store))

That shape is useful when the caller needs ownership of the original store after a failed transaction.
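
The ownership-returning result shape can be sketched with toy types. ToyStore, ToyTransaction, preview_mut, and the choice to pass the output value into commit are all assumptions made for illustration; only the Ok((output, store)) and Err((error, store)) shape comes from this note.

```rust
#[derive(Clone, Debug, PartialEq)]
struct ToyStore {
    rows: Vec<i64>,
}

impl ToyStore {
    // Stand-in law: no negative values.
    fn check_laws(&self) -> Result<(), String> {
        if self.rows.iter().all(|v| *v >= 0) {
            Ok(())
        } else {
            Err("negative value".to_string())
        }
    }

    // Consume the store; the transaction holds both the original and a preview.
    fn into_transaction(self) -> ToyTransaction {
        ToyTransaction {
            preview: self.clone(),
            original: self,
        }
    }
}

struct ToyTransaction {
    original: ToyStore,
    preview: ToyStore,
}

impl ToyTransaction {
    fn preview_mut(&mut self) -> &mut ToyStore {
        &mut self.preview
    }

    // Either way the caller gets a store back: the preview on success,
    // the untouched original on failure.
    fn commit<T>(self, output: T) -> Result<(T, ToyStore), (String, ToyStore)> {
        match self.preview.check_laws() {
            Ok(()) => Ok((output, self.preview)),
            Err(error) => Err((error, self.original)),
        }
    }
}
```

Returning the store inside the error is what makes a consuming API usable: without it, a failed commit would drop the caller's only copy of the data.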

One implementation detail to watch: Store::apply_batch checks primary-key conflicts within the whole batch before applying. The closure-based transaction path relies on table-level validated appends and has a TODO around primary-key checking at the store transaction layer.


Law Solver

The solver compiles IR laws into a restricted execution form.

Supported compiled proposition forms:

  • atom
  • equality
  • conjunction

Unsupported or incomplete forms include:

  • disjunction
  • projected terms
  • constructed terms
  • tuple values

Runtime law checking has two phases:

  1. Antecedent binding: scan relevant tables and produce variable bindings.
  2. Consequent validation: check that required atoms and equalities hold for each binding.

Law violations identify:

  • missing consequent atoms
  • unsatisfied consequent equalities
  • the law involved
  • the binding that triggered the failure

The solver is direct and full-store oriented. It is not incremental.
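
The two phases can be illustrated on one hard-coded law: every G.V row must reference an existing Graphs row. The sketch below is not the solver; the store layout, the Violation shape, and the law path "G.V.foreignKeys" are hypothetical, and a real law would be compiled from IR rather than written by hand.

```rust
use std::collections::HashMap;

type RowId = u64;
type Binding = HashMap<&'static str, RowId>;

// Hypothetical toy store: Graphs rows are bare ids, G.V rows are (vertex, graph).
struct ToyStore {
    graphs: Vec<RowId>,
    vertices: Vec<(RowId, RowId)>,
}

#[derive(Debug, PartialEq)]
struct Violation {
    law: &'static str,
    binding: Binding,
}

// Phase 1: match the antecedent atom G.V(v, g) against every vertex row.
fn bind_antecedent(store: &ToyStore) -> Vec<Binding> {
    store
        .vertices
        .iter()
        .map(|&(v, g)| HashMap::from([("v", v), ("g", g)]))
        .collect()
}

// Phase 2: for each binding, require the consequent atom Graphs(g) to exist.
fn check_law(store: &ToyStore) -> Vec<Violation> {
    bind_antecedent(store)
        .into_iter()
        .filter(|binding| !store.graphs.contains(&binding["g"]))
        .map(|binding| Violation {
            law: "G.V.foreignKeys", // hypothetical law path
            binding,
        })
        .collect()
}
```

Both phases scan whole tables on every call, which matches the full-store, non-incremental behavior described above.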


Persistence API

The persistence API is under persist::pst.

Main functions:

  • encode_store(&Store) -> Result<Vec<u8>, PersisError>
  • decode_store(&[u8]) -> Result<Store, PersisError>

The persisted data includes:

  • format magic and version
  • next table oid
  • table metadata
  • table schemas
  • row ids
  • column payloads
  • source law entries

On decode, Geomerge reconstructs tables and recompiles laws from the persisted law entries.
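
The role of the format magic and version can be shown with a toy byte format. This is not the persist::pst layout: the magic bytes, the version number, the little-endian encoding, and the function names encode_rows and decode_rows are all invented for this sketch, which persists only a list of row ids.

```rust
const MAGIC: &[u8; 4] = b"TOY1"; // hypothetical magic, not the real one
const VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
enum DecodeError {
    BadMagic,
    UnsupportedVersion(u8),
    Truncated,
}

// Layout: magic (4 bytes), version (1), row count (u64 LE), row ids (u64 LE each).
fn encode_rows(row_ids: &[u64]) -> Vec<u8> {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(MAGIC);
    bytes.push(VERSION);
    bytes.extend_from_slice(&(row_ids.len() as u64).to_le_bytes());
    for id in row_ids {
        bytes.extend_from_slice(&id.to_le_bytes());
    }
    bytes
}

// Split `n` bytes off the front of `input`, advancing it.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    let current: &'a [u8] = *input;
    if current.len() < n {
        return Err(DecodeError::Truncated);
    }
    let (head, tail) = current.split_at(n);
    *input = tail;
    Ok(head)
}

fn read_u64(input: &mut &[u8]) -> Result<u64, DecodeError> {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(take(input, 8)?);
    Ok(u64::from_le_bytes(bytes))
}

fn decode_rows(mut input: &[u8]) -> Result<Vec<u64>, DecodeError> {
    // Reject foreign or future data before reading any payload.
    if take(&mut input, 4)? != &MAGIC[..] {
        return Err(DecodeError::BadMagic);
    }
    let version = take(&mut input, 1)?[0];
    if version != VERSION {
        return Err(DecodeError::UnsupportedVersion(version));
    }
    let count = read_u64(&mut input)?;
    let mut row_ids = Vec::new();
    for _ in 0..count {
        row_ids.push(read_u64(&mut input)?);
    }
    Ok(row_ids)
}
```

Checking the header first means corrupted or mismatched files fail with a specific error instead of being misread as table data.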


REPL API

The binary starts the REPL. The REPL commands are:

/help
/exit
/quit
load-schema <path>;
load-store <path>;
list-schema;
add <table> values (...), (...);
dump-table <table>;
dump-store;
persist <path>;
begin transact; name = add <table> values (...); ... commit;

REPL transaction bindings let one inserted row refer to another row inserted earlier in the same transaction:

begin transact;
  g = add Graphs values ();
  v = add G.V values (g);
commit;

For non-transactional add, entity values use #id syntax. Inside transaction blocks, entity columns can use either #id or a previous binding name.
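
The two entity-value forms could be resolved by a helper like the one below. This is a guess at the shape of the logic, not the repl module's code: resolve_entity and the bindings map are hypothetical names, and a non-transactional add would correspond to calling it with an empty map.

```rust
use std::collections::HashMap;

type RowId = u64;

// Resolve one entity-column token: `#id` is a literal row id; anything else
// is looked up as a binding name from earlier in the same transaction block.
fn resolve_entity(token: &str, bindings: &HashMap<String, RowId>) -> Result<RowId, String> {
    if let Some(digits) = token.strip_prefix('#') {
        return digits
            .parse::<RowId>()
            .map_err(|e| format!("invalid row id {token}: {e}"));
    }
    bindings
        .get(token)
        .copied()
        .ok_or_else(|| format!("unknown binding: {token}"))
}
```

In the transaction example above, the REPL would record g after the first add, so that the token g in the second add resolves to the freshly generated row id.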


Key Features

Implemented features:

  • compiled theory loading from JSON-compatible IR
  • schema-defined table creation
  • columnar row storage
  • generated row ids
  • entity references through row ids
  • primitive int and string cells
  • table path lookup
  • validated row insertion
  • batch add operations
  • atomic batch commit after validation
  • closure-based preview transactions
  • owned transactions with original-store recovery on failure
  • primary-key validation for batch adds and validated table appends
  • law compilation for atoms, equality, and conjunction
  • full-store law checking
  • binary persistence and reload
  • REPL schema loading, insertion, dumping, transactions, persistence, and reload

Not implemented or limited:

  • deletes
  • updates
  • general query execution
  • incremental validation
  • native version-control semantics
  • merge and conflict-resolution semantics
  • concurrency control
  • tuple cell insertion
  • disjunction in compiled laws
  • projected or constructed terms in compiled laws
  • ad hoc indexes for efficient law checking

Practical Mental Model

Geomerge currently behaves like:

compiled Geolog theory
-> typed table registry
-> validated row additions
-> full-store law checks
-> persisted store snapshots

The most relevant integration boundary for an incremental engine such as DBSP is law checking. Today, Geomerge validates laws by scanning the current store. A future incremental layer could maintain violation queries and let the store ask whether any violations exist after a proposed transaction.


Changelog

  • May 13, 2026 -- First version of this document.