storage-engine-playground

habedi-work/storage-engine-playground

Latest commit: fa9f2ce50e "WIP" by Hassan Abedi, 2026-06-05 13:01:18 +02:00

Top-level contents (all last touched by that commit): fixtures/, src/, tests/, Cargo.toml, README.md

Plan Runner

This crate is a snapshot executor for conjunctive-query plans. It reads a JSON plan (a DAG of scan and join nodes plus the input facts), walks the DAG using the operators from query-ops, and prints the binding relation produced at the root node.
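To make the execution model concrete, here is a minimal, self-contained sketch of walking such a DAG bottom-up with a hash-based natural join. It assumes a hypothetical node shape (`Node::Scan` holding an already-scanned relation, `Node::Join` referencing two earlier nodes by index) and string-valued rows. The real IR is specified in src/lib.rs and the real operators live in query-ops, so `natural_join_sketch` and `run_plan_sketch` are illustrative names, not this crate's API.

```rust
use std::collections::HashMap;

/// A binding relation: named columns plus rows of values (strings, for simplicity).
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical plan node; the real IR is specified in src/lib.rs.
pub enum Node {
    /// Leaf standing in for an already-scanned atom.
    Scan(Relation),
    /// Natural join of two earlier nodes, referenced by their index in the plan.
    Join(usize, usize),
}

/// Hash-based natural join on the columns both inputs share
/// (a cross product when they share none).
pub fn natural_join_sketch(l: &Relation, r: &Relation) -> Relation {
    // (left position, right position) for every shared column name.
    let shared: Vec<(usize, usize)> = l
        .columns
        .iter()
        .enumerate()
        .filter_map(|(i, c)| r.columns.iter().position(|rc| rc == c).map(|j| (i, j)))
        .collect();
    // Right-side positions that are not shared; these are appended to each output row.
    let extra: Vec<usize> = (0..r.columns.len())
        .filter(|j| !shared.iter().any(|&(_, sj)| sj == *j))
        .collect();

    // Build phase: index right rows by their values in the shared columns.
    let mut index: HashMap<Vec<&String>, Vec<&Vec<String>>> = HashMap::new();
    for row in &r.rows {
        let key: Vec<&String> = shared.iter().map(|&(_, j)| &row[j]).collect();
        index.entry(key).or_default().push(row);
    }

    // Probe phase: each left row yields one output row per matching right row.
    let mut rows = Vec::new();
    for lrow in &l.rows {
        let key: Vec<&String> = shared.iter().map(|&(i, _)| &lrow[i]).collect();
        for rrow in index.get(&key).into_iter().flatten() {
            let mut out = lrow.clone();
            out.extend(extra.iter().map(|&j| rrow[j].clone()));
            rows.push(out);
        }
    }

    let mut columns = l.columns.clone();
    columns.extend(extra.iter().map(|&j| r.columns[j].clone()));
    Relation { columns, rows }
}

/// Evaluates nodes in index order (children must precede parents) and
/// returns the relation produced at `root`.
pub fn run_plan_sketch(nodes: &[Node], root: usize) -> Relation {
    let mut results: Vec<Relation> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let out = match node {
            Node::Scan(rel) => rel.clone(),
            // Indexing panics if a child index does not refer to an earlier node.
            Node::Join(a, b) => natural_join_sketch(&results[*a], &results[*b]),
        };
        results.push(out);
    }
    results.swap_remove(root)
}
```

Evaluating strictly in index order is one simple way to honor "execute as written": the sketch never reorders nodes, so it assumes the producer already emitted them children-first.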

The wire format mirrors Geolog.DB.Plan.PlanGraph from the geolog submodule, but the JSON shape is the contract: any frontend that emits this format can drive the runner. The mapping from PlanEvalAtom / PlanJoin to scan_atom / semijoin / natural_join, and the full IR spec, are documented as module-level rustdoc in src/lib.rs.
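Of the operators named above, semijoin has a standard definition: keep each left row that has at least one match on the shared columns. The antijoin mentioned under Notes is its complement. A minimal sketch of both follows, assuming string-valued rows and a hypothetical `Relation` struct; query-ops' actual types and signatures may differ.

```rust
use std::collections::HashSet;

/// A binding relation: named columns plus rows of string values.
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Keeps the left rows that do (`keep == true`) or do not (`keep == false`) have
/// a match in `r` on the shared columns. Output columns are the left columns.
fn filter_by_match(l: &Relation, r: &Relation, keep: bool) -> Relation {
    // (left position, right position) for each column name both sides share.
    let shared: Vec<(usize, usize)> = l
        .columns
        .iter()
        .enumerate()
        .filter_map(|(i, c)| r.columns.iter().position(|rc| rc == c).map(|j| (i, j)))
        .collect();
    // Distinct right-side keys. With no shared columns this is {[]}, or {} if `r` is empty.
    let keys: HashSet<Vec<&String>> = r
        .rows
        .iter()
        .map(|row| shared.iter().map(|&(_, j)| &row[j]).collect::<Vec<&String>>())
        .collect();

    let mut rows = Vec::new();
    for row in &l.rows {
        let key: Vec<&String> = shared.iter().map(|&(i, _)| &row[i]).collect();
        if keys.contains(&key) == keep {
            rows.push(row.clone());
        }
    }
    Relation { columns: l.columns.clone(), rows }
}

/// Semijoin: left rows with at least one match on the shared columns.
pub fn semijoin_sketch(l: &Relation, r: &Relation) -> Relation {
    filter_by_match(l, r, true)
}

/// Antijoin: left rows with no match on the shared columns.
pub fn antijoin_sketch(l: &Relation, r: &Relation) -> Relation {
    filter_by_match(l, r, false)
}
```

Unlike a natural join, neither operator adds columns, which is why a planner can use a semijoin to shrink an intermediate result before a more expensive join.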

Pipeline

End-to-end, scenarios become runner output through three stages:

tools/exporter/examples/*.scenario.json
  └── (Haskell exporter; runs Geolog.DB.Plan.planConjunction
       and Geolog.DB.InMemory.evalConjunctionPlanned as a self-check)
        └── crates/plan-runner/fixtures/*.json    (JSON IR; checked in)
             └── (plan-runner; this crate)
                  └── stdout JSON, with row-for-row oracle check

The exporter (tools/exporter) is the only producer of runner IR today; it is where atoms are planned, and rejected if they don't fit the supported subset. Fixtures are regenerated with make export-fixtures, and make examples runs the full loop (regenerate every fixture, then run the oracle test).

Backends

The CLI takes a --backend flag. The memory backend is the pure in-memory path; every other backend routes facts through the Storage trait via build_tables_via_storage, then scans tables back out before executing.

Backend	Storage	Location
`memory`	none (direct from `plan.facts`)	n/a
`memory-storage`	`MemoryStorage`	in-process
`lmdb`	`LmdbStorage` (heed-backed mmap B-tree)	fresh tempdir per run
`redb`	`RedbStorage` (single-file B-tree)	fresh tempdir per run
`fjall`	`FjallStorage` (LSM tree)	fresh tempdir per run
`sqlite`	`SqliteStorage` (rusqlite, bundled libsqlite3)	fresh tempdir per run
`geomerge`	`GeomergeStorage` (CRDT; alpha)	in-process

All seven produce byte-identical output for every checked-in fixture. The point of the abstraction is not performance comparison (the snapshot evaluator is bulk-materialized either way), but to validate that the storage layer is genuinely backend-neutral and that adding a new adapter is a constructor swap.
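The "constructor swap" property can be illustrated with a function that is generic over a storage trait. Everything in this sketch is an assumption: `TableStore`, `MemStore`, `build_tables_direct`, and `build_tables_through` are hypothetical stand-ins for the crate's Storage trait, MemoryStorage, build_tables, and build_tables_via_storage, whose real signatures are not shown in this README. Rows are sorted before comparison because key-ordered backends (B-trees, LSM trees) may return rows in key order rather than insertion order.

```rust
use std::collections::BTreeMap;

/// Relation name -> rows; stands in for the plan's input facts.
pub type Facts = BTreeMap<String, Vec<Vec<String>>>;

/// Hypothetical stand-in for the crate's Storage trait; the real methods may differ.
pub trait TableStore {
    fn insert(&mut self, relation: &str, row: Vec<String>);
    fn scan(&self, relation: &str) -> Vec<Vec<String>>;
}

/// In-process adapter, similar in spirit to MemoryStorage.
#[derive(Default)]
pub struct MemStore {
    tables: Facts,
}

impl TableStore for MemStore {
    fn insert(&mut self, relation: &str, row: Vec<String>) {
        self.tables.entry(relation.to_string()).or_default().push(row);
    }
    fn scan(&self, relation: &str) -> Vec<Vec<String>> {
        self.tables.get(relation).cloned().unwrap_or_default()
    }
}

/// Pure path: tables come straight from the facts.
pub fn build_tables_direct(facts: &Facts) -> Facts {
    facts.clone()
}

/// Storage path: write every fact through the adapter, then scan each table back out.
/// Switching backends only changes which `S` the caller constructs.
pub fn build_tables_through<S: TableStore>(facts: &Facts, mut store: S) -> Facts {
    for (relation, rows) in facts {
        for row in rows {
            store.insert(relation, row.clone());
        }
    }
    facts.keys().map(|rel| (rel.clone(), store.scan(rel))).collect()
}

/// Sorts each table's rows so that comparisons ignore backend iteration order.
pub fn normalized(mut tables: Facts) -> Facts {
    for rows in tables.values_mut() {
        rows.sort();
    }
    tables
}
```

A lockstep check in the spirit of tests/storage_roundtrip.rs then reduces to asserting `normalized(build_tables_direct(&facts)) == normalized(build_tables_through(&facts, MemStore::default()))`, with a different adapter passed in for each backend.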

Note on geomerge: the runner's JSON IR is untyped (it records only each relation's arity), but geomerge requires a typed theory up front. The CLI therefore infers column types from each relation's first fact row and synthesizes a theory of PrimInt and PrimString columns via GeomergeStorage::with_relations. Columns of a relation with no facts to sample default to PrimString.
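A sketch of the first-row inference rule follows. It assumes each fact value arrives as either a JSON integer or a JSON string, modeled by a hypothetical `Value` enum. `ColType` mirrors the two column types named above but is not geomerge's own type.

```rust
/// Hypothetical mirror of the two column types named above (not geomerge's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    PrimInt,
    PrimString,
}

/// Assumption: a fact value is either a JSON integer or a JSON string.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Infers one column type per position from the relation's first fact row only.
/// A relation with no facts gets `arity` PrimString columns.
pub fn infer_column_types(arity: usize, facts: &[Vec<Value>]) -> Vec<ColType> {
    match facts.first() {
        Some(first) => first
            .iter()
            .map(|v| match v {
                Value::Int(_) => ColType::PrimInt,
                Value::Str(_) => ColType::PrimString,
            })
            .collect(),
        None => vec![ColType::PrimString; arity],
    }
}
```

Because only the first row is sampled, this sketch would mistype a column whose first value is an integer but whose later values are strings. It does not scan further rows to detect that.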

Run It

# Run one fixture through the default in-memory path:
cargo run -p plan-runner -- crates/plan-runner/fixtures/two_atom_join.json

# Same plan, routed through different backends:
cargo run -p plan-runner -- --backend memory-storage crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend lmdb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend redb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend fjall          crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend sqlite         crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend geomerge       crates/plan-runner/fixtures/two_atom_join.json

# Regenerate every fixture from its scenario and run the oracle test:
make examples

A sample run:

$ plan-run crates/plan-runner/fixtures/two_atom_join.json
{"columns":["a","b","_w0_2"],"rows":[["node:1","node:2","edge:1"],["node:2","node:1","edge:2"]]}

The _w<atomIdx>_<pos> columns are wildcards that the exporter gave synthetic names so the runner can bind them; _w0_2, for example, is the wildcard at position 2 of atom 0. The scenario's expected_bindings block names only the variables the test cares about, and verify projects the runner output onto that subset before comparing the two as multisets.
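The projection-then-multiset comparison can be sketched as follows. `matches_oracle` is a hypothetical helper, not the crate's `verify`. It treats an expected variable that is missing from the runner output as a failure, and it counts duplicate rows, which is what comparing as multisets implies.

```rust
use std::collections::HashMap;

/// Projects runner output (`rows`, laid out per `columns`) onto the `wanted`
/// variables and compares the result with `expected` as multisets: row order is
/// ignored, but duplicate rows must occur equally often on both sides.
/// Hypothetical helper, not the crate's `verify`.
pub fn matches_oracle(
    columns: &[&str],
    rows: &[Vec<&str>],
    wanted: &[&str],
    expected: &[Vec<&str>],
) -> bool {
    // Output position of each wanted variable; a variable the runner never bound fails the check.
    let positions: Option<Vec<usize>> = wanted
        .iter()
        .map(|w| columns.iter().position(|c| c == w))
        .collect();
    let Some(positions) = positions else {
        return false;
    };

    // +1 for each projected runner row, -1 for each expected row.
    let mut counts: HashMap<Vec<&str>, i64> = HashMap::new();
    for row in rows {
        let projected: Vec<&str> = positions.iter().map(|&i| row[i]).collect();
        *counts.entry(projected).or_insert(0) += 1;
    }
    for row in expected {
        *counts.entry(row.clone()).or_insert(0) -= 1;
    }
    // Equal multisets leave every count at zero.
    counts.values().all(|&n| n == 0)
}
```

Applied to the sample run, projecting onto `["a", "b"]` drops `_w0_2` and leaves the rows `["node:1", "node:2"]` and `["node:2", "node:1"]`, which match an expected block listing the same two rows in either order.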

Run the Tests

cargo test -p plan-runner

The two integration test files exercise complementary properties:

tests/examples.rs walks every fixture and checks it against its expected_bindings oracle.
tests/storage_roundtrip.rs cross-checks the pure path against the storage-backed path, to keep build_tables and build_tables_via_storage in lockstep.

Notes

IR contract. The runner is backend-agnostic and frontend-agnostic: it consumes JSON in the shape documented in src/lib.rs and produces a binding relation. Anything that emits the same JSON can drive it.
No optimizer. Plans are executed as written. Node ordering, join shape, and antijoin scheduling are all the producer's responsibility. This crate's job ends at faithful execution of the IR.
Wildcard columns survive. scan_atom keeps every distinct variable that appears in the pattern, including the exporter's synthetic _w<atomIdx>_<pos> names. The runner does not project them out; oracle verification handles that on the comparison side.
Bulk, not streaming. Each node materializes its full output as a Relation. This matches query-ops' execution model; it's not designed for incremental or maintained-view workloads.
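The scan behavior described in these notes can be sketched as follows. In this sketch, each distinct variable becomes one output column in first-occurrence order, constants filter facts, and a variable repeated within one atom must take the same value at every occurrence. The `Term` enum and `scan_atom_sketch` are hypothetical; the real pattern representation is documented in src/lib.rs. Because wildcard names are treated as ordinary variables, a column such as `_w0_2` simply survives into the output.

```rust
/// One argument of an atom pattern. Hypothetical shape; the real IR is documented in src/lib.rs.
#[derive(Debug, Clone)]
pub enum Term {
    /// A variable, including exporter-generated wildcard names such as `_w0_2`.
    Var(String),
    /// A constant that the fact must contain at this position.
    Const(String),
}

/// Matches `facts` against `pattern`. Output columns are the pattern's distinct
/// variables in first-occurrence order; wildcard names are kept like any other variable.
pub fn scan_atom_sketch(pattern: &[Term], facts: &[Vec<String>]) -> (Vec<String>, Vec<Vec<String>>) {
    let mut columns: Vec<String> = Vec::new();
    for term in pattern {
        if let Term::Var(v) = term {
            if !columns.contains(v) {
                columns.push(v.clone());
            }
        }
    }

    let mut rows: Vec<Vec<String>> = Vec::new();
    'facts: for fact in facts {
        if fact.len() != pattern.len() {
            continue; // arity mismatch: skip the fact
        }
        let mut binding: Vec<Option<&String>> = vec![None; columns.len()];
        for (term, value) in pattern.iter().zip(fact) {
            match term {
                // Constants filter: the fact must hold exactly this value.
                Term::Const(c) => {
                    if c != value {
                        continue 'facts;
                    }
                }
                // A repeated variable must see the same value at every occurrence.
                Term::Var(v) => {
                    let i = columns.iter().position(|c| c == v).expect("collected above");
                    let bound = binding[i]; // Option<&String> is Copy
                    match bound {
                        None => binding[i] = Some(value),
                        Some(prev) if prev != value => continue 'facts,
                        Some(_) => {}
                    }
                }
            }
        }
        // Every column is bound because each one comes from a Var position in the pattern.
        rows.push(binding.into_iter().map(|b| b.unwrap().clone()).collect());
    }
    (columns, rows)
}
```

The sketch materializes the whole output before returning, which matches the bulk execution model described above.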