Hassan Abedi 6560c2696f WIP
2026-06-05 13:17:53 +02:00

Plan Runner

This crate is a snapshot executor for conjunctive-query plans. It reads a JSON plan (a DAG of scan and join nodes plus the input facts), walks the DAG using the operators from query-ops, and prints the binding relation produced at the root node.
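
As a rough illustration of the walk described above, the sketch below evaluates a tiny plan DAG bottom-up. It is not the crate's code: the `Node`, `Relation`, `natural_join`, and `run_plan` definitions are hypothetical stand-ins, values are plain strings, nodes are assumed to arrive in topological order with the root last, and scans are assumed to use distinct variables. The real IR and operators are the ones documented in src/lib.rs and query-ops.

```rust
use std::collections::HashMap;

/// A binding relation: named columns plus rows of string values.
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical plan node (assumption; not the crate's actual IR type).
pub enum Node {
    /// Scan a base relation, naming each column with a variable.
    Scan { relation: String, vars: Vec<String> },
    /// Natural join of two earlier nodes, referenced by index.
    Join { left: usize, right: usize },
}

/// Natural join: combine rows that agree on every shared column.
pub fn natural_join(l: &Relation, r: &Relation) -> Relation {
    // (left index, right index) for each column name present on both sides.
    let shared: Vec<(usize, usize)> = l
        .columns
        .iter()
        .enumerate()
        .filter_map(|(i, c)| r.columns.iter().position(|d| d == c).map(|j| (i, j)))
        .collect();
    // Right-hand columns that the left side does not already carry.
    let extra: Vec<usize> = (0..r.columns.len())
        .filter(|j| !shared.iter().any(|(_, sj)| sj == j))
        .collect();

    let mut columns = l.columns.clone();
    columns.extend(extra.iter().map(|&j| r.columns[j].clone()));

    let mut rows = Vec::new();
    for lr in &l.rows {
        for rr in &r.rows {
            if shared.iter().all(|&(i, j)| lr[i] == rr[j]) {
                let mut row = lr.clone();
                row.extend(extra.iter().map(|&j| rr[j].clone()));
                rows.push(row);
            }
        }
    }
    Relation { columns, rows }
}

/// Evaluate every node in order and return the relation of the last (root) node.
/// Returns None for an empty plan or a join that references a missing node.
pub fn run_plan(
    nodes: &[Node],
    facts: &HashMap<String, Vec<Vec<String>>>,
) -> Option<Relation> {
    let mut results: Vec<Relation> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let rel = match node {
            Node::Scan { relation, vars } => Relation {
                columns: vars.clone(),
                rows: facts.get(relation).cloned().unwrap_or_default(),
            },
            Node::Join { left, right } => {
                natural_join(results.get(*left)?, results.get(*right)?)
            }
        };
        results.push(rel);
    }
    results.pop()
}
```

Every node's output is stored in full before the next node runs, which is the bulk-materialized style the rest of this README describes.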

The wire format mirrors Geolog.DB.Plan.PlanGraph from the geolog submodule, but the JSON shape is the contract: any frontend that emits this format can drive the runner. The mapping from PlanEvalAtom / PlanJoin to scan_atom / semijoin / natural_join, and the full IR spec, are documented as module-level rustdoc in src/lib.rs.

Pipeline

End-to-end, scenarios become runner output through three stages:

tools/exporter/examples/*.scenario.json
  └── (Haskell exporter; runs Geolog.DB.Plan.planConjunction
       and Geolog.DB.InMemory.evalConjunctionPlanned as a self-check)
        └── crates/plan-runner/fixtures/*.json    (JSON IR; checked in)
             └── (plan-runner; this crate)
                  └── stdout JSON, with row-for-row oracle check

The exporter (tools/exporter) is the only producer of runner IR today. It plans each scenario's atoms and rejects any that fall outside the supported subset. Fixtures are regenerated with make export-fixtures, and the full loop is make examples.

What happens inside the runner once a JSON plan arrives:

[Figure: Workflow]

Backends

The CLI takes a --backend flag. The memory backend is the pure in-memory path; every other backend routes facts through the Storage trait via build_tables_via_storage, then scans tables back out before executing.

Backend         Storage                                       Location
memory          none (direct from plan.facts)                 n/a
memory-storage  MemoryStorage                                 in-process
lmdb            LmdbStorage (heed-backed mmap B-tree)         fresh tempdir per run
redb            RedbStorage (single-file B-tree)              fresh tempdir per run
fjall           FjallStorage (LSM tree)                       fresh tempdir per run
sqlite          SqliteStorage (rusqlite, bundled libsqlite3)  fresh tempdir per run
geomerge        GeomergeStorage (CRDT; alpha)                 in-process

All seven produce byte-identical output for every checked-in fixture. The point of the abstraction is not performance comparison (the snapshot evaluator is bulk-materialized either way), but to validate that the storage layer is genuinely backend-neutral and that adding a new adapter is a constructor swap.
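
To make the "constructor swap" idea concrete, here is a minimal sketch of a storage round trip. The trait shape, the method names `insert` and `scan`, and the function signatures are assumptions made for illustration; the crate's real Storage trait, build_tables, and build_tables_via_storage may look quite different. The `MemoryStorage` below is a toy stand-in, not the real adapter.

```rust
use std::collections::BTreeMap;

pub type Row = Vec<String>;
pub type Tables = BTreeMap<String, Vec<Row>>;

/// Hypothetical minimal storage interface (assumption, not the crate's trait).
pub trait Storage {
    fn insert(&mut self, relation: &str, row: Row);
    fn scan(&self, relation: &str) -> Vec<Row>;
}

/// Toy in-process backend that keeps rows in insertion order.
#[derive(Default)]
pub struct MemoryStorage {
    tables: Tables,
}

impl Storage for MemoryStorage {
    fn insert(&mut self, relation: &str, row: Row) {
        self.tables.entry(relation.to_string()).or_default().push(row);
    }

    fn scan(&self, relation: &str) -> Vec<Row> {
        self.tables.get(relation).cloned().unwrap_or_default()
    }
}

/// Pure path: use the plan's facts directly.
pub fn build_tables(facts: &Tables) -> Tables {
    facts.clone()
}

/// Storage path: write every fact into the backend, then scan each relation back out.
pub fn build_tables_via_storage<S: Storage>(mut storage: S, facts: &Tables) -> Tables {
    for (name, rows) in facts {
        for row in rows {
            storage.insert(name, row.clone());
        }
    }
    facts
        .keys()
        .map(|name| (name.clone(), storage.scan(name)))
        .collect()
}
```

Under these assumptions, trying another backend means passing a different `Storage` value to `build_tables_via_storage`; nothing downstream of the returned tables changes.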

Note on geomerge: the runner's JSON IR is untyped (only arity per relation), but geomerge requires a typed theory upfront. The CLI infers column types from the first fact row per relation and synthesizes a theory of PrimInt and PrimString columns via GeomergeStorage::with_relations. Columns with no sample facts default to PrimString.
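
The inference rule in the note above can be sketched as follows. The `Value` and `ColType` enums and the `infer_column_types` function are hypothetical names for illustration; only the rule itself (look at the first fact row, fall back to PrimString) comes from this README.

```rust
/// Column types the synthesized theory can use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    PrimInt,
    PrimString,
}

/// A fact value as it might look after JSON decoding (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Infer one type per column from the first row only.
/// Columns with no sample value (no rows, or a short first row) default to PrimString.
pub fn infer_column_types(arity: usize, rows: &[Vec<Value>]) -> Vec<ColType> {
    let first = rows.first();
    (0..arity)
        .map(|i| match first.and_then(|row| row.get(i)) {
            Some(Value::Int(_)) => ColType::PrimInt,
            _ => ColType::PrimString,
        })
        .collect()
}
```

Because only the first row is sampled, a relation whose later rows disagree with it in type would not be caught by this sketch.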

Run It

# Run one fixture through the default in-memory path:
cargo run -p plan-runner -- crates/plan-runner/fixtures/two_atom_join.json

# Same plan, routed through different backends:
cargo run -p plan-runner -- --backend memory-storage crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend lmdb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend redb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend fjall          crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend sqlite         crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend geomerge       crates/plan-runner/fixtures/two_atom_join.json

# Regenerate every fixture from its scenario and run the oracle test:
make examples

A sample run:

$ plan-run crates/plan-runner/fixtures/two_atom_join.json
{"columns":["a","b","_w0_2"],"rows":[["node:1","node:2","edge:1"],["node:2","node:1","edge:2"]]}

The _w<atomIdx>_<pos> columns are wildcards the exporter named so the runner can bind them. The scenario's expected_bindings block names only the variables the test cares about, and verify projects the runner output to that subset before comparing as a multiset.
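
The project-then-compare step can be sketched like this. The `Relation` struct, the `project` helper, and the signature of `verify` are assumptions for illustration, as is the idea that expected bindings arrive as rows in the same column order as the requested variables; the crate's own verify may be structured differently.

```rust
/// Runner output: named columns plus rows of string values.
pub struct Relation {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Keep only the `wanted` columns, in that order.
/// Returns None if any wanted column is missing from the relation.
pub fn project(rel: &Relation, wanted: &[&str]) -> Option<Vec<Vec<String>>> {
    let idx: Vec<usize> = wanted
        .iter()
        .map(|w| rel.columns.iter().position(|c| c.as_str() == *w))
        .collect::<Option<Vec<usize>>>()?;
    Some(
        rel.rows
            .iter()
            .map(|row| idx.iter().map(|&i| row[i].clone()).collect())
            .collect(),
    )
}

/// Compare projected output with the expected rows as multisets:
/// row order is ignored, but duplicate rows must match in count.
pub fn verify(rel: &Relation, wanted: &[&str], expected: &[Vec<String>]) -> bool {
    let mut got = match project(rel, wanted) {
        Some(rows) => rows,
        None => return false,
    };
    let mut want = expected.to_vec();
    got.sort();
    want.sort();
    got == want
}
```

Applied to the sample run above with the variables a and b, the _w0_2 column is dropped and the two remaining rows are compared regardless of order.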

Run the Tests

cargo test -p plan-runner

The two integration test files exercise complementary properties:

  • tests/examples.rs walks every fixture and checks it against its expected_bindings oracle.
  • tests/storage_roundtrip.rs cross-checks the pure path against the storage-backed path, to keep build_tables and build_tables_via_storage in lockstep.

Notes

  • IR contract. The runner is backend-agnostic and frontend-agnostic: it consumes JSON in the shape documented in src/lib.rs and produces a binding relation. Anything that emits the same JSON can drive it.
  • No optimizer. Plans are executed as written. Node ordering, join shape, and antijoin scheduling are all the producer's responsibility. This crate's job ends at faithful execution of the IR.
  • Wildcard columns survive. scan_atom keeps every distinct variable that appears in the pattern, including the exporter's synthetic _w<atomIdx>_<pos> names. The runner does not project them out; oracle verification handles that on the comparison side.
  • Bulk, not streaming. Each node materializes its full output as a Relation. This matches query-ops' execution model; it's not designed for incremental or maintained-view workloads.
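
To illustrate the wildcard note above, here is a minimal sketch of a scan that keeps every distinct variable in the pattern. It is an assumption-laden stand-in for scan_atom, not the query-ops implementation: the function name `scan_atom_sketch` is hypothetical, patterns contain only variable names (no constants), values are strings, and a variable repeated within one pattern is treated as an equality constraint.

```rust
/// Scan `facts` against `pattern`, one variable name per position.
/// Returns (columns, rows), where columns are the distinct variables
/// in first-occurrence order, wildcard names included.
pub fn scan_atom_sketch(
    pattern: &[&str],
    facts: &[Vec<String>],
) -> (Vec<String>, Vec<Vec<String>>) {
    // Distinct variables and the position where each first appears.
    let mut columns: Vec<String> = Vec::new();
    let mut first_pos: Vec<usize> = Vec::new();
    for (i, v) in pattern.iter().enumerate() {
        if !columns.iter().any(|c| c.as_str() == *v) {
            columns.push((*v).to_string());
            first_pos.push(i);
        }
    }

    let rows: Vec<Vec<String>> = facts
        .iter()
        // Skip facts whose arity does not match the pattern.
        .filter(|fact| fact.len() == pattern.len())
        // A repeated variable must carry the same value at every position.
        .filter(|fact| {
            pattern.iter().enumerate().all(|(i, v)| {
                // The first occurrence always exists, at position i or earlier.
                let k = pattern.iter().position(|p| p == v).unwrap();
                fact[i] == fact[k]
            })
        })
        // Emit one value per distinct variable; nothing is projected away.
        .map(|fact| first_pos.iter().map(|&i| fact[i].clone()).collect())
        .collect();

    (columns, rows)
}
```

With a pattern such as a, b, _w0_2, the synthetic wildcard column appears in the output like any other variable, which is why the comparison side has to project it out.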