## Plan Runner

This crate is a snapshot executor for conjunctive-query plans.
It reads a JSON plan (a DAG of scan and join nodes plus the input facts),
walks the DAG using the operators from [`query-ops`](../query-ops),
and prints the binding relation produced at the root node.

The wire format mirrors `Geolog.DB.Plan.PlanGraph` from the
[`geolog`](../../external/geolog) submodule, but the JSON shape is the contract:
any frontend that emits this format can drive the runner.
The mapping from `PlanEvalAtom` / `PlanJoin` to `scan_atom` / `semijoin` / `natural_join`,
and the full IR spec, are documented as module-level rustdoc in
[`src/lib.rs`](src/lib.rs).

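As a rough illustration of what a join node contributes, here is a minimal sketch of a natural join over two binding relations. The `Relation` struct, the `rel` helper, and this `natural_join` signature are assumptions made up for this README; they are not the `query-ops` types, and the authoritative mapping is the rustdoc in `src/lib.rs`.

```rust
/// Hypothetical stand-in for a binding relation: named columns plus rows.
/// This is not the `query-ops` type; it only mirrors the runner's output shape.
#[derive(Debug, Clone, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Build a relation from string literals (illustration helper).
fn rel(columns: &[&str], rows: Vec<Vec<&str>>) -> Relation {
    Relation {
        columns: columns.iter().map(|c| c.to_string()).collect(),
        rows: rows
            .into_iter()
            .map(|r| r.into_iter().map(String::from).collect())
            .collect(),
    }
}

/// Join two relations on every column name they share (nested loop, bulk).
fn natural_join(left: &Relation, right: &Relation) -> Relation {
    // Pairs of (left index, right index) for the shared column names.
    let shared: Vec<(usize, usize)> = left
        .columns
        .iter()
        .enumerate()
        .filter_map(|(li, name)| {
            right.columns.iter().position(|r| r == name).map(|ri| (li, ri))
        })
        .collect();
    // Right-hand columns that are not shared are appended to the output.
    let extra: Vec<usize> = (0..right.columns.len())
        .filter(|ri| !shared.iter().any(|(_, s)| s == ri))
        .collect();

    let mut columns = left.columns.clone();
    columns.extend(extra.iter().map(|&ri| right.columns[ri].clone()));

    let mut rows = Vec::new();
    for l in &left.rows {
        for r in &right.rows {
            // Keep the pair only if all shared columns agree.
            if shared.iter().all(|&(li, ri)| l[li] == r[ri]) {
                let mut row = l.clone();
                row.extend(extra.iter().map(|&ri| r[ri].clone()));
                rows.push(row);
            }
        }
    }
    Relation { columns, rows }
}

/// Made-up input data: two relations sharing the column `b`.
fn demo_join() -> Relation {
    let left = rel(&["a", "b"], vec![vec!["node:1", "node:2"], vec!["node:2", "node:1"]]);
    let right = rel(&["b", "c"], vec![vec!["node:2", "node:3"]]);
    natural_join(&left, &right)
}

fn main() {
    // Only the first left row has a partner with b = "node:2".
    println!("{:?}", demo_join());
}
```

Each call materializes its whole result before returning, which is the bulk style the runner uses throughout.
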
### Pipeline

End-to-end, scenarios become runner output through three stages:

```text
tools/exporter/examples/*.scenario.json
  └── (Haskell exporter; runs Geolog.DB.Plan.planConjunction
       and Geolog.DB.InMemory.evalConjunctionPlanned as a self-check)
      └── crates/plan-runner/fixtures/*.json  (JSON IR; checked in)
          └── (plan-runner; this crate)
              └── stdout JSON, with row-for-row oracle check
```

The exporter (`tools/exporter`) is the only producer of runner IR today;
it is where atoms are planned, and where atoms that don't fit the supported subset are rejected.
Fixtures are regenerated with `make export-fixtures`, and the full loop is `make examples`.

What happens inside the runner once a JSON plan arrives:

<div align="center">
<picture>
<img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
</picture>
</div>

### Backends

The CLI takes a `--backend` flag.
The `memory` backend is the pure in-memory path;
every other backend routes facts through the [`Storage`](../storage) trait
via `build_tables_via_storage`, then scans tables back out before executing.

| Backend          | Storage                                        | Location              |
|------------------|------------------------------------------------|-----------------------|
| `memory`         | none (direct from `plan.facts`)                | n/a                   |
| `memory-storage` | `MemoryStorage`                                | in-process            |
| `lmdb`           | `LmdbStorage` (heed-backed mmap B-tree)        | fresh tempdir per run |
| `redb`           | `RedbStorage` (single-file B-tree)             | fresh tempdir per run |
| `fjall`          | `FjallStorage` (LSM tree)                      | fresh tempdir per run |
| `sqlite`         | `SqliteStorage` (rusqlite, bundled libsqlite3) | fresh tempdir per run |
| `geomerge`       | `GeomergeStorage` (CRDT; alpha)                | in-process            |

All seven produce byte-identical output for every checked-in fixture.
The point of the abstraction is not performance comparison
(the snapshot evaluator is bulk-materialized either way),
but to validate that the storage layer is genuinely backend-neutral
and that adding a new adapter is a constructor swap.

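The backend names in the table can be modeled as a small enum parsed from the flag value. The sketch below is an assumption about how such a mapping could look, using only the standard library; the `Backend` enum and its helper methods are hypothetical and are not taken from the CLI's source.

```rust
use std::str::FromStr;

/// The seven backend names from the table.
/// The enum itself is a hypothetical illustration, not the CLI's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Memory,
    MemoryStorage,
    Lmdb,
    Redb,
    Fjall,
    Sqlite,
    Geomerge,
}

impl Backend {
    /// True when facts are routed through the `Storage` trait.
    fn uses_storage(self) -> bool {
        self != Backend::Memory
    }

    /// True when the table lists "fresh tempdir per run" for this backend.
    fn needs_tempdir(self) -> bool {
        matches!(
            self,
            Backend::Lmdb | Backend::Redb | Backend::Fjall | Backend::Sqlite
        )
    }
}

impl FromStr for Backend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "memory" => Ok(Backend::Memory),
            "memory-storage" => Ok(Backend::MemoryStorage),
            "lmdb" => Ok(Backend::Lmdb),
            "redb" => Ok(Backend::Redb),
            "fjall" => Ok(Backend::Fjall),
            "sqlite" => Ok(Backend::Sqlite),
            "geomerge" => Ok(Backend::Geomerge),
            other => Err(format!("unknown backend: {other}")),
        }
    }
}

fn main() {
    // "rocksdb" is deliberately not a supported name, to show the error path.
    for name in ["memory", "lmdb", "geomerge", "rocksdb"] {
        match name.parse::<Backend>() {
            Ok(b) => println!(
                "{name}: storage={} tempdir={}",
                b.uses_storage(),
                b.needs_tempdir()
            ),
            Err(e) => println!("{e}"),
        }
    }
}
```
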
Note on `geomerge`:
the runner's JSON IR is untyped (only arity per relation),
but geomerge requires a typed theory upfront.
The CLI infers column types from the first fact row per relation
and synthesizes a theory of `PrimInt` and `PrimString` columns via
[`GeomergeStorage::with_relations`](../storage/src/adapters/geomerge.rs).
Columns with no sample facts default to `PrimString`.

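The inference rule described in the note can be sketched in a few lines. Everything here is an assumption for illustration: the `Value` and `PrimType` enums and the `infer_column_types` function are hypothetical, the real facts are JSON values, and the actual theory is built by `GeomergeStorage::with_relations`.

```rust
/// Hypothetical fact value; the real IR carries JSON, modeled here as two cases.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Column types named after the `PrimInt` / `PrimString` columns in the note.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrimType {
    PrimInt,
    PrimString,
}

/// Infer one type per column from the first fact row only.
/// Columns without a sample value default to `PrimString`.
fn infer_column_types(arity: usize, facts: &[Vec<Value>]) -> Vec<PrimType> {
    let first = facts.first();
    (0..arity)
        .map(|col| match first.and_then(|row| row.get(col)) {
            Some(Value::Int(_)) => PrimType::PrimInt,
            // Strings, and columns with no sample value, fall back to PrimString.
            Some(Value::Str(_)) | None => PrimType::PrimString,
        })
        .collect()
}

fn main() {
    let facts = vec![vec![Value::Int(1), Value::Str("node:1".to_string())]];
    println!("{:?}", infer_column_types(2, &facts));
    // A relation with declared arity but no facts gets all-string columns.
    println!("{:?}", infer_column_types(2, &[]));
}
```

Because only the first row is sampled, a relation whose later rows disagree with the first one would be typed by the first row alone in this sketch.
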
### Run It

```sh
# Run one fixture through the default in-memory path:
cargo run -p plan-runner -- crates/plan-runner/fixtures/two_atom_join.json

# Same plan, routed through different backends:
cargo run -p plan-runner -- --backend memory-storage crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend lmdb crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend redb crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend fjall crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend sqlite crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend geomerge crates/plan-runner/fixtures/two_atom_join.json

# Regenerate every fixture from its scenario and run the oracle test:
make examples
```

A sample run:

```sh
$ plan-run crates/plan-runner/fixtures/two_atom_join.json
{"columns":["a","b","_w0_2"],"rows":[["node:1","node:2","edge:1"],["node:2","node:1","edge:2"]]}
```

The `_w<atomIdx>_<pos>` columns are wildcards the exporter named so the runner can bind them.
The scenario's `expected_bindings` block names only the variables the test cares about,
and `verify` projects the runner output to that subset before comparing as a multiset.

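The project-then-compare step can be sketched as follows. This is a minimal illustration under assumptions: the `project` and `verify` signatures are invented here and need not match the crate's `verify`, and the expected rows in `demo_verify` are made up rather than copied from a scenario file. The runner rows are the ones from the sample run.

```rust
/// Project `rows` onto the `wanted` columns, in the order given.
/// Returns `None` if a wanted column is missing from the output.
fn project(columns: &[&str], rows: &[Vec<&str>], wanted: &[&str]) -> Option<Vec<Vec<String>>> {
    let idx = wanted
        .iter()
        .map(|w| columns.iter().position(|c| c == w))
        .collect::<Option<Vec<usize>>>()?;
    Some(
        rows.iter()
            .map(|row| idx.iter().map(|&i| row[i].to_string()).collect())
            .collect(),
    )
}

/// Compare projected output and expected rows as multisets:
/// row order is ignored, duplicate rows still count.
fn verify(columns: &[&str], rows: &[Vec<&str>], wanted: &[&str], expected: &[Vec<&str>]) -> bool {
    let mut got = match project(columns, rows, wanted) {
        Some(projected) => projected,
        None => return false,
    };
    let mut want: Vec<Vec<String>> = expected
        .iter()
        .map(|row| row.iter().map(|v| v.to_string()).collect())
        .collect();
    // Sorting both sides turns multiset equality into plain equality.
    got.sort();
    want.sort();
    got == want
}

fn demo_verify() -> bool {
    let columns = ["a", "b", "_w0_2"];
    let rows = vec![
        vec!["node:1", "node:2", "edge:1"],
        vec!["node:2", "node:1", "edge:2"],
    ];
    // Hypothetical oracle rows, listed in a different order on purpose.
    let expected = vec![vec!["node:2", "node:1"], vec!["node:1", "node:2"]];
    verify(&columns, &rows, &["a", "b"], &expected)
}

fn main() {
    // The wildcard column `_w0_2` is dropped by the projection before comparing.
    println!("oracle matches: {}", demo_verify());
}
```
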
### Run the Tests

```sh
cargo test -p plan-runner
```

The two integration test files exercise complementary properties:

- `tests/examples.rs` walks every fixture and checks it against its `expected_bindings` oracle.
- `tests/storage_roundtrip.rs` cross-checks the pure path against the storage-backed path,
  to keep `build_tables` and `build_tables_via_storage` in lockstep.

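The round-trip property can be illustrated with a heavily reduced model. The trait, the adapter, and both function signatures below are hypothetical stand-ins written for this README; the real `Storage` trait lives in `../storage`, and the real `build_tables` and `build_tables_via_storage` will differ in their types.

```rust
use std::collections::BTreeMap;

type Row = Vec<String>;
type Tables = BTreeMap<String, Vec<Row>>;

/// Hypothetical, heavily reduced storage interface (not the real `Storage` trait).
trait Storage {
    fn insert(&mut self, relation: &str, row: Row);
    fn scan(&self, relation: &str) -> Vec<Row>;
    fn relations(&self) -> Vec<String>;
}

/// In-process adapter backed by a map (stand-in for `MemoryStorage`).
#[derive(Default)]
struct MemoryStorage {
    data: BTreeMap<String, Vec<Row>>,
}

impl Storage for MemoryStorage {
    fn insert(&mut self, relation: &str, row: Row) {
        self.data.entry(relation.to_string()).or_default().push(row);
    }
    fn scan(&self, relation: &str) -> Vec<Row> {
        self.data.get(relation).cloned().unwrap_or_default()
    }
    fn relations(&self) -> Vec<String> {
        self.data.keys().cloned().collect()
    }
}

/// Pure path: group facts by relation directly.
fn build_tables(facts: &[(String, Row)]) -> Tables {
    let mut tables = Tables::new();
    for (rel, row) in facts {
        tables.entry(rel.clone()).or_default().push(row.clone());
    }
    tables
}

/// Storage path: write every fact, then scan each relation back out.
fn build_tables_via_storage<S: Storage>(storage: &mut S, facts: &[(String, Row)]) -> Tables {
    for (rel, row) in facts {
        storage.insert(rel, row.clone());
    }
    storage
        .relations()
        .into_iter()
        .map(|rel| {
            let rows = storage.scan(&rel);
            (rel, rows)
        })
        .collect()
}

/// Made-up facts for two relations.
fn demo_facts() -> Vec<(String, Row)> {
    vec![
        ("edge".to_string(), vec!["node:1".to_string(), "node:2".to_string()]),
        ("edge".to_string(), vec!["node:2".to_string(), "node:1".to_string()]),
        ("label".to_string(), vec!["node:1".to_string(), "start".to_string()]),
    ]
}

fn main() {
    let facts = demo_facts();
    let pure = build_tables(&facts);
    let stored = build_tables_via_storage(&mut MemoryStorage::default(), &facts);
    // The cross-check: both paths must yield the same tables.
    assert_eq!(pure, stored);
    println!("{} relations agree", pure.len());
}
```

In this model, swapping the backend means passing a different `Storage` implementation to `build_tables_via_storage`, which is the "constructor swap" idea in miniature.
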
### Notes

- **IR contract.**
  The runner is backend-agnostic and frontend-agnostic:
  it consumes JSON in the shape documented in `src/lib.rs` and produces a binding relation.
  Anything that emits the same JSON can drive it.
- **No optimizer.**
  Plans are executed as written.
  Node ordering, join shape, and antijoin scheduling are all the producer's responsibility.
  This crate's job ends at faithful execution of the IR.
- **Wildcard columns survive.**
  `scan_atom` keeps every distinct variable that appears in the pattern,
  including the exporter's synthetic `_w<atomIdx>_<pos>` names.
  The runner does not project them out;
  oracle verification handles that on the comparison side.
- **Bulk, not streaming.**
  Each node materializes its full output as a `Relation`.
  This matches `query-ops`' execution model;
  it's not designed for incremental or maintained-view workloads.
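
To illustrate the wildcard and bulk notes together, here is a minimal sketch of a scan that keeps one output column per distinct variable. The `Relation` struct and this `scan_atom` signature are assumptions for illustration, not the `query-ops` API; the rule that a repeated variable requires equal values at both positions is the usual conjunctive-query reading and is likewise assumed here. The facts are made up.

```rust
/// Hypothetical relation shape; not the `query-ops` `Relation` type.
#[derive(Debug, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Match each fact against a pattern of variable names.
/// Each distinct variable becomes one output column; a repeated variable
/// only matches facts that carry equal values at both positions.
fn scan_atom(pattern: &[&str], facts: &[Vec<&str>]) -> Relation {
    // Record the first position at which each distinct variable appears.
    let mut columns: Vec<String> = Vec::new();
    let mut first_pos: Vec<usize> = Vec::new();
    for (pos, var) in pattern.iter().enumerate() {
        if !columns.iter().any(|c| c.as_str() == *var) {
            columns.push((*var).to_string());
            first_pos.push(pos);
        }
    }

    // Bulk evaluation: the full output is materialized before returning.
    let rows: Vec<Vec<String>> = facts
        .iter()
        .filter(|fact| {
            fact.len() == pattern.len()
                && pattern.iter().enumerate().all(|(pos, var)| {
                    // Compare against the variable's first occurrence.
                    let first = pattern.iter().position(|v| v == var).unwrap();
                    fact[pos] == fact[first]
                })
        })
        .map(|fact| {
            first_pos
                .iter()
                .map(|&p| fact[p].to_string())
                .collect::<Vec<String>>()
        })
        .collect();

    Relation { columns, rows }
}

fn main() {
    let facts = vec![
        vec!["node:1", "node:2", "edge:1"],
        vec!["node:2", "node:2", "edge:2"],
    ];
    // The wildcard `_w0_2` is kept as an ordinary output column.
    println!("{:?}", scan_atom(&["a", "b", "_w0_2"], &facts));
    // A repeated variable yields one column and filters to matching facts.
    println!("{:?}", scan_atom(&["a", "a", "e"], &facts));
}
```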