## Plan Runner

This crate implements an executor for conjunctive query plans, packaged as a CLI tool.
It reads a JSON plan (currently a DAG of scan and join nodes plus the input facts),
walks the DAG using the operators from [`query-ops`](../query-ops),
and prints the resulting relation as JSON to stdout.
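
To make the DAG walk concrete, here is a minimal sketch. It does not use the real `query-ops` operators or the real IR types: `Relation`, `Node`, `natural_join`, and `execute` are hypothetical names, and the sketch assumes nodes are listed so that every join refers to earlier nodes and the last node is the root.

```rust
use std::collections::HashMap;

/// A fully materialized relation: named columns plus rows of string values.
#[derive(Debug, Clone, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Hypothetical plan node; the real IR shape is documented in `src/lib.rs`.
enum Node {
    /// Leaf: a relation whose columns are already named after variables.
    Scan(Relation),
    /// Natural join of two earlier nodes, referenced by index.
    Join { left: usize, right: usize },
}

/// Hash join on all columns the two inputs share by name.
fn natural_join(l: &Relation, r: &Relation) -> Relation {
    // (left position, right position) for every shared column.
    let shared: Vec<(usize, usize)> = l
        .columns
        .iter()
        .enumerate()
        .filter_map(|(li, c)| r.columns.iter().position(|rc| rc == c).map(|ri| (li, ri)))
        .collect();
    // Right-hand columns that are not join keys get appended to the output.
    let right_only: Vec<usize> = (0..r.columns.len())
        .filter(|ri| !shared.iter().any(|&(_, s)| s == *ri))
        .collect();
    let mut columns = l.columns.clone();
    columns.extend(right_only.iter().map(|&ri| r.columns[ri].clone()));

    // Build a hash index over the right input, keyed by the shared values.
    let mut index: HashMap<Vec<&String>, Vec<&Vec<String>>> = HashMap::new();
    for row in &r.rows {
        let key: Vec<&String> = shared.iter().map(|&(_, ri)| &row[ri]).collect();
        index.entry(key).or_default().push(row);
    }
    // Probe with each left row and emit one output row per match.
    let mut rows = Vec::new();
    for lrow in &l.rows {
        let key: Vec<&String> = shared.iter().map(|&(li, _)| &lrow[li]).collect();
        if let Some(matches) = index.get(&key) {
            for rrow in matches {
                let mut out = lrow.clone();
                out.extend(right_only.iter().map(|&ri| rrow[ri].clone()));
                rows.push(out);
            }
        }
    }
    Relation { columns, rows }
}

/// Evaluate nodes in listed order; returns None for an empty plan or a
/// join that refers to a node that has not been evaluated yet.
fn execute(nodes: &[Node]) -> Option<Relation> {
    let mut results: Vec<Relation> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let rel = match node {
            Node::Scan(rel) => rel.clone(),
            Node::Join { left, right } => {
                natural_join(results.get(*left)?, results.get(*right)?)
            }
        };
        results.push(rel);
    }
    results.pop()
}

/// Small constructor used only by the example below.
fn rel(columns: &[&str], rows: Vec<Vec<&str>>) -> Relation {
    Relation {
        columns: columns.iter().map(|c| c.to_string()).collect(),
        rows: rows
            .into_iter()
            .map(|r| r.into_iter().map(String::from).collect())
            .collect(),
    }
}

fn main() {
    // Made-up data: two scans sharing the variable `b`, then one join.
    let plan = vec![
        Node::Scan(rel(&["a", "b"], vec![vec!["n1", "n2"], vec!["n2", "n3"]])),
        Node::Scan(rel(&["b", "c"], vec![vec!["n2", "n3"], vec!["n3", "n4"]])),
        Node::Join { left: 0, right: 1 },
    ];
    let out = execute(&plan).expect("plan should be well-formed");
    println!("{:?}", out);
}
```

Every intermediate result is kept in `results`, so a node can be consumed by more than one join, which is what makes the plan a DAG rather than a tree.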

### Pipeline

End-to-end, scenarios become runner output through three stages:

```text
tools/exporter/examples/*.scenario.json
  └── (Haskell exporter; runs Geolog.DB.Plan.planConjunction
       and Geolog.DB.InMemory.evalConjunctionPlanned as a self-check)
        └── crates/plan-runner/fixtures/*.json    (JSON IR; checked in)
             └── (plan-runner; this crate)
                  └── stdout JSON, with row-for-row oracle check
```

The exporter (`tools/exporter`) is the only producer of runner IR today:
it plans each scenario's atoms and rejects any that fall outside the supported subset.
Fixtures are regenerated with `make export-fixtures`, and the full loop is `make examples`.

What happens inside the runner once a JSON plan arrives:

<div align="center">
  <picture>
    <img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
  </picture>
</div>

### Storage Backends

The CLI takes a `--backend` flag.
The `memory` backend is the pure in-memory path;
every other backend routes facts through the [`Storage`](../storage) trait
via `build_tables_via_storage`, then scans tables back out before executing.

| Backend          | Storage           | Location              |
|------------------|-------------------|-----------------------|
| `memory`         | none              | n/a                   |
| `memory-storage` | `MemoryStorage`   | in-process            |
| `lmdb`           | `LmdbStorage`     | fresh tempdir per run |
| `redb`           | `RedbStorage`     | fresh tempdir per run |
| `fjall`          | `FjallStorage`    | fresh tempdir per run |
| `sqlite`         | `SqliteStorage`   | fresh tempdir per run |
| `geomerge`       | `GeomergeStorage` | in-process            |

### Execute a Query Plan

```sh
# Run a plan with the default backend (`memory`, no storage)
cargo run -p plan-runner -- crates/plan-runner/fixtures/two_atom_join.json

# Run the same plan with every supported backend
cargo run -p plan-runner -- --backend memory-storage crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend lmdb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend redb           crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend fjall          crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend sqlite         crates/plan-runner/fixtures/two_atom_join.json
cargo run -p plan-runner -- --backend geomerge       crates/plan-runner/fixtures/two_atom_join.json
```

A sample run:

```sh
$ plan-run crates/plan-runner/fixtures/two_atom_join.json
{"columns":["a","b","_w0_2"],"rows":[["node:1","node:2","edge:1"],["node:2","node:1","edge:2"]]}
```

Columns named `_w<atomIdx>_<pos>` are wildcard positions that the exporter gave synthetic names so the runner can bind them.
The scenario's `expected_bindings` block names only the variables the test cares about,
and `verify` projects the runner output to that subset before comparing as a multiset.
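
The projection and multiset comparison can be sketched as follows. This is an illustration, not the crate's `verify` code: `Relation`, `project`, `as_multiset`, and `matches_expected` are hypothetical names, and the expected bindings over `a` and `b` are assumed for the example rather than taken from a real scenario file.

```rust
use std::collections::HashMap;

/// Runner output: named columns plus rows of string values.
#[derive(Debug, Clone)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Keep only the `wanted` columns, in that order.
/// Returns None if the output lacks one of them.
fn project(rel: &Relation, wanted: &[&str]) -> Option<Vec<Vec<String>>> {
    let idx: Vec<usize> = wanted
        .iter()
        .map(|w| rel.columns.iter().position(|c| c.as_str() == *w))
        .collect::<Option<Vec<_>>>()?;
    Some(
        rel.rows
            .iter()
            .map(|row| idx.iter().map(|&i| row[i].clone()).collect())
            .collect(),
    )
}

/// Count each row so that order is ignored but duplicates still matter.
fn as_multiset(rows: &[Vec<String>]) -> HashMap<Vec<String>, usize> {
    let mut counts = HashMap::new();
    for row in rows {
        *counts.entry(row.clone()).or_insert(0) += 1;
    }
    counts
}

/// Project first, then compare against the expected bindings as multisets.
fn matches_expected(rel: &Relation, wanted: &[&str], expected: &[Vec<String>]) -> bool {
    match project(rel, wanted) {
        Some(projected) => as_multiset(&projected) == as_multiset(expected),
        None => false,
    }
}

/// Helper for building rows from string literals.
fn row(values: &[&str]) -> Vec<String> {
    values.iter().map(|v| v.to_string()).collect()
}

/// The relation printed in the sample run above.
fn sample_output() -> Relation {
    Relation {
        columns: row(&["a", "b", "_w0_2"]),
        rows: vec![
            row(&["node:1", "node:2", "edge:1"]),
            row(&["node:2", "node:1", "edge:2"]),
        ],
    }
}

fn main() {
    // Assumed expectation: only `a` and `b` matter, and row order is irrelevant.
    let expected = vec![row(&["node:2", "node:1"]), row(&["node:1", "node:2"])];
    let ok = matches_expected(&sample_output(), &["a", "b"], &expected);
    println!("oracle check passed: {}", ok);
}
```

Comparing counts rather than sets means a plan that produces a duplicated or missing row still fails the check, even when the distinct rows agree.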

### Run the Tests

```sh
cargo test -p plan-runner
```

### Notes

- **IR contract.**
  The runner is backend-agnostic and frontend-agnostic.
  It consumes JSON in the shape documented in `src/lib.rs` and produces a binding relation.
  Anything that emits the same JSON can drive it.
- **No optimizer.**
  Plans are executed as written.
  Node ordering, join shape, and antijoin scheduling are all the producer's responsibility.
  This crate's job ends at faithful execution of the IR.
- **Wildcard columns survive.**
  `scan_atom` keeps every distinct variable that appears in the pattern,
  including the exporter's synthetic `_w<atomIdx>_<pos>` names.
  The runner does not project them out;
  oracle verification handles that on the comparison side.
- **Bulk, not streaming.**
  Each node materializes its full output as a `Relation`.
  This matches the execution model of `query-ops`;
  the runner is not designed for incremental or maintained-view workloads.
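
The wildcard and bulk-execution notes can be illustrated with a small scan. This is a sketch under assumptions, not the real `scan_atom`: the `Term` type, the function signature, and the handling of constants and repeated variables are all hypothetical. The only behavior taken from the notes above is that every distinct variable, wildcards included, becomes an output column and that the whole result is materialized at once.

```rust
/// A fully materialized relation: named columns plus rows of string values.
#[derive(Debug, Clone, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Hypothetical pattern term: a named variable or a constant to match.
enum Term {
    Var(String),
    Const(String),
}

/// Match `pattern` against every fact and return one column per distinct
/// variable, in first-occurrence order. Wildcard names such as `_w0_2` are
/// ordinary variables here, so they are kept rather than projected out.
fn scan_atom(pattern: &[Term], facts: &[Vec<String>]) -> Relation {
    // Record each distinct variable and the position that first binds it.
    let mut columns: Vec<String> = Vec::new();
    let mut first_pos: Vec<usize> = Vec::new();
    for (pos, term) in pattern.iter().enumerate() {
        if let Term::Var(name) = term {
            if !columns.contains(name) {
                columns.push(name.clone());
                first_pos.push(pos);
            }
        }
    }

    let mut rows: Vec<Vec<String>> = Vec::new();
    'facts: for fact in facts {
        if fact.len() != pattern.len() {
            continue; // arity mismatch: this fact cannot match
        }
        for (pos, term) in pattern.iter().enumerate() {
            let ok = match term {
                // Assumption: constants must match the fact exactly.
                Term::Const(value) => &fact[pos] == value,
                // Assumption: a repeated variable must see the same value
                // at every position where it occurs.
                Term::Var(name) => {
                    let col = columns
                        .iter()
                        .position(|c| c == name)
                        .expect("every variable was recorded above");
                    fact[pos] == fact[first_pos[col]]
                }
            };
            if !ok {
                continue 'facts;
            }
        }
        rows.push(first_pos.iter().map(|&p| fact[p].clone()).collect());
    }
    Relation { columns, rows }
}

/// Helper for building rows from string literals.
fn row(values: &[&str]) -> Vec<String> {
    values.iter().map(|v| v.to_string()).collect()
}

fn main() {
    // Made-up edge facts shaped like the sample run: (source, target, edge id).
    let facts = vec![
        row(&["node:1", "node:2", "edge:1"]),
        row(&["node:2", "node:1", "edge:2"]),
    ];
    let pattern = vec![
        Term::Var("a".into()),
        Term::Var("b".into()),
        Term::Var("_w0_2".into()), // wildcard position named by the exporter
    ];
    let pinned = vec![
        Term::Const("node:1".into()),
        Term::Var("b".into()),
        Term::Var("_w0_2".into()),
    ];
    println!("{:?}", scan_atom(&pattern, &facts));
    println!("{:?}", scan_atom(&pinned, &facts));
}
```

Because the wildcard column stays in the output, a downstream join or the oracle comparison decides whether to use or drop it; the scan itself never does.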