From b38b176e7f33a1660dd204a7441b73e39ee231b0 Mon Sep 17 00:00:00 2001
From: Hassan Abedi
Date: Fri, 5 Jun 2026 13:51:26 +0200
Subject: [PATCH] Add a few emojies to README files

---
 AGENTS.md                    | 108 +++++++++++------------------------
 README.md                    |   6 +-
 crates/plan-runner/README.md |   5 ++
 crates/storage/README.md     |   2 +-
 tools/exporter/README.md     |   7 ++-
 5 files changed, 47 insertions(+), 81 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 26b91bf..ea29a6c 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,15 +4,15 @@ This file provides guidance to coding agents collaborating on this repository.
 
 ## Mission
 
-`storage-engine-playground` is an experimental Rust project for testing ideas from the FlowLog, DBSP, CRDT-as-query, and Geomerge notes.
+`storage-engine-playground` is an experimental Rust project for prototyping query engines and storage engines.
 
 The goal is not production software. The goal is a clear, runnable playground for small prototypes that help answer concrete architecture questions:
 
-- how Datalog-like rules should be parsed, cataloged, planned, and optimized
-- how FlowLog-style planning ideas transfer to a DBSP-oriented frontend
-- how CRDT queries behave under naive plans versus planned relational execution
-- how Geomerge-style laws can compile into maintained violation relations
-- how backend behavior changes across snapshot, DBSP-like, and Differential Dataflow-like execution models
+- how a query language should be parsed, cataloged, planned, and optimized
+- how a query planner and a query executor should be separated, and what intermediate representation sits between them
+- how a query executor's operators (scans, joins, antijoins, projections) compose into a working snapshot evaluator
+- how a storage engine should expose a backend-neutral interface (relations, rows, transactions, scans), and how that interface holds up across
+  different backends (in-process, file-backed, CRDT, and so on)
 
 Priorities, in order:
 
@@ -44,19 +44,23 @@ Priorities, in order:
 
 ## Repository Layout
 
-The repository is new and may change. Discover the current layout from the filesystem before editing.
+Discover the current layout from the filesystem before editing.
+The shape today is:
 
-Expected durable areas may include:
+- `crates/`: Rust workspace.
+  See [`crates/README.md`](crates/README.md) for the responsibilities and dependency edges between the four crates (`storage`, `query-ops`,
+  `plan-runner`, `geomerge-demo`).
+  Each crate keeps its own `src/`, `tests/`, and (where relevant) `fixtures/`, `benches/`, and `docs/diagrams/` subdirectories.
+- `tools/exporter/`: Haskell tool that consumes hand-authored `.scenario.json` files in `tools/exporter/examples/` and emits the runner-IR JSON
+  consumed by `crates/plan-runner`.
+  See [`tools/exporter/README.md`](tools/exporter/README.md).
+- `external/`: git submodules.
+  `external/geolog` provides the Haskell query planner used by the exporter; `external/geomerge` is the Rust CRDT crate consumed by
+  `storage::adapters::geomerge`.
+- Top-level configuration: `Makefile`, `flake.nix`, `Cargo.toml` (workspace), `pyproject.toml`, `.pre-commit-config.yaml`, `rust-toolchain.toml`.
 
-- `src/`: Rust source for parser, catalog, planner, execution experiments, and storage prototypes.
-- `tests/`: integration tests for rule planning, evaluation, and storage behavior.
-- `tools/exporter/examples/`: hand-authored scenario JSON consumed by the Haskell exporter to produce runner fixtures.
-- `fixtures/`: committed input facts and expected outputs.
-- `notes/`: local design notes that belong to this project.
-- `flowlog/`: project-local notes or sketches derived from the FlowLog line of work.
-
-Do not assume this list is exhaustive. If the project grows a different structure, follow the actual codebase and update this file when conventions
-stabilize.
+Do not assume this list is exhaustive.
+If the project grows a different structure, follow the actual codebase and update this file when conventions stabilize.
 
 ## Technical Direction
 
@@ -70,15 +74,15 @@ Datalog-like rules or Geolog-shaped laws
 -> relational plan
 -> FlowLog-style optimization
 -> backend lowering
--> maintained or snapshot outputs
+-> snapshot outputs
 ```
 
 Keep these layers explicit:
 
-- **Source Layer**: Datalog-like test programs, CRDT query definitions, and Geomerge-style laws.
+- **Source Layer**: Datalog-like test programs and Geomerge-style laws.
 - **Catalog Layer**: rule heads, body atoms, variables, constants, comparisons, negation, and projections.
 - **Planning Layer**: join graphs, join order, antijoin placement, SIP-style filtering, subplan sharing, and physical key choice.
-- **Execution Layer**: snapshot evaluator first, then DBSP-like or Differential Dataflow-like experiments.
+- **Execution Layer**: snapshot evaluator.
 - **Storage Layer**: facts, transactions, rollback, preview state, and violation output integration.
 
 ## FlowLog-Inspired Planning
@@ -106,60 +110,6 @@ rule with three positive atoms
 -> expected textual plan
 ```
 
-## DBSP and Incremental Execution
-
-DBSP-related work should preserve a clean boundary:
-
-```text
-planned relational IR
--> DBSP lowering
--> maintained output deltas
-```
-
-Do not make DBSP responsible for source-language semantics. The frontend should check supported syntax, stratification, and rule shape before backend
- -For each DBSP-like experiment, also provide a snapshot oracle when feasible: - -```text -snapshot result == maintained result after each update -``` - -Track these measurements when relevant: - -- hydration time -- warm-update time -- output delta size -- maintained state size if available -- sensitivity to join order -- sensitivity to causal-history depth - -## CRDT Query Experiments - -Initial CRDT workloads should stay small and explicit: - -- multi-value register -- causal readiness over `pred` -- list next-element traversal -- tombstone skipping - -Use operation facts shaped like: - -```text -set(replica_id, counter, key, value) -pred(from_replica_id, from_counter, to_replica_id, to_counter) -insert(replica_id, counter, parent_replica_id, parent_counter, value) -remove(replica_id, counter) -``` - -Important questions: - -- Does the query require recursion, negation, or both? -- Can antijoins run earlier? -- Can causal readiness be maintained from a frontier? -- Does warm-update cost depend on history depth? -- Does the output need integration into a current view? - ## Geomerge-Style Validation Experiments The first Geomerge-style target is maintained violation detection for supported relational laws. @@ -219,8 +169,7 @@ Recommended test groups: - antijoin scheduling - SIP-style filtering - snapshot evaluation -- maintained-output equivalence -- CRDT fixtures +- storage-backend adapter parity (in-process, file-backed, and CRDT) - Geomerge-style violation fixtures Tests should prefer small facts with readable expected outputs. Avoid large benchmark fixtures unless the test is explicitly performance-oriented. @@ -239,6 +188,13 @@ For Rust changes, prefer: These map to `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all-targets --all-features`. If the project does not yet have a `Cargo.toml`, `make check` should still pass by skipping Rust-specific checks. 
+For changes that touch the cross-language pipeline (Haskell exporter and Rust runner), also run: + +1. `make export-fixtures`: rebuilds `crates/plan-runner/fixtures/*.json` from `tools/exporter/examples/*.scenario.json` using the Haskell exporter. + Requires the Nix dev shell (`make shell` or `nix develop`) so GHC and Cabal are available. +2. `make examples`: runs `export-fixtures` and then `cargo test -p plan-runner --test examples`, which walks every regenerated fixture and verifies it + against its `expected_bindings` oracle. + For Markdown-only changes, run a manual read-through and check that headings follow the writing style. ## Change Design Checklist diff --git a/README.md b/README.md index 3ff6bb4..1895ac1 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,13 @@ ## Storage Engine Playground -This repo is a playground for running small experiments related to storage side of things. +This repo is a playground for running small experiments related to storage and query execution. ### Development +> ⚠️ Clone with `--recursive`. +> The repo pulls `external/geolog` and `external/geomerge` as git submodules; +> a non-recursive clone leaves those directories empty and breaks the build. + ```sh # Clone the repo with submodules git clone --recursive git@code.obsidian.systems:habedi-work/storage-engine-playground.git diff --git a/crates/plan-runner/README.md b/crates/plan-runner/README.md index 5c9c387..f0f4ae3 100644 --- a/crates/plan-runner/README.md +++ b/crates/plan-runner/README.md @@ -48,6 +48,11 @@ via `build_tables_via_storage`, then scans tables back out before executing. | `sqlite` | `SqliteStorage` | fresh tempdir per run | | `geomerge` | `GeomergeStorage` | in-process | +> ⚠️ `--backend geomerge` requires a typed theory upfront, but the runner IR is untyped. +> The CLI infers column types (`PrimInt` or `PrimString`) from the first fact row per relation; +> relations with no facts default to `PrimString`. 
+> Works for every current fixture; future fixtures with mixed-type columns may fail at insert time.
+
 ### Execute a Query Plan
 
 ```sh
diff --git a/crates/storage/README.md b/crates/storage/README.md
index c968acf..41f7092 100644
--- a/crates/storage/README.md
+++ b/crates/storage/README.md
@@ -106,7 +106,7 @@ cargo test -p storage --all-features
 - **Deletion support.**
   Most adapters implement `delete`.
   The `geomerge` adapter does not: its append-only commit log returns `StorageError::Unsupported("row deletion")`.
-- **Geomerge is alpha.**
+- ⚠️ **Geomerge is alpha.**
   The upstream `geomerge` crate is prototype-status and its API can change without notice;
   treat breakage in `adapters::geomerge` as expected churn rather than regression.
 - **Feature gates.**
diff --git a/tools/exporter/README.md b/tools/exporter/README.md
index 1dccef0..a320b1a 100644
--- a/tools/exporter/README.md
+++ b/tools/exporter/README.md
@@ -26,9 +26,10 @@ tools/exporter/
 
 ### Run It
 
-The exporter needs GHC 9.12 and Cabal.
-The repository's Nix dev shell provides both;
-enter it with `make shell` (or `nix develop`) before running the commands below.
+> ⚠️ The exporter needs GHC 9.12 and Cabal.
+> The repository's Nix dev shell provides both;
+> enter it with `make shell` (or `nix develop`) before running the commands below.
+> A system GHC older than 9.12 will fail to compile geolog-lang's `GHC2024` modules.
 
 ```sh
 # Build the executable: