Add a few emojis to README files

Hassan Abedi 2026-06-05 13:51:26 +02:00
parent 6327c0e344
commit bd64b4ab89
5 changed files with 46 additions and 80 deletions

AGENTS.md

@@ -4,15 +4,15 @@ This file provides guidance to coding agents collaborating on this repository.
## Mission
`storage-engine-playground` is an experimental Rust project for testing ideas from the FlowLog, DBSP, CRDT-as-query, and Geomerge notes.
`storage-engine-playground` is an experimental Rust project for prototyping query engines and storage engines.
The goal is not production software. The goal is a clear, runnable playground for small prototypes that help answer concrete architecture questions:
- how Datalog-like rules should be parsed, cataloged, planned, and optimized
- how FlowLog-style planning ideas transfer to a DBSP-oriented frontend
- how CRDT queries behave under naive plans versus planned relational execution
- how Geomerge-style laws can compile into maintained violation relations
- how backend behavior changes across snapshot, DBSP-like, and Differential Dataflow-like execution models
- how a query language should be parsed, cataloged, planned, and optimized
- how a query planner and a query executor should be separated, and what intermediate representation sits between them
- how a query executor's operators (scans, joins, antijoins, projections) compose into a working snapshot evaluator
- how a storage engine should expose a backend-neutral interface (relations, rows, transactions, scans), and how that interface holds up across
different backends (in-process, file-backed, CRDT, and so on)
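
As a rough illustration of the last question, a backend-neutral interface could be sketched as below. Every name here (`Value`, `Row`, `Backend`, `MemoryBackend`, `count_rows`) is hypothetical and is not taken from the `storage` crate; transactions are left out to keep the sketch short.

```rust
use std::collections::BTreeMap;

/// Hypothetical value type; the real crates may model values differently.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// A row is an ordered list of values.
pub type Row = Vec<Value>;

/// Hypothetical backend-neutral interface: callers only see relations and rows.
pub trait Backend {
    /// Append one row to the named relation, creating the relation if needed.
    fn insert(&mut self, relation: &str, row: Row);
    /// Return every row of the named relation (empty if it does not exist).
    fn scan(&self, relation: &str) -> Vec<Row>;
}

/// Simplest possible backend: relations held in process memory.
#[derive(Default)]
pub struct MemoryBackend {
    relations: BTreeMap<String, Vec<Row>>,
}

impl Backend for MemoryBackend {
    fn insert(&mut self, relation: &str, row: Row) {
        self.relations
            .entry(relation.to_string())
            .or_default()
            .push(row);
    }

    fn scan(&self, relation: &str) -> Vec<Row> {
        self.relations.get(relation).cloned().unwrap_or_default()
    }
}

/// Code written against the trait does not care which backend it runs on.
pub fn count_rows(backend: &dyn Backend, relation: &str) -> usize {
    backend.scan(relation).len()
}
```

A file-backed or CRDT backend would implement the same trait, so a helper such as `count_rows` could be reused unchanged when comparing backends.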
Priorities, in order:
@@ -44,19 +44,23 @@ Priorities, in order:
## Repository Layout
The repository is new and may change. Discover the current layout from the filesystem before editing.
Discover the current layout from the filesystem before editing.
The shape today is:
Expected durable areas may include:
- `crates/`: Rust workspace.
See [`crates/README.md`](crates/README.md) for the responsibilities and dependency edges between the four crates (`storage`, `query-ops`,
`plan-runner`, `geomerge-demo`).
Each crate keeps its own `src/`, `tests/`, and (where relevant) `fixtures/`, `benches/`, and `docs/diagrams/` subdirectories.
- `tools/exporter/`: Haskell tool that consumes hand-authored `.scenario.json` files in `tools/exporter/examples/` and emits the runner-IR JSON
consumed by `crates/plan-runner`.
See [`tools/exporter/README.md`](tools/exporter/README.md).
- `external/`: git submodules.
`external/geolog` provides the Haskell query planner used by the exporter; `external/geomerge` is the Rust CRDT crate consumed by
`storage::adapters::geomerge`.
- Top-level configuration: `Makefile`, `flake.nix`, `Cargo.toml` (workspace), `pyproject.toml`, `.pre-commit-config.yaml`, `rust-toolchain.toml`.
- `src/`: Rust source for parser, catalog, planner, execution experiments, and storage prototypes.
- `tests/`: integration tests for rule planning, evaluation, and storage behavior.
- `tools/exporter/examples/`: hand-authored scenario JSON consumed by the Haskell exporter to produce runner fixtures.
- `fixtures/`: committed input facts and expected outputs.
- `notes/`: local design notes that belong to this project.
- `flowlog/`: project-local notes or sketches derived from the FlowLog line of work.
Do not assume this list is exhaustive. If the project grows a different structure, follow the actual codebase and update this file when conventions
stabilize.
Do not assume this list is exhaustive.
If the project grows a different structure, follow the actual codebase and update this file when conventions stabilize.
## Technical Direction
@@ -70,15 +74,15 @@ Datalog-like rules or Geolog-shaped laws
-> relational plan
-> FlowLog-style optimization
-> backend lowering
-> maintained or snapshot outputs
-> snapshot outputs
```
Keep these layers explicit:
- **Source Layer**: Datalog-like test programs, CRDT query definitions, and Geomerge-style laws.
- **Source Layer**: Datalog-like test programs and Geomerge-style laws.
- **Catalog Layer**: rule heads, body atoms, variables, constants, comparisons, negation, and projections.
- **Planning Layer**: join graphs, join order, antijoin placement, SIP-style filtering, subplan sharing, and physical key choice.
- **Execution Layer**: snapshot evaluator first, then DBSP-like or Differential Dataflow-like experiments.
- **Execution Layer**: snapshot evaluator.
- **Storage Layer**: facts, transactions, rollback, preview state, and violation output integration.
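
To make the planning and execution layers concrete, here is a minimal sketch of a snapshot evaluator for one hypothetical rule, `two_hop(a, c) :- edge(a, b), edge(b, c), not blocked(c)`. The rule, the function names, and the nested-loop join are assumptions for illustration, not the operators in `query-ops`.

```rust
use std::collections::HashSet;

type Pair = (i64, i64);
type Triple = (i64, i64, i64);

/// Join: edge(a, b) with edge(b, c) on the shared variable b (nested loop).
fn join_on_middle(left: &[Pair], right: &[Pair]) -> Vec<Triple> {
    let mut out = Vec::new();
    for &(a, b) in left {
        for &(b2, c) in right {
            if b == b2 {
                out.push((a, b, c));
            }
        }
    }
    out
}

/// Antijoin: keep only rows whose last column does not appear in `blocked`.
fn antijoin_last(rows: Vec<Triple>, blocked: &HashSet<i64>) -> Vec<Triple> {
    rows.into_iter()
        .filter(|(_, _, c)| !blocked.contains(c))
        .collect()
}

/// Projection: keep (a, c), deduplicate, and sort for a stable output.
fn project_ends(rows: Vec<Triple>) -> Vec<Pair> {
    let set: HashSet<Pair> = rows.into_iter().map(|(a, _, c)| (a, c)).collect();
    let mut out: Vec<Pair> = set.into_iter().collect();
    out.sort();
    out
}

/// Snapshot evaluation of the hypothetical rule: join, then antijoin, then project.
fn two_hop(edges: &[Pair], blocked: &HashSet<i64>) -> Vec<Pair> {
    project_ends(antijoin_last(join_on_middle(edges, edges), blocked))
}
```

This sketch applies the antijoin after the join. A planner concerned with antijoin placement might instead filter `edge(b, c)` against `blocked` first, which shrinks the join input without changing the result.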
## FlowLog-Inspired Planning
@@ -106,60 +110,6 @@ rule with three positive atoms
-> expected textual plan
```
## DBSP and Incremental Execution
DBSP-related work should preserve a clean boundary:
```text
planned relational IR
-> DBSP lowering
-> maintained output deltas
```
Do not make DBSP responsible for source-language semantics. The frontend should check supported syntax, stratification, and rule shape before backend
lowering.
For each DBSP-like experiment, also provide a snapshot oracle when feasible:
```text
snapshot result == maintained result after each update
```
Track these measurements when relevant:
- hydration time
- warm-update time
- output delta size
- maintained state size if available
- sensitivity to join order
- sensitivity to causal-history depth
## CRDT Query Experiments
Initial CRDT workloads should stay small and explicit:
- multi-value register
- causal readiness over `pred`
- list next-element traversal
- tombstone skipping
Use operation facts shaped like:
```text
set(replica_id, counter, key, value)
pred(from_replica_id, from_counter, to_replica_id, to_counter)
insert(replica_id, counter, parent_replica_id, parent_counter, value)
remove(replica_id, counter)
```
Important questions:
- Does the query require recursion, negation, or both?
- Can antijoins run earlier?
- Can causal readiness be maintained from a frontier?
- Does warm-update cost depend on history depth?
- Does the output need integration into a current view?
## Geomerge-Style Validation Experiments
The first Geomerge-style target is maintained violation detection for supported relational laws.
@@ -219,8 +169,7 @@ Recommended test groups:
- antijoin scheduling
- SIP-style filtering
- snapshot evaluation
- maintained-output equivalence
- CRDT fixtures
- storage-backend adapter parity (in-process, file-backed, and CRDT)
- Geomerge-style violation fixtures
Tests should prefer small facts with readable expected outputs. Avoid large benchmark fixtures unless the test is explicitly performance-oriented.
@@ -239,6 +188,13 @@ For Rust changes, prefer:
These map to `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all-targets --all-features`.
If the project does not yet have a `Cargo.toml`, `make check` should still pass by skipping Rust-specific checks.
For changes that touch the cross-language pipeline (Haskell exporter and Rust runner), also run:
1. `make export-fixtures`: rebuilds `crates/plan-runner/fixtures/*.json` from `tools/exporter/examples/*.scenario.json` using the Haskell exporter.
Requires the Nix dev shell (`make shell` or `nix develop`) so GHC and Cabal are available.
2. `make examples`: runs `export-fixtures` and then `cargo test -p plan-runner --test examples`, which walks every regenerated fixture and verifies it
against its `expected_bindings` oracle.
For Markdown-only changes, run a manual read-through and check that headings follow the writing style.
## Change Design Checklist


@@ -4,6 +4,10 @@ This repo is a playground for running small experiments related to storage side
### Development
> ⚠️ Clone with `--recursive`.
> The repo pulls `external/geolog` and `external/geomerge` as git submodules;
> a non-recursive clone leaves those directories empty and breaks the build.
```sh
# Clone the repo with submodules
git clone --recursive git@code.obsidian.systems:habedi-work/storage-engine-playground.git


@@ -48,6 +48,11 @@ via `build_tables_via_storage`, then scans tables back out before executing.
| `sqlite` | `SqliteStorage` | fresh tempdir per run |
| `geomerge` | `GeomergeStorage` | in-process |
> ⚠️ `--backend geomerge` requires a typed theory upfront, but the runner IR is untyped.
> The CLI infers column types (`PrimInt` or `PrimString`) from the first fact row per relation;
> relations with no facts default to `PrimString`.
> Works for every current fixture; future fixtures with mixed-type columns may fail at insert time.
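
The inference rule described in the note can be sketched as follows. Only the names `PrimInt` and `PrimString` come from the note; the `PrimType` and `Raw` enums and the functions `infer_column_types` and `check_row` are hypothetical stand-ins, not the CLI's actual code.

```rust
/// Column types named in the note; the enum itself is a hypothetical stand-in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PrimType {
    PrimInt,
    PrimString,
}

/// Hypothetical untyped value, as it might appear in the runner IR.
enum Raw {
    Int(i64),
    Str(String),
}

fn type_of(value: &Raw) -> PrimType {
    match value {
        Raw::Int(_) => PrimType::PrimInt,
        Raw::Str(_) => PrimType::PrimString,
    }
}

/// Infer column types from the first fact row only.
/// A relation with no facts falls back to `PrimString` for every column.
fn infer_column_types(arity: usize, rows: &[Vec<Raw>]) -> Vec<PrimType> {
    match rows.first() {
        Some(first) => first.iter().map(type_of).collect(),
        None => vec![PrimType::PrimString; arity],
    }
}

/// Check one row against the inferred types, as a typed backend might at insert time.
fn check_row(types: &[PrimType], row: &[Raw]) -> Result<(), String> {
    if types.len() != row.len() {
        return Err(format!(
            "arity mismatch: expected {}, got {}",
            types.len(),
            row.len()
        ));
    }
    for (index, (expected, value)) in types.iter().zip(row).enumerate() {
        if type_of(value) != *expected {
            return Err(format!("column {}: expected {:?}", index, expected));
        }
    }
    Ok(())
}
```

Because only the first row is inspected, a later row that puts a string into a column first seen as an integer is rejected by `check_row`, which mirrors the mixed-type failure mode the note warns about.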
### Execute a Query Plan
```sh


@@ -106,7 +106,7 @@ cargo test -p storage --all-features
- **Deletion support.**
Most adapters implement `delete`.
The `geomerge` adapter does not: its append-only commit log returns `StorageError::Unsupported("row deletion")`.
- **Geomerge is alpha.**
- ⚠️ **Geomerge is alpha.**
The upstream `geomerge` crate is prototype-status and its API can change without notice; treat breakage in `adapters::geomerge` as expected churn
rather than regression.
- **Feature gates.**


@@ -26,9 +26,10 @@ tools/exporter/
### Run It
The exporter needs GHC 9.12 and Cabal.
The repository's Nix dev shell provides both;
enter it with `make shell` (or `nix develop`) before running the commands below.
> ⚠️ The exporter needs GHC 9.12 and Cabal.
> The repository's Nix dev shell provides both;
> enter it with `make shell` (or `nix develop`) before running the commands below.
> A system GHC older than 9.12 will fail to compile geolog-lang's `GHC2024` modules.
```sh
# Build the executable: