# AGENTS.md

This file provides guidance to coding agents collaborating on this repository.

## Mission

Query Engine is an experimental Rust project for building query-engine
components. The current implementation is centered on a chase-based reasoning
core, interactive frontends, and an early relational/SQL scaffold.

Priorities, in order:

1. Correctness of reasoning and query semantics.
2. Clear architectural boundaries between front-end, planning, and execution layers.
3. Termination guarantees for chase-based rule evaluation.
4. Performance and scalability.
5. Clear, maintainable, idiomatic Rust code.

## Core Rules

- Use English for code, comments, docs, and tests.
- Keep mutable state inside well-defined structs; avoid global mutable state.
- Prefer small, focused changes over large refactoring.
- Add comments only when they clarify non-obvious behavior.
- Follow Rust idioms, such as `Result` for errors and iterators in place of manual loops.
- Do not describe unimplemented subsystems as if they already exist.

Quick examples:

- Good: add a planning data type behind a focused module boundary.
- Good: add a new chase variant by extending the existing strategy/config model.
- Bad: mix parsing, planning, and execution concerns in one module.
- Bad: add global configuration that affects unrelated engine components.
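
A minimal sketch of three of the core rules: mutable state owned by a struct, `Result` for errors, and an iterator in place of a manual loop. The names `NullAllocator`, `AllocError`, `fresh`, and `fresh_many` are hypothetical and do not come from the repository.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not part of the repository.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocError {
    /// The allocator reached its configured limit.
    LimitReached { limit: u64 },
}

impl fmt::Display for AllocError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AllocError::LimitReached { limit } => {
                write!(f, "null limit of {limit} reached")
            }
        }
    }
}

impl std::error::Error for AllocError {}

/// Hypothetical allocator: the counter lives in a struct that callers pass
/// explicitly, instead of in a `static mut` or another global.
#[derive(Debug)]
pub struct NullAllocator {
    next_id: u64,
    limit: u64,
}

impl NullAllocator {
    pub fn new(limit: u64) -> Self {
        Self { next_id: 0, limit }
    }

    /// Returns the next identifier, or an error once the limit is reached.
    pub fn fresh(&mut self) -> Result<u64, AllocError> {
        if self.next_id >= self.limit {
            return Err(AllocError::LimitReached { limit: self.limit });
        }
        let id = self.next_id;
        self.next_id += 1;
        Ok(id)
    }

    /// Allocates `count` identifiers with an iterator; `collect` stops at
    /// the first error and returns it.
    pub fn fresh_many(&mut self, count: usize) -> Result<Vec<u64>, AllocError> {
        (0..count).map(|_| self.fresh()).collect()
    }
}
```

Because the counter is a field, two allocators never interfere with each other, and a test can create its own allocator without resetting shared state.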


## Writing Style

- Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
- Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
- Avoid colorful adjectives and adverbs. Write "TCP proxy" not "lightweight TCP proxy", "scoring components" not "transparent scoring components".
- Use noun phrases for checklist items, not imperative verbs. Write "redundant index detection" not "detect redundant indexes".
- Headings in Markdown files must use title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, from, in, on, at, to, by, of) stay lowercase unless they are the first word.

## Repository Layout

- `src/`: core implementation.
- `src/chase/`: chase and rule-evaluation modules.
    - `term.rs`: terms (constants, nulls, variables).
    - `atom.rs`: atoms (predicate applied to terms).
    - `instance.rs`: fact storage and validation.
    - `rule.rs`: TGDs, EGDs, equalities, and builders.
    - `substitution.rs`: variable bindings and unification.
    - `engine.rs`: chase execution and configuration.
    - `inference.rs`: shared matching and provenance-aware materialization helpers.
    - `union_find.rs`: equality merging support.
- `src/frontend/`: interactive surface for scripts, a REPL, and a local web UI.
- `src/relational/`: schemas, values, rows, and result sets for relational execution.
- `src/catalog/`: predicate-to-table schema inference and catalog access.
- `src/sql/`: narrow SQL AST and parser support.
- `src/planner/`: logical plan structures and SQL-to-plan translation.
- `src/execution/`: execution of the current logical plan subset, including the `DataSource` trait, the `TableStore` in-memory source, and the physical operator layer in `physical.rs` with rule-based rewrites.
- `examples/scripts/`: runnable script examples for supported workflows.
- `tests/`: integration, regression, and property-based tests.

## Architecture Constraints

- Treat the current chase subsystem as one engine component, not the entire long-term architecture.
- `Instance` holds the fact state as ground atoms.
- `Rule` and `Egd` represent declarative constraints used by the chase subsystem.
- The chase engine should remain largely stateless; pass execution state explicitly.
- New chase variants should be composable with existing infrastructure.
- Existential variables generate labeled nulls (`Term::Null`).
- The current SQL support is intentionally narrow:
    - `SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT` over predicate-backed tables.
    - Equality and inequality predicates combined with `AND` and `OR`.
    - Comma-join style multi-table queries.
    - Table aliases.
    - Ordering by output-column names.
    - Integer and string literals.
    - `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregates with optional `GROUP BY`.
- Stable SQL column names come from explicit catalog registration or the frontend `schema ...` command, including for empty tables; otherwise the default names are positional such as `c0` and `c1`.
- Single-table SQL queries may use the table name as a qualifier when no alias is present.
- Do not describe SQL features outside the supported subset, such as arbitrary expressions, as implemented.
- The executor operates on the `DataSource` trait, not on `Instance` directly. `Instance` and `TableStore` are the two built-in implementations.
- Relational and SQL modules should build on explicit schemas and logical plans, not call frontend helpers directly.
- If you add parser, planner, or executor layers, keep their responsibilities separate.
- Public docs and interfaces should reflect the implemented state of the repository accurately.
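
The sketch below illustrates two of the constraints above: code that depends on a source trait instead of a concrete store, and positional default column names when no schema is registered. The real `DataSource` signature is not shown in this file, so `RowSource`, `MemorySource`, and `resolve_columns` are hypothetical stand-ins, and the row representation (`Vec<String>`) is an assumption made for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `DataSource` trait; the real signature may differ.
pub trait RowSource {
    /// Returns the rows stored for `table`, if the table has any.
    fn rows(&self, table: &str) -> Option<&[Vec<String>]>;

    /// Returns the registered column names for `table`, if any.
    fn column_names(&self, table: &str) -> Option<&[String]>;
}

/// Hypothetical in-memory source, loosely modeled on the role of `TableStore`.
#[derive(Debug, Default)]
pub struct MemorySource {
    rows: HashMap<String, Vec<Vec<String>>>,
    names: HashMap<String, Vec<String>>,
}

impl MemorySource {
    pub fn insert_row(&mut self, table: &str, row: Vec<String>) {
        self.rows.entry(table.to_owned()).or_default().push(row);
    }

    pub fn register_columns(&mut self, table: &str, names: &[&str]) {
        let names: Vec<String> = names.iter().map(|n| (*n).to_owned()).collect();
        self.names.insert(table.to_owned(), names);
    }
}

impl RowSource for MemorySource {
    fn rows(&self, table: &str) -> Option<&[Vec<String>]> {
        self.rows.get(table).map(Vec::as_slice)
    }

    fn column_names(&self, table: &str) -> Option<&[String]> {
        self.names.get(table).map(Vec::as_slice)
    }
}

/// Resolves output column names: registered names when present, otherwise
/// positional defaults (`c0`, `c1`, ...) sized from the first row.
/// The function depends only on the trait, not on a concrete source.
pub fn resolve_columns(source: &dyn RowSource, table: &str) -> Vec<String> {
    match source.column_names(table) {
        Some(names) => names.to_vec(),
        None => {
            let arity = source
                .rows(table)
                .and_then(|rows| rows.first())
                .map_or(0, Vec::len);
            (0..arity).map(|i| format!("c{i}")).collect()
        }
    }
}
```

In this sketch a registered schema gives stable names even for a table with no rows, while an unregistered empty table yields no column names at all.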

## Rust Conventions

- Target stable Rust (edition 2024, rust-version 1.92).
- Use `#[derive(...)]` for common traits where appropriate.
- Prefer `&str` over `String` in function parameters when ownership is not needed.
- Use `impl Trait` for return types when the concrete type is an implementation detail.
- Run `cargo clippy` and address warnings before committing.
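
A short sketch of the conventions above, using a hypothetical `Label` type and `labels_with_prefix` function that are not part of the repository:

```rust
/// Hypothetical type for this sketch; the derives cover the common traits.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Label {
    name: String,
}

impl Label {
    /// Takes `&str` because the caller does not need to give up ownership.
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_owned(),
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Returns `impl Iterator` so the adapter chain stays an implementation detail.
pub fn labels_with_prefix<'a>(
    labels: &'a [Label],
    prefix: &'a str,
) -> impl Iterator<Item = &'a Label> + 'a {
    labels
        .iter()
        .filter(move |label| label.name().starts_with(prefix))
}
```

Callers can chain further adapters on the returned iterator, and the function body can later change its internals without changing the signature.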

## Required Validation

Run these checks for any non-trivial change:

1. `cargo test`
2. `cargo clippy --all-targets --all-features -- -D warnings`
3. `cargo fmt --check`

For performance-sensitive changes:

1. Add benchmarks if they do not exist.
2. Compare before/after performance.
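
A std-only sketch of a before/after comparison. The helper `time_runs` and the two `sum_of_squares_*` functions are hypothetical and exist only to show the shape of the comparison; a dedicated benchmark harness would give more reliable numbers than wall-clock timing.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the last result with the total
/// elapsed time. Hypothetical helper for this sketch.
pub fn time_runs<T>(iterations: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iterations > 0, "iterations must be positive");
    let start = Instant::now();
    // `black_box` discourages the optimizer from removing the call.
    let mut last = black_box(f());
    for _ in 1..iterations {
        last = black_box(f());
    }
    (last, start.elapsed())
}

/// Baseline implementation: manual loop.
pub fn sum_of_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    for value in values {
        total += value * value;
    }
    total
}

/// Candidate implementation: iterator chain.
pub fn sum_of_squares_iter(values: &[u64]) -> u64 {
    values.iter().map(|value| value * value).sum()
}
```

Check that both versions return the same result before comparing their timings; a faster implementation that changes the output is a correctness regression, not an optimization.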

## First Contribution Flow

Use this sequence for your first change:

1. Read `src/lib.rs` plus the relevant module files.
2. Implement the smallest possible code change.
3. Add or update tests that fail before and pass after.
4. Run `cargo test`.
5. Run `cargo clippy --all-targets --all-features -- -D warnings`.
6. Update docs if public API behavior changed.

Example scopes that are good first tasks:

- Add tests for an edge case in unification.
- Implement a new utility method on `Instance` or `Atom`.
- Tighten frontend wording so it matches actual behavior.
- Introduce a small planning-oriented type without changing execution semantics.
- Extend the SQL slice with a narrow, well-tested feature such as aliases or named columns.
- Add a runnable example script that demonstrates a supported workflow.

## Testing Expectations

- A logic change that alters semantics is not complete without tests.
- Unit tests go in `#[cfg(test)] mod tests` within each module.
- Integration tests go in `tests/integration_tests.rs`.
- Regression tests for bug fixes go in `tests/regression_tests.rs`.
- Property-based tests go in `tests/property_tests.rs`.
- SQL/planner/execution flow tests go in `tests/sql_pipeline_tests.rs`.
- Runnable documentation examples belong in `examples/scripts/` when they clarify supported behavior.
- Do not merge code that breaks existing tests.

Minimal unit-test checklist for chase-related behavior:

1. Create an `Instance` with relevant facts.
2. Define rules using `RuleBuilder`.
3. Run `chase(instance, &rules)`.
4. Assert on `result.terminated`, `result.instance`, and derived facts.

Example test skeleton:

```rust
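// Bring `Atom`, `Instance`, `RuleBuilder`, `Term`, and `chase` into scope
// first; the import path depends on where the test lives.
#[test]
fn test_example() {
    // Start from a single ground fact: Pred(a).
    let instance: Instance = vec![
        Atom::new("Pred", vec![Term::constant("a")]),
    ].into_iter().collect();

    // Rule: Pred(X) -> Derived(X).
    let rule = RuleBuilder::new()
        .when("Pred", vec![Term::var("X")])
        .then("Derived", vec![Term::var("X")])
        .build();

    let result = chase(instance, &[rule]);

    // The rule has no existential variables, so the chase should terminate
    // after deriving exactly one fact: Derived(a).
    assert!(result.terminated);
    assert_eq!(result.instance.facts_for_predicate("Derived").len(), 1);
}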
```

## Change Design Checklist

Before coding:

1. Confirm whether the change affects reasoning semantics, planning boundaries, or termination.
2. Identify affected tests.
3. Consider impact on API stability.
4. Avoid overstating roadmap progress in code comments or docs.
5. Keep the supported SQL subset explicit when touching `sql`, `planner`, or `execution`.

Before submitting:

1. Verify `cargo test` passes.
2. Verify `cargo clippy --all-targets --all-features -- -D warnings` passes.
3. Verify `cargo fmt --check` passes.
4. Ensure tests were added or updated where relevant.
5. Verify docs still match the implemented feature set.

## Review Guidelines (P0/P1 Focus)

Review output should be concise and include only critical issues.

- `P0`: must-fix defects (incorrect reasoning, non-termination, unsound semantics).
- `P1`: high-priority defects (likely functional bug, performance regression, API breakage, misleading public behavior/docs).

Do not include:

- style-only nitpicks,
- praise/summary of what is already good,
- exhaustive restatement of the patch.

Use this review format:

1. `Severity` (`P0`/`P1`)
2. `File:line`
3. `Issue`
4. `Why it matters`
5. `Minimal fix direction`

## Practical Notes for Agents

- Prefer targeted edits over broad mechanical rewrites.
- If you detect contradictory repository conventions, follow existing code and update docs accordingly.
- When uncertain about correctness, add or extend tests first, then optimize.
- When adding non-chase engine pieces, define clean interfaces before broadening functionality.
- Keep `frontend` presentation-only when possible; shared reasoning logic belongs in `chase`, relational logic in `relational`/`planner`/`execution`.
- Keep user-facing naming consistent with the repository name: `query-engine` / `query_engine`.
- If you change the SQL subset, update `README.md`, `ROADMAP.md`, and relevant example scripts in the same change.

## Commit and PR Hygiene

- Keep commits scoped to one logical change.
- PR descriptions should include:
    1. behavioral change summary,
    2. tests added/updated,
    3. performance impact (if applicable),
    4. API changes (if any),
    5. roadmap or architecture impact (if applicable).

Suggested PR checklist:

- [ ] Tests added/updated for behavior changes
- [ ] `cargo test` passes
- [ ] `cargo clippy --all-targets --all-features -- -D warnings` passes
- [ ] `cargo fmt --check` passes