# AGENTS.md

This file provides guidance to coding agents collaborating on this repository.

## Mission

Query Engine is an experimental Rust project for building query-engine
components. The current implementation is centered on a chase-based reasoning
core, lightweight interactive frontends, and an early relational/SQL scaffold.

Priorities, in order:

1. Correctness of reasoning and query semantics.
2. Clear architectural boundaries between frontend, planning, and execution layers.
3. Termination guarantees for chase-based rule evaluation.
4. Performance and scalability.
5. Clear, maintainable, idiomatic Rust code.

## Core Rules

- Use English for code, comments, docs, and tests.
- Keep mutable state inside well-defined structs; avoid global mutable state.
- Prefer small, focused changes over large refactoring.
- Add comments only when they clarify non-obvious behavior.
- Follow Rust idioms: use `Result` for errors, iterators over manual loops, etc.
- Do not describe unimplemented subsystems as if they already exist.

Quick examples:

- Good: add a planning data type behind a focused module boundary.
- Good: add a new chase variant by extending the existing strategy/config model.
- Bad: mix parsing, planning, and execution concerns in one module.
- Bad: add global configuration that affects unrelated engine components.
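
The rule about keeping mutable state inside structs can be illustrated with a minimal sketch. `ExecutionState`, its fields, and `apply_step` are hypothetical names chosen for this example; they are not taken from the repository.

```rust
/// Hypothetical execution state; the name and fields are illustrative only.
#[derive(Debug, Default)]
pub struct ExecutionState {
    next_null_id: u64,
    steps: usize,
}

impl ExecutionState {
    /// Returns a fresh identifier and advances the counter.
    pub fn fresh_null_id(&mut self) -> u64 {
        let id = self.next_null_id;
        self.next_null_id += 1;
        id
    }

    /// Records that one evaluation step was applied.
    pub fn record_step(&mut self) {
        self.steps += 1;
    }

    /// Number of steps recorded so far.
    pub fn steps(&self) -> usize {
        self.steps
    }
}

/// A free function with no hidden state: every mutation goes through the
/// explicitly passed `state` argument instead of a global.
pub fn apply_step(state: &mut ExecutionState) -> u64 {
    state.record_step();
    state.fresh_null_id()
}
```

Because the caller owns the state, two independent runs cannot interfere with each other, and tests can start from `ExecutionState::default()` without any reset logic.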

## Writing Style

- Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
- Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
- Avoid colorful adjectives and adverbs. Write "TCP proxy" not "lightweight TCP proxy", "scoring components" not "transparent scoring components".
- Use noun phrases for checklist items, not imperative verbs. Write "redundant index detection" not "detect redundant indexes".
- Headings in Markdown files must use title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, in, on, at, to, by, of) stay lowercase unless they are the first word.

## Repository Layout

- `src/`: core implementation.
- `src/chase/`: chase and rule-evaluation modules.
  - `term.rs`: terms (constants, nulls, variables).
  - `atom.rs`: atoms (predicate applied to terms).
  - `instance.rs`: fact storage and validation.
  - `rule.rs`: TGDs, EGDs, equalities, and builders.
  - `substitution.rs`: variable bindings and unification.
  - `engine.rs`: chase execution and configuration.
  - `inference.rs`: shared matching and provenance-aware materialization helpers.
  - `union_find.rs`: equality merging support.
- `src/frontend/`: lightweight interactive surface for scripts, REPL, and local web UI.
- `src/relational/`: schemas, values, rows, and result sets for relational execution.
- `src/catalog/`: predicate-to-table schema inference and catalog access.
- `src/sql/`: narrow SQL AST and parser support.
- `src/planner/`: logical plan structures and SQL-to-plan translation.
- `src/execution/`: execution of the current logical plan subset, including the `DataSource` trait, the `TableStore` in-memory source, and the physical operator layer in `physical.rs` with rule-based rewrites.
- `examples/scripts/`: runnable script examples for supported workflows.
- `tests/`: integration, regression, and property-based tests.
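
For readers unfamiliar with the technique behind equality merging, the following is a generic union-find sketch with path compression. It is not the code in `union_find.rs`; the `UnionFind` type, its dense `usize` ids, and the "smaller root wins" policy are assumptions made for illustration.

```rust
/// Generic union-find over dense `usize` ids (illustrative, not the
/// repository's implementation).
#[derive(Debug, Clone)]
pub struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    /// Creates `len` singleton classes: every id is its own representative.
    pub fn new(len: usize) -> Self {
        Self {
            parent: (0..len).collect(),
        }
    }

    /// Returns the representative of `x`, compressing the path on the way.
    pub fn find(&mut self, x: usize) -> usize {
        // First pass: walk up to the root.
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Second pass: point every node on the path directly at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merges the classes of `a` and `b`. Returns `true` if they were
    /// distinct. The smaller root id is kept so the representative is
    /// deterministic.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        let (keep, drop) = if ra < rb { (ra, rb) } else { (rb, ra) };
        self.parent[drop] = keep;
        true
    }
}
```

A deterministic representative is one possible design choice; it makes the result of merging independent of the order in which equalities are applied, which tends to simplify test assertions.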

## Architecture Constraints

- Treat the current chase subsystem as one engine component, not the entire long-term architecture.
- `Instance` holds the fact state as ground atoms.
- `Rule` and `Egd` represent declarative constraints used by the chase subsystem.
- The chase engine should remain largely stateless; pass execution state explicitly.
- New chase variants should be composable with existing infrastructure.
- Existential variables generate labeled nulls (`Term::Null`).
- The current SQL support is intentionally narrow: `SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT` over predicate-backed tables; equality and inequality predicates combined with `AND` and `OR`; comma-join style multi-table queries; table aliases; ordering by output-column names; integer and string literals; `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregates with optional `GROUP BY`.
- Stable SQL column names come from explicit catalog registration or the frontend `schema ...` command, including for empty tables; otherwise the default names are positional such as `c0` and `c1`.
- Single-table SQL queries may use the table name as a qualifier when no alias is present.
- Do not describe unsupported SQL features, such as arbitrary expressions, as implemented.
- The executor operates on the `DataSource` trait, not on `Instance` directly. `Instance` and `TableStore` are the two built-in implementations.
- Relational and SQL modules should build on explicit schemas and logical plans, not call frontend helpers directly.
- If you add parser, planner, or executor layers, keep their responsibilities separate.
- Public docs and interfaces should reflect the implemented state of the repository accurately.
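
The constraint that the executor depends on a trait rather than a concrete store can be sketched as follows. Every signature here is an assumption: the repository's real `DataSource` trait may look different, and `Row`, `InMemoryTables`, `count_rows`, and `positional_column_names` are hypothetical names used only for this example.

```rust
use std::collections::HashMap;

/// Hypothetical row shape; the real relational types may differ.
pub type Row = Vec<String>;

/// Hypothetical trait shape: the executor sees only this interface.
pub trait DataSource {
    /// Returns all rows of `table`, or `None` if the table is unknown.
    fn scan(&self, table: &str) -> Option<Vec<Row>>;
}

/// Hypothetical in-memory source used to exercise the trait.
#[derive(Debug, Default)]
pub struct InMemoryTables {
    tables: HashMap<String, Vec<Row>>,
}

impl InMemoryTables {
    /// Appends `row` to `table`, creating the table on first use.
    pub fn insert(&mut self, table: &str, row: Row) {
        self.tables.entry(table.to_string()).or_default().push(row);
    }
}

impl DataSource for InMemoryTables {
    fn scan(&self, table: &str) -> Option<Vec<Row>> {
        self.tables.get(table).cloned()
    }
}

/// Executor-side helper that depends only on the trait, never on a
/// concrete storage type.
pub fn count_rows(source: &dyn DataSource, table: &str) -> Option<usize> {
    source.scan(table).map(|rows| rows.len())
}

/// Default positional column names for a table of the given arity:
/// `c0`, `c1`, and so on.
pub fn positional_column_names(arity: usize) -> Vec<String> {
    (0..arity).map(|i| format!("c{i}")).collect()
}
```

With this shape, adding a new storage backend means adding one more `impl DataSource`; helpers such as `count_rows` stay unchanged.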

## Rust Conventions

- Target stable Rust (edition 2024, rust-version 1.92).
- Use `#[derive(...)]` for common traits where appropriate.
- Prefer `&str` over `String` in function parameters when ownership is not needed.
- Use `impl Trait` for return types when the concrete type is an implementation detail.
- Run `cargo clippy` and address warnings before committing.
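
The conventions above can be seen together in a short sketch. `ParseArityError`, `parse_arity`, and `non_empty_names` are hypothetical helpers invented for this example and do not exist in the repository.

```rust
use std::fmt;

/// Hypothetical error type with derived common traits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseArityError {
    input: String,
}

impl fmt::Display for ParseArityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid arity: {:?}", self.input)
    }
}

impl std::error::Error for ParseArityError {}

/// Takes `&str` because the function only reads the input, and reports
/// failure through `Result` instead of panicking.
pub fn parse_arity(text: &str) -> Result<usize, ParseArityError> {
    text.trim().parse::<usize>().map_err(|_| ParseArityError {
        input: text.to_string(),
    })
}

/// Returns `impl Iterator` so the concrete adapter chain stays an
/// implementation detail; an iterator chain replaces a manual loop.
pub fn non_empty_names<'a>(names: &'a [&'a str]) -> impl Iterator<Item = &'a str> + 'a {
    names.iter().copied().filter(|name| !name.is_empty())
}
```

Callers can collect the iterator or keep chaining adapters on it, and the function body can change later without affecting its signature.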

## Required Validation

Run these checks for any non-trivial change:

1. `cargo test`
2. `cargo clippy --all-targets --all-features -- -D warnings`
3. `cargo fmt --check`

For performance-sensitive changes:

1. Add benchmarks if they do not exist.
2. Compare before/after performance.

## First Contribution Flow

Use this sequence for your first change:

1. Read `src/lib.rs` plus the relevant module files.
2. Implement the smallest possible code change.
3. Add or update tests that fail before and pass after.
4. Run `cargo test`.
5. Run `cargo clippy --all-targets --all-features -- -D warnings`.
6. Update docs if public API behavior changed.

Example scopes that are good first tasks:

- Add tests for an edge case in unification.
- Implement a new utility method on `Instance` or `Atom`.
- Tighten frontend wording so it matches actual behavior.
- Introduce a small planning-oriented type without changing execution semantics.
- Extend the SQL slice with a narrow, well-tested feature such as aliases or named columns.
- Add a runnable example script that demonstrates a supported workflow.

## Testing Expectations

- No semantics-changing logic update is complete without tests.
- Unit tests go in `#[cfg(test)] mod tests` within each module.
- Integration tests go in `tests/integration_tests.rs`.
- Regression tests for bug fixes go in `tests/regression_tests.rs`.
- Property-based tests go in `tests/property_tests.rs`.
- SQL/planner/execution flow tests go in `tests/sql_pipeline_tests.rs`.
- Runnable documentation examples belong in `examples/scripts/` when they clarify supported behavior.
- Do not merge code that breaks existing tests.

Minimal unit-test checklist for chase-related behavior:

1. Create an `Instance` with relevant facts.
2. Define rules using `RuleBuilder`.
3. Run `chase(instance, &rules)`.
4. Assert on `result.terminated`, `result.instance`, and derived facts.

Example test skeleton:

```rust
// Assumes `Atom`, `Instance`, `RuleBuilder`, `Term`, and `chase` are already
// in scope; import them from wherever the crate exports them.
#[test]
fn test_example() {
    // A single ground fact: Pred(a).
    let instance: Instance = vec![
        Atom::new("Pred", vec![Term::constant("a")]),
    ].into_iter().collect();

    // Rule: Pred(X) -> Derived(X).
    let rule = RuleBuilder::new()
        .when("Pred", vec![Term::var("X")])
        .then("Derived", vec![Term::var("X")])
        .build();

    let result = chase(instance, &[rule]);

    assert!(result.terminated);
    assert_eq!(result.instance.facts_for_predicate("Derived").len(), 1);
}
```

## Change Design Checklist

Before coding:

1. Confirm whether the change affects reasoning semantics, planning boundaries, or termination.
2. Identify affected tests.
3. Consider impact on API stability.
4. Avoid overstating roadmap progress in code comments or docs.
5. Keep the supported SQL subset explicit when touching `sql`, `planner`, or `execution`.

Before submitting:

1. Verify `cargo test` passes.
2. Verify `cargo clippy --all-targets --all-features -- -D warnings` passes.
3. Ensure tests were added or updated where relevant.
4. Verify docs still match the implemented feature set.

## Review Guidelines (P0/P1 Focus)

Review output should be concise and only include critical issues.

- `P0`: must-fix defects (incorrect reasoning, non-termination, unsound semantics).
- `P1`: high-priority defects (likely functional bug, performance regression, API breakage, misleading public behavior/docs).

Do not include:

- style-only nitpicks,
- praise/summary of what is already good,
- exhaustive restatement of the patch.

Use this review format:

1. `Severity` (`P0`/`P1`)
2. `File:line`
3. `Issue`
4. `Why it matters`
5. `Minimal fix direction`

## Practical Notes for Agents

- Prefer targeted edits over broad mechanical rewrites.
- If you detect contradictory repository conventions, follow existing code and update docs accordingly.
- When uncertain about correctness, add or extend tests first, then optimize.
- When adding non-chase engine pieces, define clean interfaces before broadening functionality.
- Keep `frontend` presentation-only when possible; shared reasoning logic belongs in `chase`, relational logic in `relational`/`planner`/`execution`.
- Keep user-facing naming consistent with the repository name: `query-engine` / `query_engine`.
- If you change the SQL subset, update `README.md`, `ROADMAP.md`, and relevant example scripts in the same change.

## Commit and PR Hygiene

- Keep commits scoped to one logical change.
- PR descriptions should include:
  1. behavioral change summary,
  2. tests added/updated,
  3. performance impact (if applicable),
  4. API changes (if any),
  5. roadmap or architecture impact (if applicable).

Suggested PR checklist:

- [ ] Tests added/updated for behavior changes
- [ ] `cargo test` passes
- [ ] `cargo clippy --all-targets --all-features -- -D warnings` passes
- [ ] `cargo fmt --check` passes