AGENTS.md

This file provides guidance to coding agents collaborating on this repository.

Mission

Query Engine is an experimental Rust project for building query-engine components. The current implementation is centered on a chase-based reasoning core, interactive frontends, and an early relational/SQL scaffold.

Priorities, in order:

  1. Correctness of reasoning and query semantics.
  2. Clear architectural boundaries between frontend, planning, and execution layers.
  3. Termination guarantees for chase-based rule evaluation.
  4. Performance and scalability.
  5. Clear, maintainable, idiomatic Rust code.

Core Rules

  • Use English for code, comments, docs, and tests.
  • Keep mutable state inside well-defined structs; avoid global mutable state.
  • Prefer small, focused changes over large refactors.
  • Add comments only when they clarify non-obvious behavior.
  • Follow Rust idioms: for example, use Result for errors and prefer iterators over manual loops.
  • Do not describe unimplemented subsystems as if they already exist.

Quick examples:

  • Good: add a planning data type behind a focused module boundary.
  • Good: add a new chase variant by extending the existing strategy/config model.
  • Bad: mix parsing, planning, and execution concerns in one module.
  • Bad: add global configuration that affects unrelated engine components.
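
The state and error-handling rules above can be sketched as follows. This is a minimal illustration, not repository code: PlanRegistry and PlanError are hypothetical names chosen for the example. The mutable state lives inside one struct, failures are reported through Result, and the aggregate is computed with an iterator.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical error type; not part of the repository.
#[derive(Debug, PartialEq, Eq)]
pub enum PlanError {
    DuplicateName(String),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::DuplicateName(name) => write!(f, "duplicate plan name: {name}"),
        }
    }
}

impl std::error::Error for PlanError {}

/// Hypothetical registry. All mutable state is owned by this struct,
/// so the caller decides where it is created and who may mutate it.
#[derive(Debug, Default)]
pub struct PlanRegistry {
    arities: BTreeMap<String, usize>,
}

impl PlanRegistry {
    /// Registers a named plan and reports duplicates through `Result`.
    pub fn register(&mut self, name: &str, arity: usize) -> Result<(), PlanError> {
        if self.arities.contains_key(name) {
            return Err(PlanError::DuplicateName(name.to_string()));
        }
        self.arities.insert(name.to_string(), arity);
        Ok(())
    }

    /// Uses an iterator instead of a manual loop with an accumulator.
    pub fn total_arity(&self) -> usize {
        self.arities.values().sum()
    }
}
```

Because the registry is a plain value, two engine components can each own an independent instance without affecting one another.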

Writing Style

  • Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
  • Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
  • Avoid colorful adjectives and adverbs. Write "TCP proxy" not "lightweight TCP proxy", "scoring components" not "transparent scoring components".
  • Use noun phrases for checklist items, not imperative verbs. Write "redundant index detection" not "detect redundant indexes".
  • Headings in Markdown files must be in title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, in, on, at, to, by, of) stay lowercase unless they are the first word.

Repository Layout

  • src/: core implementation.
  • src/chase/: chase and rule-evaluation modules.
    • term.rs: terms (constants, nulls, variables).
    • atom.rs: atoms (predicate applied to terms).
    • instance.rs: fact storage and validation.
    • rule.rs: TGDs, EGDs, equalities, and builders.
    • substitution.rs: variable bindings and unification.
    • engine.rs: chase execution and configuration.
    • inference.rs: shared matching and provenance-aware materialization helpers.
    • union_find.rs: equality merging support.
  • src/frontend/: interactive surface for scripts, REPL, and local web UI.
  • src/relational/: schemas, values, rows, and result sets for relational execution.
  • src/catalog/: predicate-to-table schema inference and catalog access.
  • src/sql/: narrow SQL AST and parser support.
  • src/planner/: logical plan structures and SQL-to-plan translation.
  • src/execution/: execution of the current logical plan subset, including the DataSource trait, the TableStore in-memory source, and the physical operator layer in physical.rs with rule-based rewrites.
  • examples/scripts/: runnable script examples for supported workflows.
  • tests/: integration, regression, and property-based tests.
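
To make the term.rs and atom.rs descriptions above concrete, here is a simplified sketch of how terms and atoms could be modeled. The variant payloads, field names, and the is_ground and arity helpers are assumptions for illustration. The real definitions in src/chase/ may differ.

```rust
/// Simplified stand-in for the term type: constants, labeled nulls, and variables.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Term {
    Constant(String),
    /// Labeled null, identified here by a numeric label (an assumption).
    Null(u64),
    Variable(String),
}

impl Term {
    /// A term is ground when it is not a variable.
    pub fn is_ground(&self) -> bool {
        !matches!(self, Term::Variable(_))
    }
}

/// Simplified stand-in for an atom: a predicate applied to terms.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Atom {
    pub predicate: String,
    pub terms: Vec<Term>,
}

impl Atom {
    pub fn new(predicate: &str, terms: Vec<Term>) -> Self {
        Self {
            predicate: predicate.to_string(),
            terms,
        }
    }

    /// Number of argument positions.
    pub fn arity(&self) -> usize {
        self.terms.len()
    }

    /// An atom is ground when every term is ground.
    pub fn is_ground(&self) -> bool {
        self.terms.iter().all(Term::is_ground)
    }
}
```

In this sketch a fact containing a labeled null still counts as ground, while a rule-body pattern containing a variable does not.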

Architecture Constraints

  • Treat the current chase subsystem as one engine component, not the entire long-term architecture.
  • Instance holds the fact state as ground atoms.
  • Rule and Egd represent declarative constraints used by the chase subsystem.
  • The chase engine should remain largely stateless; pass execution state explicitly.
  • New chase variants should be composable with existing infrastructure.
  • Existential variables generate labeled nulls (Term::Null).
  • The current SQL support is intentionally narrow: SELECT-FROM-WHERE-ORDER BY-LIMIT over predicate-backed tables; equality and inequality predicates combined with AND and OR; comma-join style multi-table queries; table aliases; ordering by output-column names; integer and string literals.
  • Stable SQL column names come from explicit catalog registration or the frontend schema ... command, including for empty tables; otherwise the default names are positional, such as c0 and c1.
  • Single-table SQL queries may use the table name as a qualifier when no alias is present.
  • Do not describe unsupported SQL features such as aggregates, grouping, or arbitrary expressions as implemented.
  • The executor operates on the DataSource trait, not on Instance directly. Instance and TableStore are the two built-in implementations.
  • Relational and SQL modules should build on explicit schemas and logical plans, not call frontend helpers directly.
  • If you add parser, planner, or executor layers, keep their responsibilities separate.
  • Public docs and interfaces should reflect the implemented state of the repository accurately.
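
The constraint that the executor depends on a trait rather than on a concrete store can be sketched as follows. The scan method, the Row alias, and the MemoryTables and FactSet types are hypothetical stand-ins. They are not the signatures of the real DataSource, TableStore, or Instance.

```rust
use std::collections::HashMap;

/// Simplified row type used only in this sketch.
pub type Row = Vec<String>;

/// Hypothetical shape of a data-source abstraction.
pub trait DataSource {
    /// Returns all rows of `table`, or `None` when the table is unknown.
    fn scan(&self, table: &str) -> Option<Vec<Row>>;
}

/// Stand-in for an in-memory table store.
#[derive(Debug, Default)]
pub struct MemoryTables {
    tables: HashMap<String, Vec<Row>>,
}

impl MemoryTables {
    pub fn insert(&mut self, table: &str, row: Row) {
        self.tables.entry(table.to_string()).or_default().push(row);
    }
}

impl DataSource for MemoryTables {
    fn scan(&self, table: &str) -> Option<Vec<Row>> {
        self.tables.get(table).cloned()
    }
}

/// Stand-in for a fact store: each fact is a (predicate, arguments) pair.
#[derive(Debug, Default)]
pub struct FactSet {
    facts: Vec<(String, Row)>,
}

impl FactSet {
    pub fn add(&mut self, predicate: &str, args: Row) {
        self.facts.push((predicate.to_string(), args));
    }
}

impl DataSource for FactSet {
    fn scan(&self, table: &str) -> Option<Vec<Row>> {
        let rows: Vec<Row> = self
            .facts
            .iter()
            .filter(|(predicate, _)| predicate.as_str() == table)
            .map(|(_, args)| args.clone())
            .collect();
        // In this sketch a predicate with no facts is treated as unknown.
        if rows.is_empty() { None } else { Some(rows) }
    }
}

/// Executor-side code sees only the trait, never a concrete store.
pub fn count_rows(source: &dyn DataSource, table: &str) -> usize {
    source.scan(table).map_or(0, |rows| rows.len())
}
```

With this shape, count_rows works unchanged for both stores, which is the property the constraint is meant to preserve.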

Rust Conventions

  • Target stable Rust (edition 2024, rust-version 1.92).
  • Use #[derive(...)] for common traits where appropriate.
  • Prefer &str over String in function parameters when ownership is not needed.
  • Use impl Trait for return types when the concrete type is an implementation detail.
  • Run cargo clippy and address warnings before committing.
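
A short sketch of the derive, &str, and impl Trait conventions above. ColumnName and positional_names are hypothetical names used only for illustration. The c0, c1 naming follows the positional defaults described under Architecture Constraints.

```rust
/// `#[derive(...)]` supplies the common traits instead of manual impls.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ColumnName(String);

impl ColumnName {
    /// Takes `&str` because the caller does not need to give up ownership.
    pub fn new(name: &str) -> Self {
        Self(name.to_string())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Returns `impl Iterator` so the concrete adapter type stays an
/// implementation detail that can change without breaking callers.
pub fn positional_names(count: usize) -> impl Iterator<Item = ColumnName> {
    (0..count).map(|index| ColumnName::new(&format!("c{index}")))
}
```

Callers can collect the iterator or consume it lazily without naming its type.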

Required Validation

Run these checks for any non-trivial change:

  1. cargo test
  2. cargo clippy --all-targets --all-features -- -D warnings
  3. cargo fmt --check

For performance-sensitive changes:

  1. Add benchmarks if they do not exist.
  2. Compare before/after performance.
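
When no benchmark exists yet, a rough before/after comparison can be sketched with the standard library alone. The best_of helper and the two duplicate-check functions are hypothetical examples, and wall-clock timing of this kind is noisy. A dedicated benchmark harness would give more reliable numbers.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` `iterations` times and returns the fastest observed run.
/// Returns `Duration::ZERO` when `iterations` is zero.
pub fn best_of<T>(iterations: u32, mut work: impl FnMut() -> T) -> Duration {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            // `black_box` discourages the optimizer from discarding the result.
            black_box(work());
            start.elapsed()
        })
        .min()
        .unwrap_or(Duration::ZERO)
}

/// "Before" version: quadratic duplicate check.
pub fn has_duplicate_quadratic(values: &[u32]) -> bool {
    values
        .iter()
        .enumerate()
        .any(|(index, value)| values[index + 1..].contains(value))
}

/// "After" version: hash-set based duplicate check.
pub fn has_duplicate_hashed(values: &[u32]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    values.iter().any(|value| !seen.insert(*value))
}
```

A comparison then times both versions on the same input, for example best_of(20, || has_duplicate_quadratic(&values)) against best_of(20, || has_duplicate_hashed(&values)), after first asserting that both return the same answer.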

First Contribution Flow

Use this sequence for your first change:

  1. Read src/lib.rs plus the relevant module files.
  2. Implement the smallest possible code change.
  3. Add or update tests that fail before and pass after.
  4. Run cargo test.
  5. Run cargo clippy --all-targets --all-features -- -D warnings.
  6. Update docs if public API behavior changed.

Example scopes that are good first tasks:

  • Add tests for an edge case in unification.
  • Implement a new utility method on Instance or Atom.
  • Tighten frontend wording so it matches actual behavior.
  • Introduce a small planning-oriented type without changing execution semantics.
  • Extend the SQL slice with a narrow, well-tested feature such as aliases or named columns.
  • Add a runnable example script that demonstrates a supported workflow.
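
As an illustration of the first scope, the sketch below shows the kind of edge case a unification test might target: a variable that occurs twice in a pattern must bind to the same value both times. PatternTerm, Substitution, and match_terms are simplified, hypothetical stand-ins. They are not the API of src/chase/substitution.rs.

```rust
use std::collections::HashMap;

/// Simplified pattern term for this sketch only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PatternTerm {
    Constant(String),
    Variable(String),
}

/// Maps a variable name to the constant it is bound to.
pub type Substitution = HashMap<String, String>;

/// Matches pattern terms against ground values (one-way unification).
/// Returns `None` on arity mismatch, constant mismatch, or conflicting bindings.
pub fn match_terms(pattern: &[PatternTerm], fact: &[&str]) -> Option<Substitution> {
    if pattern.len() != fact.len() {
        return None;
    }
    let mut bindings = Substitution::new();
    for (term, value) in pattern.iter().zip(fact) {
        match term {
            PatternTerm::Constant(expected) => {
                if expected.as_str() != *value {
                    return None;
                }
            }
            PatternTerm::Variable(name) => {
                // Bind on first occurrence; later occurrences must agree.
                let bound = bindings
                    .entry(name.clone())
                    .or_insert_with(|| (*value).to_string());
                if bound.as_str() != *value {
                    return None;
                }
            }
        }
    }
    Some(bindings)
}
```

A test for the repeated-variable case would assert that the pattern (X, X) matches ("a", "a") and rejects ("a", "b").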

Testing Expectations

  • No semantics-changing logic update is complete without tests.
  • Unit tests go in #[cfg(test)] mod tests within each module.
  • Integration tests go in tests/integration_tests.rs.
  • Regression tests for bug fixes go in tests/regression_tests.rs.
  • Property-based tests go in tests/property_tests.rs.
  • SQL/planner/execution flow tests go in tests/sql_pipeline_tests.rs.
  • Runnable documentation examples belong in examples/scripts/ when they clarify supported behavior.
  • Do not merge code that breaks existing tests.
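
The in-module unit-test layout described above looks roughly like this. normalize_predicate is a hypothetical helper that exists only to give the tests something to exercise.

```rust
/// Hypothetical helper used only to demonstrate the test layout.
pub fn normalize_predicate(name: &str) -> String {
    name.trim().to_lowercase()
}

// Unit tests live next to the code they cover and are compiled
// only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn trims_and_lowercases() {
        assert_eq!(normalize_predicate("  Parent "), "parent");
    }

    #[test]
    fn keeps_empty_input_empty() {
        assert_eq!(normalize_predicate(""), "");
    }
}
```

Tests that cross module boundaries belong in the files under tests/ listed above instead.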

Minimal unit-test checklist for chase-related behavior:

  1. Create an Instance with relevant facts.
  2. Define rules using RuleBuilder.
  3. Run chase(instance, &rules).
  4. Assert on result.terminated, result.instance, and derived facts.

Example test skeleton:

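// Assumes Atom, Instance, RuleBuilder, Term, and chase are in scope;
// adjust the imports to the module under test.
#[test]
fn test_example() {
    // 1. Build an instance from the relevant ground facts.
    let instance: Instance = vec![
        Atom::new("Pred", vec![Term::constant("a")]),
    ].into_iter().collect();

    // 2. Define the rule: Pred(X) implies Derived(X).
    let rule = RuleBuilder::new()
        .when("Pred", vec![Term::var("X")])
        .then("Derived", vec![Term::var("X")])
        .build();

    // 3. Run the chase.
    let result = chase(instance, &[rule]);

    // 4. Assert on termination and on the derived facts.
    assert!(result.terminated);
    assert_eq!(result.instance.facts_for_predicate("Derived").len(), 1);
}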
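
To show why asserting on a terminated flag is useful, here is a self-contained toy fixpoint loop with an explicit round bound. It computes a transitive closure over integer pairs and is not the repository's chase engine. FixpointResult, bounded_closure, and the max_rounds parameter are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical result shape with a `terminated` flag.
#[derive(Debug)]
pub struct FixpointResult {
    /// True only when a full round derived no new fact.
    pub terminated: bool,
    pub facts: BTreeSet<(u32, u32)>,
}

/// Applies the rule edge(a, b), edge(b, d) -> edge(a, d) until no new
/// fact appears or `max_rounds` rounds have been executed.
pub fn bounded_closure(edges: &[(u32, u32)], max_rounds: usize) -> FixpointResult {
    let mut facts: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    for _ in 0..max_rounds {
        // Collect candidate facts first, then extend, so the set is not
        // mutated while it is being iterated.
        let derived: Vec<(u32, u32)> = facts
            .iter()
            .flat_map(|&(a, b)| {
                facts
                    .iter()
                    .filter(move |&&(c, _)| c == b)
                    .map(move |&(_, d)| (a, d))
            })
            .filter(|pair| !facts.contains(pair))
            .collect();
        if derived.is_empty() {
            return FixpointResult { terminated: true, facts };
        }
        facts.extend(derived);
    }
    // The bound was hit before a round with no new facts was observed.
    FixpointResult { terminated: false, facts }
}
```

A test can then check both outcomes: with a generous bound the run reports termination and the full closure, and with a bound that is too small it reports terminated as false rather than a misleading partial result.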

Change Design Checklist

Before coding:

  1. Confirm whether the change affects reasoning semantics, planning boundaries, or termination.
  2. Identify affected tests.
  3. Consider impact on API stability.
  4. Avoid overstating roadmap progress in code comments or docs.
  5. Keep the supported SQL subset explicit when touching sql, planner, or execution.

Before submitting:

  1. Verify cargo test passes.
  2. Verify cargo clippy --all-targets --all-features -- -D warnings passes.
  3. Ensure tests were added or updated where relevant.
  4. Verify docs still match the implemented feature set.

Review Guidelines (P0/P1 Focus)

Review output should be concise and include only critical issues.

  • P0: must-fix defects (incorrect reasoning, non-termination, unsound semantics).
  • P1: high-priority defects (likely functional bug, performance regression, API breakage, misleading public behavior/docs).

Do not include:

  • style-only nitpicks,
  • praise/summary of what is already good,
  • exhaustive restatement of the patch.

Use this review format:

  1. Severity (P0/P1)
  2. File:line
  3. Issue
  4. Why it matters
  5. Minimal fix direction

Practical Notes for Agents

  • Prefer targeted edits over broad mechanical rewrites.
  • If you detect contradictory repository conventions, follow existing code and update docs accordingly.
  • When uncertain about correctness, add or extend tests first, then optimize.
  • When adding non-chase engine pieces, define clean interfaces before broadening functionality.
  • Keep the frontend presentation-only when possible. Shared reasoning logic belongs in chase; relational logic belongs in relational, planner, or execution.
  • Keep user-facing naming consistent with the repository name: query-engine / query_engine.
  • If you change the SQL subset, update README.md, ROADMAP.md, and relevant example scripts in the same change.

Commit and PR Hygiene

  • Keep commits scoped to one logical change.
  • PR descriptions should include:
    1. behavioral change summary,
    2. tests added/updated,
    3. performance impact (if applicable),
    4. API changes (if any),
    5. roadmap or architecture impact (if applicable).

Suggested PR checklist:

  • Tests added/updated for behavior changes
  • cargo test passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • cargo fmt --check passes