## Query Engine

An experimental Rust project for building query-engine components.

Right now the repository is centered on a chase-based reasoning core, an interactive frontend, and an early relational/SQL scaffold. The broader target shape is a query engine with clearer front-end, planning, optimization, and execution boundaries.

### Current scope

- Chase-based rule evaluation over facts, rules, and substitutions
- Restricted, standard, oblivious, and Skolem chase variants
- Optional semi-naive evaluation across all chase variants
- Provenance-oriented explanations for derived answers
- Script, REPL, local web UI, and optional TUI for experimentation (all with syntax highlighting)
- Relational schema, catalog, logical-plan, and execution scaffolding
- Physical operator scaffolding with a rule-based rewrite layer
- A SQL slice for `SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT` queries over predicate-backed tables, including `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregates
- Filter push-down across joins in the physical rewrite pass

### Architecture

The repository is currently organized around a few clear subsystems:

- `src/chase/`: rule-engine data structures, chase execution, and stratification
- `src/io/`: CSV-based fact import/export
- `src/frontend/`: REPL, script, GUI, and explanation rendering
- `src/relational/`: schemas, values, rows, and result sets
- `src/catalog/`: predicate-backed table metadata
- `src/sql/`: SQL AST and parser
- `src/planner/`: logical plan structures and SQL-to-plan translation
- `src/execution/`: execution for the current logical-plan subset, the `DataSource` trait, the `TableStore`, and a physical operator layer with rule-based rewrites

Today, the chase subsystem is still the most mature part of the codebase. The relational and SQL modules are present to create clean extension points for a broader query-engine architecture.

The executor operates on the `DataSource` trait rather than on the chase `Instance` directly. This allows non-chase data sources to plug into the SQL pipeline. The crate ships two implementations: `Instance` (chase-backed) and `TableStore` (in-memory rows). Implementing `DataSource` for a new backend requires a single method:

```rust
fn scan(&self, table: &str, schema: &Schema) -> Result;
```

### Intended Direction

The medium-term direction is to evolve this project into a more general query-engine playground with:

- explicit front-end and parsing layers
- internal planning representations
- clearer separation between logical meaning and execution strategy
- support for multiple query-engine experiments instead of only chase logic

The current code now includes an initial SQL front end, logical plan, and execution path. It is still intentionally narrow and should not be read as full SQL support.

### Quickstart

#### Rust API

```rust
use query_engine::{Atom, Instance, Term, chase};
use query_engine::chase::rule::RuleBuilder;

let instance: Instance = vec![
    Atom::new("Parent", vec![Term::constant("alice"), Term::constant("bob")]),
    Atom::new("Parent", vec![Term::constant("bob"), Term::constant("carol")]),
]
.into_iter()
.collect();

let rule1 = RuleBuilder::new()
    .when("Parent", vec![Term::var("X"), Term::var("Y")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .build();

let rule2 = RuleBuilder::new()
    .when("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .when("Parent", vec![Term::var("Y"), Term::var("Z")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Z")])
    .build();

let result = chase(instance, &[rule1, rule2]);

assert!(result.terminated);
assert_eq!(result.instance.facts_for_predicate("Ancestor").len(), 3);
```

#### CLI

```bash
cargo run -- repl
cargo run -- gui
cargo run -- script examples/scripts/ancestor.ech
cargo run -- script examples/scripts/sql_join.ech
cargo run --features tui -- tui
```

#### REPL Language

```text
fact Parent(alice, bob).
rule Parent(?X, ?Y) -> Ancestor(?X, ?Y).
rule Node(?X), NOT Connected(?X) -> Isolated(?X).
schema Parent(parent, child).
sql SELECT * FROM Parent;
run.
query Ancestor(?X, ?Y)?
explain Ancestor(alice, carol)?
set chase skolem
set semi-naive on
load /path/to/csv/dir
save /path/to/csv/dir
source examples/scripts/ancestor.ech
show facts
show rules
reset
help
```

#### Current SQL Slice

The repository now has a narrow SQL pipeline with:

- predicate-backed catalog inference
- relational schemas, rows, and values
- SQL parsing for the supported subset
- logical planning
- execution for filtering, ordering, limiting, and basic multi-table joins

Currently supported examples:

```sql
SELECT * FROM Parent
SELECT c0 FROM Parent
SELECT c0 FROM Parent WHERE c1 = 'bob'
SELECT c0 FROM Parent WHERE c1 != 'bob'
SELECT c0 FROM Parent WHERE c1 = 'bob' AND c0 = 'alice'
SELECT c0 FROM Parent WHERE c1 = 'bob' OR c1 = 'carol'
SELECT c0 FROM Parent ORDER BY c0 DESC
SELECT c0 FROM Parent ORDER BY c0 ASC LIMIT 1
SELECT c0 AS parent_name, 'seed' AS label, 42 AS answer FROM Parent
SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent
SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent
SELECT COUNT(*) FROM Parent
SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept
```

In the REPL or script runner, use the `sql` command and end the statement with `;`:

```text
sql SELECT c0 FROM Parent WHERE c1 = 'bob';
```

`fact`, `rule`, `schema`, `sql`, `query`, and `explain` commands may also span multiple lines in `.ech` scripts as long as the final line ends with the normal terminator.

You can also register stable column names for a predicate-backed table in the frontend before running SQL, including tables that currently have no facts:

```text
schema Parent(parent, child).
sql SELECT parent FROM Parent WHERE child = 'bob';
```

For multi-table queries, qualify column names with the table name:

```text
schema Parent(parent, child).
schema Ancestor(parent, child).
sql SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent;
```

For self-joins or shorter qualification, use table aliases:

```text
schema Parent(parent, child).
sql SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent;
```

Current limits:

- default column names are positional such as `c0`, `c1`
- stable names require explicit catalog registration or `schema ...` in the frontend
- single-table queries may also use the table name as a qualifier when no alias is present
- joins currently use comma-separated tables plus `WHERE` filtering
- multi-table queries require qualified column names such as `Parent.child`
- table aliases are supported via `FROM Parent AS p`
- `WHERE` supports `=`, `!=`/`<>`, `AND`, and `OR` (with standard precedence)
- `ORDER BY` supports output-column ordering with `ASC`/`DESC`
- `LIMIT` restricts the number of output rows
- literals include strings, integers, and `NULL`
- aggregates: `COUNT(*)`, `COUNT(col)`, `SUM`, `MIN`, `MAX`, `AVG`, with optional `GROUP BY`
- projection aliases only via `AS`

Runnable SQL examples:

- `examples/scripts/sql_basic.ech`
- `examples/scripts/sql_join.ech`
- `examples/scripts/sql_self_join.ech`
- `examples/scripts/sql_order_by.ech`
- `examples/scripts/sql_filter_ops.ech`

### Development

For non-trivial changes, run:

```bash
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check
```

Benchmarks live under `benches/` and can be run with:

```bash
cargo bench
```

### Notes

This repository is still centered on a rule-engine core. The new SQL-related modules are scaffolding for a broader query-engine direction, not a claim of feature-complete SQL support.

### License

This project is licensed under [BSD-3](LICENSE).