Query Engine

An experimental Rust project for building query-engine components.

Right now the repository is centered on a chase-based reasoning core, a small interactive frontend, and an early relational/SQL scaffold. The broader goal is a query engine with clearer boundaries between the front end, planning, optimization, and execution.

Current scope

  • Chase-based rule evaluation over facts, rules, and substitutions
  • Restricted, standard, oblivious, and Skolem chase variants
  • Optional semi-naive evaluation across all chase variants
  • Provenance-oriented explanations for derived answers
  • Script, REPL, local web UI, and optional TUI for experimentation (all with syntax highlighting)
  • Relational schema, catalog, logical-plan, and execution scaffolding
  • Physical operator scaffolding with a small rule-based rewrite layer
  • A minimal SQL slice for SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT queries over predicate-backed tables, including COUNT, SUM, MIN, MAX, and AVG aggregates
  • Filter push-down across joins in the physical rewrite pass
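
As a rough illustration of the filter push-down item above, the sketch below moves a single-table equality filter underneath a join, onto the side that scans the filtered table. `Plan`, `scan`, `scans`, and `push_down` are hypothetical names invented for this example; the crate's physical operators and rewrite rules are not shown in this README and may look quite different.

```rust
// Hypothetical miniature plan type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    // Keeps rows where `table.column = value`.
    Filter { table: String, column: String, value: String, input: Box<Plan> },
    // Cross product of both inputs, as in a comma-separated FROM list.
    Join { left: Box<Plan>, right: Box<Plan> },
}

// Convenience constructor for a boxed table scan.
fn scan(table: &str) -> Box<Plan> {
    Box::new(Plan::Scan { table: table.to_string() })
}

// True if the subtree reads from `table`.
fn scans(plan: &Plan, table: &str) -> bool {
    match plan {
        Plan::Scan { table: t } => t == table,
        Plan::Filter { input, .. } => scans(input, table),
        Plan::Join { left, right } => scans(left, table) || scans(right, table),
    }
}

// Push single-table filters below joins where the target side is unambiguous.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { table, column, value, input } => match *input {
            Plan::Join { left, right } => {
                let in_left = scans(&left, &table);
                let in_right = scans(&right, &table);
                if in_left && !in_right {
                    let filtered = Plan::Filter { table, column, value, input: left };
                    Plan::Join {
                        left: Box::new(push_down(filtered)),
                        right: Box::new(push_down(*right)),
                    }
                } else if in_right && !in_left {
                    let filtered = Plan::Filter { table, column, value, input: right };
                    Plan::Join {
                        left: Box::new(push_down(*left)),
                        right: Box::new(push_down(filtered)),
                    }
                } else {
                    // Ambiguous side (for example a self-join): keep the filter above the join.
                    let join = Plan::Join {
                        left: Box::new(push_down(*left)),
                        right: Box::new(push_down(*right)),
                    };
                    Plan::Filter { table, column, value, input: Box::new(join) }
                }
            }
            other => Plan::Filter { table, column, value, input: Box::new(push_down(other)) },
        },
        Plan::Join { left, right } => Plan::Join {
            left: Box::new(push_down(*left)),
            right: Box::new(push_down(*right)),
        },
        Plan::Scan { table } => Plan::Scan { table },
    }
}
```

Under these assumptions, a filter on Parent.child sitting above a Parent-Ancestor join ends up directly above the Parent scan, so the join sees fewer rows.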

Architecture

The repository is currently organized around a few clear subsystems:

  • src/chase/: rule-engine data structures, chase execution, and stratification
  • src/io/: CSV-based fact import/export
  • src/frontend/: REPL, script, GUI, and explanation rendering
  • src/relational/: schemas, values, rows, and result sets
  • src/catalog/: predicate-backed table metadata
  • src/sql/: minimal SQL AST and parser
  • src/planner/: logical plan structures and SQL-to-plan translation
  • src/execution/: execution for the current logical-plan subset, the DataSource trait, the TableStore, and a physical operator layer with rule-based rewrites

Today, the chase subsystem is still the most mature part of the codebase. The relational and SQL modules are present to create clean extension points for a broader query-engine architecture.

The executor operates on the DataSource trait rather than on the chase Instance directly. This allows non-chase data sources to plug into the SQL pipeline. The crate ships two implementations: Instance (chase-backed) and TableStore (in-memory rows). Implementing DataSource for a new backend requires a single method:

fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
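
A minimal sketch of what such a backend could look like follows. Only the `scan` signature is taken from this README; the `Schema`, `ResultSet`, and `ExecutionError` definitions are stand-ins invented so the example compiles on its own, and `MapSource` is a hypothetical backend, not part of the crate.

```rust
use std::collections::HashMap;

// Stand-in types: the crate's real definitions are not shown in this README,
// so these shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ResultSet {
    schema: Schema,
    rows: Vec<Vec<String>>,
}

#[derive(Debug, PartialEq)]
enum ExecutionError {
    UnknownTable(String),
    ArityMismatch { table: String, expected: usize, found: usize },
}

// Same method signature as quoted above.
trait DataSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
}

// Hypothetical backend: rows held in a map keyed by table name.
struct MapSource {
    tables: HashMap<String, Vec<Vec<String>>>,
}

impl DataSource for MapSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError> {
        let rows = self
            .tables
            .get(table)
            .ok_or_else(|| ExecutionError::UnknownTable(table.to_string()))?;
        // Reject rows whose width disagrees with the requested schema.
        if let Some(bad) = rows.iter().find(|row| row.len() != schema.columns.len()) {
            return Err(ExecutionError::ArityMismatch {
                table: table.to_string(),
                expected: schema.columns.len(),
                found: bad.len(),
            });
        }
        Ok(ResultSet { schema: schema.clone(), rows: rows.clone() })
    }
}
```

The point of the sketch is the shape of the extension: the backend only has to turn a table name and schema into rows, and everything above the scan stays backend-agnostic.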

Intended Direction

The medium-term direction is to evolve this project into a more general query-engine playground with:

  • explicit front-end and parsing layers
  • internal planning representations
  • clearer separation between logical meaning and execution strategy
  • support for multiple query-engine experiments instead of only chase logic

The code now includes an initial SQL front end, logical plan, and execution path. It is intentionally narrow and should not be read as full SQL support.

Quickstart

Rust API

use query_engine::{Atom, Instance, Term, chase};
use query_engine::chase::rule::RuleBuilder;

let instance: Instance = vec![
    Atom::new("Parent", vec![Term::constant("alice"), Term::constant("bob")]),
    Atom::new("Parent", vec![Term::constant("bob"), Term::constant("carol")]),
]
.into_iter()
.collect();

let rule1 = RuleBuilder::new()
    .when("Parent", vec![Term::var("X"), Term::var("Y")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .build();

let rule2 = RuleBuilder::new()
    .when("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .when("Parent", vec![Term::var("Y"), Term::var("Z")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Z")])
    .build();

let result = chase(instance, &[rule1, rule2]);

assert!(result.terminated);
assert_eq!(result.instance.facts_for_predicate("Ancestor").len(), 3);
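
To make the expected count of 3 concrete, here is a standalone sketch of a semi-naive fixpoint for the same two rules, written against plain string pairs rather than the crate's types. The `ancestors` function and `Pair` alias are hypothetical and say nothing about how the crate's chase is implemented; because neither rule has existential variables, the sketch needs no fresh nulls.

```rust
use std::collections::HashSet;

type Pair = (String, String);

// Semi-naive evaluation of:
//   Parent(X, Y) -> Ancestor(X, Y)
//   Ancestor(X, Y), Parent(Y, Z) -> Ancestor(X, Z)
fn ancestors(parents: &[Pair]) -> HashSet<Pair> {
    // Rule 1 seeds both the result and the first delta.
    let mut all: HashSet<Pair> = parents.iter().cloned().collect();
    let mut delta: HashSet<Pair> = all.clone();

    // Rule 2 is joined against the newly derived facts only.
    while !delta.is_empty() {
        let mut next = HashSet::new();
        for (x, y) in &delta {
            for (p, z) in parents {
                if p == y {
                    let fact = (x.clone(), z.clone());
                    if !all.contains(&fact) {
                        next.insert(fact);
                    }
                }
            }
        }
        all.extend(next.iter().cloned());
        delta = next;
    }
    all
}
```

For the two Parent facts above, the first round derives (alice, carol) and the second round derives nothing new, so the loop stops with three Ancestor pairs.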

CLI

cargo run -- repl
cargo run -- gui
cargo run -- script examples/scripts/ancestor.ech
cargo run -- script examples/scripts/sql_join.ech
cargo run --features tui -- tui

REPL language

fact Parent(alice, bob).
rule Parent(?X, ?Y) -> Ancestor(?X, ?Y).
schema Parent(parent, child).
sql SELECT * FROM Parent;
run.
query Ancestor(?X, ?Y)?
explain Ancestor(alice, carol)?
show facts
show rules
reset
help

Current SQL Slice

The repository now has a narrow SQL pipeline with:

  • predicate-backed catalog inference
  • relational schemas, rows, and values
  • SQL parsing for a small subset
  • logical planning
  • execution for filtering, grouping and aggregation, ordering, limiting, and basic multi-table joins

Currently supported examples:

SELECT * FROM Parent
SELECT c0 FROM Parent
SELECT c0 FROM Parent WHERE c1 = 'bob'
SELECT c0 FROM Parent WHERE c1 != 'bob'
SELECT c0 FROM Parent WHERE c1 = 'bob' AND c0 = 'alice'
SELECT c0 FROM Parent WHERE c1 = 'bob' OR c1 = 'carol'
SELECT c0 FROM Parent ORDER BY c0 DESC
SELECT c0 FROM Parent ORDER BY c0 ASC LIMIT 1
SELECT c0 AS parent_name, 'seed' AS label, 42 AS answer FROM Parent
SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent
SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent
SELECT COUNT(*) FROM Parent
SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept

In the REPL or script runner, use the sql command and end the statement with ;:

sql SELECT c0 FROM Parent WHERE c1 = 'bob';

fact, rule, schema, sql, query, and explain commands may also span multiple lines in .ech scripts, as long as the final line ends with the command's usual terminator (. for fact, rule, and schema; ; for sql; ? for query and explain).
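
One way to picture the multi-line rule is a line accumulator that keeps appending until the terminator for the opening keyword shows up. The sketch below is an assumption about how such a splitter could work, not the script runner's actual code: `terminator` and `split_statements` are hypothetical names, the terminators are inferred from the REPL examples earlier in this README, and comments or terminators inside string literals are ignored.

```rust
// Terminator for each multi-line command, inferred from the REPL examples.
fn terminator(keyword: &str) -> Option<char> {
    match keyword {
        "fact" | "rule" | "schema" => Some('.'),
        "sql" => Some(';'),
        "query" | "explain" => Some('?'),
        _ => None, // every other command is treated as single-line here
    }
}

// Group script lines into statements, joining continuation lines with a space.
fn split_statements(script: &str) -> Vec<String> {
    let mut statements = Vec::new();
    // Expected terminator and text of a statement that is still open.
    let mut pending: Option<(char, String)> = None;

    for line in script.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match pending.take() {
            Some((end, mut text)) => {
                text.push(' ');
                text.push_str(line);
                if line.ends_with(end) {
                    statements.push(text);
                } else {
                    pending = Some((end, text));
                }
            }
            None => {
                let keyword = line.split_whitespace().next().unwrap_or("");
                match terminator(keyword) {
                    Some(end) if !line.ends_with(end) => {
                        pending = Some((end, line.to_string()));
                    }
                    _ => statements.push(line.to_string()),
                }
            }
        }
    }
    // An unterminated trailing statement is kept so the caller can report it.
    if let Some((_, text)) = pending {
        statements.push(text);
    }
    statements
}
```

With this sketch, a three-line sql command collapses into one statement, while single-line commands such as show facts pass through unchanged.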

You can also register stable column names for a predicate-backed table in the frontend before running SQL, including tables that currently have no facts:

schema Parent(parent, child).
sql SELECT parent FROM Parent WHERE child = 'bob';

For multi-table queries, qualify column names with the table name:

schema Parent(parent, child).
schema Ancestor(parent, child).
sql SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent;

For self-joins or shorter qualification, use table aliases:

schema Parent(parent, child).
sql SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent;
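
Conceptually, a comma-separated self-join with a WHERE condition behaves like a nested loop over two copies of the same table. The sketch below spells that out for the query above using plain tuples; `Row` and `grandparents` are hypothetical names, and the crate's executor may evaluate the join differently.

```rust
// Rows of Parent(parent, child) as (parent, child) pairs.
type Row = (String, String);

// SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent
fn grandparents(parent: &[Row]) -> Vec<Row> {
    let mut out = Vec::new();
    for p in parent {
        // Second pass over the same table plays the role of alias q.
        for q in parent {
            // WHERE p.child = q.parent
            if p.1 == q.0 {
                out.push((p.0.clone(), q.1.clone()));
            }
        }
    }
    out
}
```

The aliases matter because both loops read the same table: without p and q there would be no way to say which copy a column belongs to.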

Current behavior and limits:

  • default column names are positional, such as c0 and c1
  • stable names require explicit catalog registration or schema ... in the frontend
  • single-table queries may also use the table name as a qualifier when no alias is present
  • joins currently use comma-separated tables plus WHERE filtering
  • multi-table queries require qualified column names such as Parent.child
  • table aliases are supported via FROM Parent AS p
  • WHERE supports =, != (also written <>), AND, and OR, with standard precedence (AND binds tighter than OR)
  • ORDER BY supports output-column ordering with ASC/DESC
  • LIMIT restricts the number of output rows
  • literals include strings, integers, and NULL
  • aggregates: COUNT(*), COUNT(col), SUM, MIN, MAX, AVG, with optional GROUP BY
  • projection aliases only via AS
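
For the aggregate entry in the list above, the following sketch shows one plausible reading of a grouped COUNT(*) and SUM over in-memory rows. It assumes an Emp table with a string dept column and an integer salary column, as in the earlier example query; `count_and_sum` is a hypothetical helper, NULL handling is left out, and the sorted output order comes from the BTreeMap rather than from any SQL guarantee.

```rust
use std::collections::BTreeMap;

// SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept
fn count_and_sum(emp: &[(&str, i64)]) -> Vec<(String, usize, i64)> {
    // One (count, sum) accumulator per distinct dept value.
    let mut groups: BTreeMap<String, (usize, i64)> = BTreeMap::new();
    for (dept, salary) in emp {
        let entry = groups.entry(dept.to_string()).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += *salary;
    }
    groups
        .into_iter()
        .map(|(dept, (count, sum))| (dept, count, sum))
        .collect()
}
```

Each input row updates exactly one accumulator, so the result has one output row per distinct dept.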

Runnable SQL examples:

  • examples/scripts/sql_basic.ech
  • examples/scripts/sql_join.ech
  • examples/scripts/sql_self_join.ech
  • examples/scripts/sql_order_by.ech
  • examples/scripts/sql_filter_ops.ech

Development

For non-trivial changes, run:

cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check

Benchmarks live under benches/ and can be run with:

cargo bench

Notes

This repository is still centered on a rule-engine core. The new SQL-related modules are scaffolding for a broader query-engine direction, not a claim of feature-complete SQL support.

License

This project is licensed under BSD-3.