
## Query Engine
An experimental Rust project for building query-engine components.
Right now the repository is centered on a chase-based reasoning core, a small
interactive frontend, and an early relational/SQL scaffold. The broader target
shape is a query engine with clearer front-end, planning, optimization, and
execution boundaries.
### Current scope
- Chase-based rule evaluation over facts, rules, and substitutions
- Restricted, standard, oblivious, and Skolem chase variants
- Optional semi-naive evaluation across all chase variants
- Provenance-oriented explanations for derived answers
- Script, REPL, and local web UI for experimentation
- Relational schema, catalog, logical-plan, and execution scaffolding
- Physical operator scaffolding with a small rule-based rewrite layer
- A minimal SQL slice for `SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT` queries over predicate-backed tables, including `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregates
- Filter push-down across joins in the physical rewrite pass
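Semi-naive evaluation avoids re-deriving the same facts every round. Each round joins only the facts that were new in the previous round (the delta) against the rest of the instance. The sketch below is a minimal, self-contained illustration for the two `Ancestor` rules used in the Rust API quickstart. It uses plain string pairs instead of the crate's `Atom`/`Instance` types, and `ancestors_semi_naive` is a hypothetical name, not the crate's implementation:

```rust
use std::collections::HashSet;

type Pair = (String, String);

/// Computes Ancestor from Parent with semi-naive evaluation:
///   Ancestor(X, Y) :- Parent(X, Y).
///   Ancestor(X, Z) :- Ancestor(X, Y), Parent(Y, Z).
/// Each round joins only the facts derived in the previous round (the delta).
pub fn ancestors_semi_naive(parent: &[Pair]) -> HashSet<Pair> {
    // Base rule: every Parent fact is also an Ancestor fact.
    let mut all: HashSet<Pair> = parent.iter().cloned().collect();
    let mut delta: HashSet<Pair> = all.clone();

    while !delta.is_empty() {
        let mut next: HashSet<Pair> = HashSet::new();
        // Recursive rule, evaluated only against the newest Ancestor facts.
        for (x, y) in &delta {
            for (p, z) in parent {
                if p == y {
                    let fact = (x.clone(), z.clone());
                    if !all.contains(&fact) {
                        next.insert(fact);
                    }
                }
            }
        }
        // Facts found this round become the next round's delta.
        all.extend(next.iter().cloned());
        delta = next;
    }
    all
}
```

A naive loop would re-join every `Ancestor` fact each round. Restricting the recursive rule to the delta keeps the work per round proportional to what is actually new.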
### Architecture
The repository is currently organized around a few clear subsystems:
- `src/chase/`: rule-engine data structures, chase execution, and stratification
- `src/io/`: CSV-based fact import/export
- `src/frontend/`: REPL, script, GUI, and explanation rendering
- `src/relational/`: schemas, values, rows, and result sets
- `src/catalog/`: predicate-backed table metadata
- `src/sql/`: minimal SQL AST and parser
- `src/planner/`: logical plan structures and SQL-to-plan translation
- `src/execution/`: execution for the current logical-plan subset, the `DataSource` trait, the `TableStore`, and a physical operator layer with rule-based rewrites
Today, the chase subsystem is still the most mature part of the codebase. The
relational and SQL modules are present to create clean extension points for a
broader query-engine architecture.
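The physical rewrite layer includes filter push-down across joins. The sketch below shows the idea as a rough illustration over a hypothetical three-variant plan enum, not the crate's physical operators. A `column = literal` filter whose qualifier (table name or alias) belongs to exactly one side of a comma join is moved onto that side. Predicates that touch both sides, or whose qualifier is unknown, stay above the join:

```rust
/// Toy plan with just enough structure to show the rewrite.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    /// Scan of a table under a distinct name (table name or alias).
    Scan { name: String },
    /// Keep rows where the qualified `column` (e.g. "p.child") equals `value`.
    Filter { column: String, value: String, input: Box<Plan> },
    /// Cross product of two inputs (a comma join).
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Scan names visible in a subtree.
fn scan_names(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Scan { name } => vec![name.clone()],
        Plan::Filter { input, .. } => scan_names(input),
        Plan::Join { left, right } => {
            let mut names = scan_names(left);
            names.extend(scan_names(right));
            names
        }
    }
}

/// Recursively pushes `column = literal` filters below joins when the
/// column's qualifier names a scan on exactly one side of the join.
pub fn push_down_filters(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { column, value, input } => match push_down_filters(*input) {
            Plan::Join { left, right } => {
                let qualifier = column.split('.').next().unwrap_or_default().to_string();
                let in_left = scan_names(&left).contains(&qualifier);
                let in_right = scan_names(&right).contains(&qualifier);
                if in_left && !in_right {
                    let left = push_down_filters(Plan::Filter { column, value, input: left });
                    Plan::Join { left: Box::new(left), right }
                } else if in_right && !in_left {
                    let right = push_down_filters(Plan::Filter { column, value, input: right });
                    Plan::Join { left, right: Box::new(right) }
                } else {
                    // Ambiguous or unknown qualifier: keep the filter above the join.
                    Plan::Filter { column, value, input: Box::new(Plan::Join { left, right }) }
                }
            }
            // Not a join: nothing to push past.
            other => Plan::Filter { column, value, input: Box::new(other) },
        },
        Plan::Join { left, right } => Plan::Join {
            left: Box::new(push_down_filters(*left)),
            right: Box::new(push_down_filters(*right)),
        },
        scan => scan,
    }
}
```

Pushing the filter below the join shrinks one join input before the cross product is formed, which is where most of the savings come from.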
The executor operates on the `DataSource` trait rather than on the chase
`Instance` directly. This allows non-chase data sources to plug into the SQL
pipeline. The crate ships two implementations: `Instance` (chase-backed) and
`TableStore` (in-memory rows). Implementing `DataSource` for a new backend
requires a single method:
```rust
fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
```
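The following is a self-contained sketch of a new backend behind that one method. `Schema`, `ResultSet`, and `ExecutionError` here are minimal stand-ins whose fields are assumptions, not the crate's definitions. `MapSource` is a hypothetical backend that keeps string rows in a `HashMap`:

```rust
use std::collections::HashMap;

// Stand-ins for the crate's types; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResultSet {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionError {
    UnknownTable(String),
    ArityMismatch { table: String, expected: usize, found: usize },
}

/// Same single-method shape as described above.
pub trait DataSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
}

/// Hypothetical backend: rows stored per table name.
pub struct MapSource {
    pub tables: HashMap<String, Vec<Vec<String>>>,
}

impl DataSource for MapSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError> {
        let rows = self
            .tables
            .get(table)
            .ok_or_else(|| ExecutionError::UnknownTable(table.to_string()))?;
        // Reject rows whose width does not match the requested schema.
        if let Some(bad) = rows.iter().find(|r| r.len() != schema.columns.len()) {
            return Err(ExecutionError::ArityMismatch {
                table: table.to_string(),
                expected: schema.columns.len(),
                found: bad.len(),
            });
        }
        Ok(ResultSet { columns: schema.columns.clone(), rows: rows.clone() })
    }
}
```

Because the executor depends only on the trait, a backend like this can be swapped in without touching the planner.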
### Intended direction
The medium-term direction is to evolve this project into a more general
query-engine playground with:
- explicit front-end and parsing layers
- internal planning representations
- clearer separation between logical meaning and execution strategy
- support for multiple query-engine experiments instead of only chase logic
The code already includes an initial SQL front end, logical plan, and
execution path. That path is intentionally narrow and should not be read as
full SQL support.
### Quickstart
#### Rust API
```rust
use query_engine::{Atom, Instance, Term, chase};
use query_engine::chase::rule::RuleBuilder;

fn main() {
    // Base facts: alice is bob's parent, bob is carol's parent.
    let instance: Instance = vec![
        Atom::new("Parent", vec![Term::constant("alice"), Term::constant("bob")]),
        Atom::new("Parent", vec![Term::constant("bob"), Term::constant("carol")]),
    ]
    .into_iter()
    .collect();

    // Ancestor(X, Y) :- Parent(X, Y).
    let rule1 = RuleBuilder::new()
        .when("Parent", vec![Term::var("X"), Term::var("Y")])
        .then("Ancestor", vec![Term::var("X"), Term::var("Y")])
        .build();

    // Ancestor(X, Z) :- Ancestor(X, Y), Parent(Y, Z).
    let rule2 = RuleBuilder::new()
        .when("Ancestor", vec![Term::var("X"), Term::var("Y")])
        .when("Parent", vec![Term::var("Y"), Term::var("Z")])
        .then("Ancestor", vec![Term::var("X"), Term::var("Z")])
        .build();

    let result = chase(instance, &[rule1, rule2]);
    assert!(result.terminated);
    // Ancestor(alice, bob), Ancestor(bob, carol), Ancestor(alice, carol)
    assert_eq!(result.instance.facts_for_predicate("Ancestor").len(), 3);
}
```
#### CLI
```bash
cargo run -- repl
cargo run -- gui
cargo run -- script examples/scripts/ancestor.ech
cargo run -- script examples/scripts/sql_join.ech
```
#### REPL language
```text
fact Parent(alice, bob).
rule Parent(?X, ?Y) -> Ancestor(?X, ?Y).
schema Parent(parent, child).
sql SELECT * FROM Parent;
run.
query Ancestor(?X, ?Y)?
explain Ancestor(alice, carol)?
show facts
show rules
reset
help
```
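The `explain` command reports how a derived answer was obtained. One common way to support that is to record, for every derived fact, the rule and premise facts that produced it, then walk those records recursively. The sketch below illustrates this approach with hypothetical names (`Fact`, `Origin`, `fact`, `explain`), not the crate's provenance types or output format:

```rust
use std::collections::HashMap;

/// A fact as (predicate, arguments), e.g. ("Ancestor", ["alice", "carol"]).
pub type Fact = (String, Vec<String>);

/// Convenience constructor for the examples.
pub fn fact(pred: &str, args: &[&str]) -> Fact {
    (pred.to_string(), args.iter().map(|a| a.to_string()).collect())
}

/// How a fact entered the instance.
#[derive(Debug, Clone)]
pub enum Origin {
    /// Asserted directly, as with `fact ...`.
    Base,
    /// Derived by `rule` from the listed premise facts.
    Derived { rule: String, premises: Vec<Fact> },
}

/// Renders a derivation tree into `out`, one fact per line, indented by depth.
pub fn explain(f: &Fact, origins: &HashMap<Fact, Origin>, depth: usize, out: &mut Vec<String>) {
    let label = format!("{}({})", f.0, f.1.join(", "));
    let pad = "  ".repeat(depth);
    match origins.get(f) {
        Some(Origin::Derived { rule, premises }) => {
            out.push(format!("{pad}{label} by {rule}"));
            for premise in premises {
                explain(premise, origins, depth + 1, out);
            }
        }
        Some(Origin::Base) => out.push(format!("{pad}{label} [fact]")),
        None => out.push(format!("{pad}{label} [unknown]")),
    }
}
```

Recording provenance at derivation time costs some memory, but it lets an explanation be produced without re-running the chase.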
#### Current SQL slice
The repository now has a narrow SQL pipeline with:
- predicate-backed catalog inference
- relational schemas, rows, and values
- SQL parsing for a small subset
- logical planning
- execution for filtering, ordering, limiting, grouping and aggregation, and basic multi-table joins
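A logical plan for this kind of query can be viewed as a tree of operators evaluated bottom-up. The sketch below is a simplified illustration using hypothetical `Table`, `Pred`, and `Plan` types. It uses string-valued rows and has no joins or aggregates, and it is not the crate's planner or executor:

```rust
use std::collections::HashMap;

/// Column names plus string-valued rows (a stand-in for a real result set).
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// `WHERE` predicates: column/literal comparisons combined with AND/OR.
#[derive(Debug, Clone)]
pub enum Pred {
    Eq(String, String),
    NotEq(String, String),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Logical operators; each one wraps the operator that feeds it.
#[derive(Debug, Clone)]
pub enum Plan {
    Scan(String),
    Filter(Pred, Box<Plan>),
    Sort { column: String, descending: bool, input: Box<Plan> },
    Limit(usize, Box<Plan>),
    Project(Vec<String>, Box<Plan>),
}

/// Position of a named column; panics on unknown names to keep the sketch short.
fn position(columns: &[String], name: &str) -> usize {
    columns.iter().position(|c| c == name).expect("unknown column")
}

fn eval(pred: &Pred, columns: &[String], row: &[String]) -> bool {
    match pred {
        Pred::Eq(c, v) => row[position(columns, c)] == *v,
        Pred::NotEq(c, v) => row[position(columns, c)] != *v,
        Pred::And(a, b) => eval(a, columns, row) && eval(b, columns, row),
        Pred::Or(a, b) => eval(a, columns, row) || eval(b, columns, row),
    }
}

/// Evaluates a plan bottom-up against in-memory tables.
pub fn execute(plan: &Plan, db: &HashMap<String, Table>) -> Table {
    match plan {
        Plan::Scan(name) => db[name].clone(),
        Plan::Filter(pred, input) => {
            let Table { columns, mut rows } = execute(input, db);
            rows.retain(|row| eval(pred, &columns, row));
            Table { columns, rows }
        }
        Plan::Sort { column, descending, input } => {
            let mut t = execute(input, db);
            let i = position(&t.columns, column);
            // Values compare as strings in this sketch.
            t.rows.sort_by(|a, b| if *descending { b[i].cmp(&a[i]) } else { a[i].cmp(&b[i]) });
            t
        }
        Plan::Limit(n, input) => {
            let mut t = execute(input, db);
            t.rows.truncate(*n);
            t
        }
        Plan::Project(names, input) => {
            let t = execute(input, db);
            let idx: Vec<usize> = names.iter().map(|n| position(&t.columns, n)).collect();
            let rows: Vec<Vec<String>> =
                t.rows.iter().map(|r| idx.iter().map(|&i| r[i].clone()).collect()).collect();
            Table { columns: names.clone(), rows }
        }
    }
}
```

For example, `SELECT c0 FROM Parent WHERE c1 = 'bob' ORDER BY c0 DESC LIMIT 1` maps onto `Project(Limit(Sort(Filter(Scan))))` in this sketch.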
Currently supported examples:
```sql
-- Single-table filtering, ordering, limiting, and projection aliases
SELECT * FROM Parent;
SELECT c0 FROM Parent;
SELECT c0 FROM Parent WHERE c1 = 'bob';
SELECT c0 FROM Parent WHERE c1 != 'bob';
SELECT c0 FROM Parent WHERE c1 = 'bob' AND c0 = 'alice';
SELECT c0 FROM Parent WHERE c1 = 'bob' OR c1 = 'carol';
SELECT c0 FROM Parent ORDER BY c0 DESC;
SELECT c0 FROM Parent ORDER BY c0 ASC LIMIT 1;
SELECT c0 AS parent_name, 'seed' AS label, 42 AS answer FROM Parent;

-- Multi-table join with qualified column names
SELECT Parent.parent, Ancestor.child
FROM Parent, Ancestor
WHERE Parent.child = Ancestor.parent;

-- Self-join with table aliases
SELECT p.parent, q.child
FROM Parent AS p, Parent AS q
WHERE p.child = q.parent;

-- Aggregates, with and without GROUP BY
SELECT COUNT(*) FROM Parent;
SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept;
```
In the REPL or script runner, use the `sql` command and end the statement with
`;`:
```text
sql SELECT c0 FROM Parent WHERE c1 = 'bob';
```
`fact`, `rule`, `schema`, `sql`, `query`, and `explain` commands may also span
multiple lines in `.ech` scripts, as long as the final line ends with the
command's usual terminator: `.` for `fact`, `rule`, and `schema`; `;` for `sql`;
and `?` for `query` and `explain`.
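A script reader only needs to know which terminator closes each command in order to join continuation lines. The sketch below illustrates that grouping under two assumptions. First, it uses the terminators shown in the REPL examples. Second, it treats commands without a terminator, such as `show facts`, `reset`, or `help`, as single-line. `split_commands` and `terminator` are hypothetical names, not the crate's script runner:

```rust
/// Terminator for each multi-line-capable command, following the REPL
/// examples: `.` for fact/rule/schema, `;` for sql, `?` for query/explain.
fn terminator(keyword: &str) -> Option<char> {
    match keyword {
        "fact" | "rule" | "schema" => Some('.'),
        "sql" => Some(';'),
        "query" | "explain" => Some('?'),
        _ => None, // e.g. `show facts`, `reset`, `help`, `run.`: one line each
    }
}

/// Groups script lines into complete commands, joining continuation lines
/// with a single space.
pub fn split_commands(script: &str) -> Vec<String> {
    let mut commands = Vec::new();
    let mut buffer = String::new();
    for line in script.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if !buffer.is_empty() {
            buffer.push(' ');
        }
        buffer.push_str(line);
        // The first word of the buffered command decides its terminator.
        let keyword = buffer.split_whitespace().next().unwrap_or("");
        let done = match terminator(keyword) {
            Some(t) => buffer.ends_with(t),
            None => true,
        };
        if done {
            commands.push(std::mem::take(&mut buffer));
        }
    }
    if !buffer.is_empty() {
        commands.push(buffer); // unterminated trailing command
    }
    commands
}
```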
You can also register stable column names for a predicate-backed table in the
frontend before running SQL, including tables that currently have no facts:
```text
schema Parent(parent, child).
sql SELECT parent FROM Parent WHERE child = 'bob';
```
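Column name resolution can be viewed as a lookup that prefers registered names and otherwise falls back to positional names (`c0`, `c1`, ...). The sketch below illustrates this with a hypothetical `Catalog` type. It assumes that registered names replace the positional defaults for that table, and the crate may behave differently:

```rust
use std::collections::HashMap;

/// Hypothetical catalog holding column names registered per predicate.
#[derive(Debug, Default)]
pub struct Catalog {
    registered: HashMap<String, Vec<String>>,
}

impl Catalog {
    /// Roughly what `schema Parent(parent, child).` records.
    pub fn register(&mut self, predicate: &str, columns: &[&str]) {
        let names: Vec<String> = columns.iter().map(|c| c.to_string()).collect();
        self.registered.insert(predicate.to_string(), names);
    }

    /// Registered names if present, otherwise positional names c0, c1, ...
    pub fn columns(&self, predicate: &str, arity: usize) -> Vec<String> {
        match self.registered.get(predicate) {
            Some(names) => names.clone(),
            None => (0..arity).map(|i| format!("c{i}")).collect(),
        }
    }

    /// Position of `column` in the predicate's columns, if it exists.
    pub fn resolve(&self, predicate: &str, arity: usize, column: &str) -> Option<usize> {
        self.columns(predicate, arity).iter().position(|c| c == column)
    }
}
```

Because registration is keyed by predicate rather than by stored facts, a lookup like this works even for tables that have no facts yet.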
For multi-table queries, qualify column names with the table name:
```text
schema Parent(parent, child).
schema Ancestor(parent, child).
sql SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent;
```
For self-joins or shorter qualification, use table aliases:
```text
schema Parent(parent, child).
sql SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent;
```
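Conceptually, a comma join followed by an equality in `WHERE` is a filtered cross product. The sketch below expresses that as a nested-loop join over string rows. `nested_loop_join` is a hypothetical helper, and the crate's executor may use a different join strategy:

```rust
/// Nested-loop equi-join of two row sets on left[li] == right[ri].
/// Each output row is the left row followed by the right row, matching the
/// column layout a comma join produces before projection.
pub fn nested_loop_join(
    left: &[Vec<String>],
    right: &[Vec<String>],
    li: usize,
    ri: usize,
) -> Vec<Vec<String>> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            // The WHERE equality, checked for every (left, right) pair.
            if l[li] == r[ri] {
                let mut row = l.clone();
                row.extend(r.iter().cloned());
                out.push(row);
            }
        }
    }
    out
}
```

For the self-join above, `p.child` is column 1 of the left copy of `Parent` and `q.parent` is column 0 of the right copy. The projection `p.parent, q.child` therefore picks columns 0 and 3 of each joined row.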
Current behavior and limits:
- default column names are positional such as `c0`, `c1`
- stable names require explicit catalog registration or `schema ...` in the frontend
- single-table queries may also use the table name as a qualifier when no alias is present
- joins currently use comma-separated tables plus `WHERE` filtering
- multi-table queries require qualified column names such as `Parent.child`
- table aliases are supported via `FROM Parent AS p`
- `WHERE` supports `=`, `!=`/`<>`, `AND`, and `OR`, with standard precedence (`AND` binds more tightly than `OR`)
- `ORDER BY` supports output-column ordering with `ASC`/`DESC`
- `LIMIT` restricts the number of output rows
- literals include strings, integers, and `NULL`
- aggregates: `COUNT(*)`, `COUNT(col)`, `SUM`, `MIN`, `MAX`, `AVG`, with optional `GROUP BY`
- projection aliases only via `AS`
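Grouped aggregation can be done in a single pass by keeping one accumulator per group key. The sketch below works over `(dept, salary)` pairs in the spirit of `SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept`. `GroupStats` and `group_by` are hypothetical names. NULL handling follows standard SQL: every aggregate except `COUNT(*)` skips NULLs. The crate's behavior may differ in detail:

```rust
use std::collections::BTreeMap;

/// Per-group accumulators for one aggregated column.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct GroupStats {
    pub count_star: usize, // COUNT(*): every row in the group
    pub count_col: usize,  // COUNT(col): non-NULL values only
    pub sum: Option<i64>,  // SUM/MIN/MAX stay NULL (None) if no non-NULL input
    pub min: Option<i64>,
    pub max: Option<i64>,
}

impl GroupStats {
    fn add(&mut self, value: Option<i64>) {
        self.count_star += 1;
        if let Some(v) = value {
            self.count_col += 1;
            self.sum = Some(self.sum.unwrap_or(0) + v);
            self.min = Some(self.min.map_or(v, |m| m.min(v)));
            self.max = Some(self.max.map_or(v, |m| m.max(v)));
        }
    }

    /// AVG = SUM / COUNT(col); None for an all-NULL group.
    pub fn avg(&self) -> Option<f64> {
        self.sum.map(|s| s as f64 / self.count_col as f64)
    }
}

/// One pass over (group key, value) rows; BTreeMap keeps groups sorted by key.
pub fn group_by(rows: &[(&str, Option<i64>)]) -> BTreeMap<String, GroupStats> {
    let mut groups: BTreeMap<String, GroupStats> = BTreeMap::new();
    for &(key, value) in rows {
        groups.entry(key.to_string()).or_default().add(value);
    }
    groups
}
```

Computing `AVG` from the running `SUM` and `COUNT(col)` avoids keeping every value in memory. A query without `GROUP BY` corresponds to a single group.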
Runnable SQL examples:
- `examples/scripts/sql_basic.ech`
- `examples/scripts/sql_join.ech`
- `examples/scripts/sql_self_join.ech`
- `examples/scripts/sql_order_by.ech`
- `examples/scripts/sql_filter_ops.ech`
### Development
For non-trivial changes, run:
```bash
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check
```
Benchmarks live under `benches/` and can be run with:
```bash
cargo bench
```
### Notes
This repository is still centered on a rule-engine core. The new SQL-related
modules are scaffolding for a broader query-engine direction, not a claim of
feature-complete SQL support.
### License
This project is licensed under [BSD-3](LICENSE).