## Query Engine
An experimental Rust project for building query-engine components.
Right now the repository is centered on a chase-based reasoning core, a small
interactive frontend, and an early relational/SQL scaffold. The broader target
shape is a query engine with clearer front-end, planning, optimization, and
execution boundaries.
### Current scope
- Chase-based rule evaluation over facts, rules, and substitutions
- Restricted, standard, oblivious, and Skolem chase variants
- Optional semi-naive evaluation across all chase variants
- Provenance-oriented explanations for derived answers
- Script, REPL, local web UI, and optional TUI for experimentation (all with syntax highlighting)
- Relational schema, catalog, logical-plan, and execution scaffolding
- Physical operator scaffolding with a small rule-based rewrite layer
- A minimal SQL slice for `SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT` queries over predicate-backed tables, including `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregates
- Filter push-down across joins in the physical rewrite pass
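
As a rough illustration of the filter push-down item above, the following standalone sketch moves a single-table equality filter below a join. The `Plan` enum, `scans`, and `push_down` are hypothetical names made up for this example; they are not the crate's physical operator types or its actual rewrite rules, and the sketch assumes each table name appears on only one side of a join.

```rust
/// Toy plan shape for illustration only (hypothetical, not the crate's types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    /// Keeps rows where `table.column = value`.
    Filter { table: String, column: String, value: String, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Returns true if `plan` reads from `table` anywhere in its subtree.
fn scans(plan: &Plan, table: &str) -> bool {
    match plan {
        Plan::Scan { table: t } => t.as_str() == table,
        Plan::Filter { input, .. } => scans(input, table),
        Plan::Join { left, right } => scans(left, table) || scans(right, table),
    }
}

/// Pushes each single-table filter below joins, toward the scan it refers to.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { table, column, value, input } => match *input {
            // The filter only mentions a table from the left input.
            Plan::Join { left, right } if scans(&left, &table) => Plan::Join {
                left: Box::new(push_down(Plan::Filter { table, column, value, input: left })),
                right: Box::new(push_down(*right)),
            },
            // The filter only mentions a table from the right input.
            Plan::Join { left, right } if scans(&right, &table) => Plan::Join {
                left: Box::new(push_down(*left)),
                right: Box::new(push_down(Plan::Filter { table, column, value, input: right })),
            },
            // Not a join (or no matching side): keep the filter where it is.
            other => Plan::Filter { table, column, value, input: Box::new(push_down(other)) },
        },
        Plan::Join { left, right } => Plan::Join {
            left: Box::new(push_down(*left)),
            right: Box::new(push_down(*right)),
        },
        scan => scan,
    }
}
```

Applied to a filter on `Ancestor.child` sitting above a join of `Parent` and `Ancestor` scans, this rewrite produces a join whose right input is the filtered `Ancestor` scan, so fewer rows reach the join.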
### Architecture
The repository is currently organized around a few clear subsystems:
- `src/chase/`: rule-engine data structures, chase execution, and stratification
- `src/io/`: CSV-based fact import/export
- `src/frontend/`: REPL, script, GUI, and explanation rendering
- `src/relational/`: schemas, values, rows, and result sets
- `src/catalog/`: predicate-backed table metadata
- `src/sql/`: minimal SQL AST and parser
- `src/planner/`: logical plan structures and SQL-to-plan translation
- `src/execution/`: execution for the current logical-plan subset, the `DataSource` trait, the `TableStore`, and a physical operator layer with rule-based rewrites

Today, the chase subsystem is still the most mature part of the codebase. The
relational and SQL modules are present to create clean extension points for a
broader query-engine architecture.

The executor operates on the `DataSource` trait rather than on the chase
`Instance` directly. This allows non-chase data sources to plug into the SQL
pipeline. The crate ships two implementations: `Instance` (chase-backed) and
`TableStore` (in-memory rows). Implementing `DataSource` for a new backend
requires a single method:
```rust
fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
```
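
A minimal sketch of what a custom backend might look like is shown below. Only the `scan` signature comes from this README; the `Schema`, `ResultSet`, and `ExecutionError` definitions here are simplified stand-ins so the example compiles on its own, and `MapSource` is a hypothetical backend. The crate's real types almost certainly differ in detail.

```rust
use std::collections::HashMap;

// Stand-ins for the crate's types (assumptions made for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ResultSet {
    schema: Schema,
    rows: Vec<Vec<String>>,
}

#[derive(Debug, PartialEq)]
enum ExecutionError {
    UnknownTable(String),
    ArityMismatch { table: String, expected: usize, found: usize },
}

trait DataSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError>;
}

/// Hypothetical backend: table name -> rows of string values.
struct MapSource {
    tables: HashMap<String, Vec<Vec<String>>>,
}

impl DataSource for MapSource {
    fn scan(&self, table: &str, schema: &Schema) -> Result<ResultSet, ExecutionError> {
        let rows = self
            .tables
            .get(table)
            .ok_or_else(|| ExecutionError::UnknownTable(table.to_string()))?;
        // Reject rows whose width does not match the requested schema.
        if let Some(bad) = rows.iter().find(|row| row.len() != schema.columns.len()) {
            return Err(ExecutionError::ArityMismatch {
                table: table.to_string(),
                expected: schema.columns.len(),
                found: bad.len(),
            });
        }
        Ok(ResultSet { schema: schema.clone(), rows: rows.clone() })
    }
}
```

The design point this illustrates is that the executor only needs rows shaped by a schema, so a backend does not have to know anything about facts, rules, or the chase.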
### Intended Direction
The medium-term direction is to evolve this project into a more general
query-engine playground with:
- explicit front-end and parsing layers
- internal planning representations
- clearer separation between logical meaning and execution strategy
- support for multiple query-engine experiments instead of only chase logic

The current code includes an initial SQL front end, logical plan, and
execution path. It is intentionally narrow and should not be read as full
SQL support.
### Quickstart
#### Rust API
```rust
use query_engine::{Atom, Instance, Term, chase};
use query_engine::chase::rule::RuleBuilder;

// Seed facts: alice is a parent of bob, and bob is a parent of carol.
let instance: Instance = vec![
    Atom::new("Parent", vec![Term::constant("alice"), Term::constant("bob")]),
    Atom::new("Parent", vec![Term::constant("bob"), Term::constant("carol")]),
]
.into_iter()
.collect();

// Parent(X, Y) -> Ancestor(X, Y)
let rule1 = RuleBuilder::new()
    .when("Parent", vec![Term::var("X"), Term::var("Y")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .build();

// Ancestor(X, Y), Parent(Y, Z) -> Ancestor(X, Z)
let rule2 = RuleBuilder::new()
    .when("Ancestor", vec![Term::var("X"), Term::var("Y")])
    .when("Parent", vec![Term::var("Y"), Term::var("Z")])
    .then("Ancestor", vec![Term::var("X"), Term::var("Z")])
    .build();

let result = chase(instance, &[rule1, rule2]);
assert!(result.terminated);
// Ancestor(alice, bob), Ancestor(bob, carol), and the transitive Ancestor(alice, carol).
assert_eq!(result.instance.facts_for_predicate("Ancestor").len(), 3);
```
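
To make the expected count of three `Ancestor` facts concrete, here is a standalone sketch of semi-naive evaluation for the two rules above, using only the standard library. It is not the crate's implementation: the `ancestors` function and `Pair` alias are hypothetical, and the sketch assumes plain Datalog-style rules with no existential variables, which is a much simpler setting than the chase variants the crate supports.

```rust
use std::collections::BTreeSet;

type Pair = (String, String);

/// Derives all Ancestor facts from Parent facts for the two rules above.
fn ancestors(parents: &[(&str, &str)]) -> BTreeSet<Pair> {
    // Rule 1: every Parent fact is also an Ancestor fact.
    let mut all: BTreeSet<Pair> = parents
        .iter()
        .map(|(x, y)| (x.to_string(), y.to_string()))
        .collect();
    // `delta` holds only the facts that were new in the previous round.
    let mut delta = all.clone();

    while !delta.is_empty() {
        let mut next = BTreeSet::new();
        // Rule 2: join only the new Ancestor facts with Parent, not the full set.
        for (x, y) in &delta {
            for (p, z) in parents {
                if y.as_str() == *p {
                    let fact = (x.clone(), z.to_string());
                    if !all.contains(&fact) {
                        next.insert(fact);
                    }
                }
            }
        }
        all.extend(next.iter().cloned());
        delta = next;
    }
    all
}
```

For the two `Parent` facts in the quickstart, the first round derives `("alice", "carol")`, the second round derives nothing new, and the loop stops with three facts in total.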
#### CLI
```bash
cargo run -- repl
cargo run -- gui
cargo run -- script examples/scripts/ancestor.ech
cargo run -- script examples/scripts/sql_join.ech
cargo run --features tui -- tui
```
#### REPL language
```text
fact Parent(alice, bob).
rule Parent(?X, ?Y) -> Ancestor(?X, ?Y).
schema Parent(parent, child).
sql SELECT * FROM Parent;
run.
query Ancestor(?X, ?Y)?
explain Ancestor(alice, carol)?
show facts
show rules
reset
help
```
#### Current SQL Slice
The repository now has a narrow SQL pipeline with:
- predicate-backed catalog inference
- relational schemas, rows, and values
- SQL parsing for a small subset
- logical planning
- execution for filtering, aggregation, ordering, limiting, and basic multi-table joins

Currently supported examples:
```sql
SELECT * FROM Parent
SELECT c0 FROM Parent
SELECT c0 FROM Parent WHERE c1 = 'bob'
SELECT c0 FROM Parent WHERE c1 != 'bob'
SELECT c0 FROM Parent WHERE c1 = 'bob' AND c0 = 'alice'
SELECT c0 FROM Parent WHERE c1 = 'bob' OR c1 = 'carol'
SELECT c0 FROM Parent ORDER BY c0 DESC
SELECT c0 FROM Parent ORDER BY c0 ASC LIMIT 1
SELECT c0 AS parent_name, 'seed' AS label, 42 AS answer FROM Parent
SELECT Parent.parent, Ancestor.child
  FROM Parent, Ancestor
  WHERE Parent.child = Ancestor.parent
SELECT p.parent, q.child
  FROM Parent AS p, Parent AS q
  WHERE p.child = q.parent
SELECT COUNT(*) FROM Parent
SELECT dept, COUNT(*), SUM(salary) FROM Emp GROUP BY dept
```
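
For readers less familiar with grouped aggregates, the sketch below spells out what the last example computes, as plain Rust over in-memory rows. The `Emp` struct and `count_and_sum_by_dept` function are hypothetical and exist only for this illustration; the sketch ignores `NULL` handling and says nothing about how the executor implements aggregation.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape for an `Emp(dept, salary)` table.
struct Emp {
    dept: String,
    salary: i64,
}

/// Returns one `(dept, COUNT(*), SUM(salary))` tuple per distinct `dept`.
fn count_and_sum_by_dept(rows: &[Emp]) -> Vec<(String, usize, i64)> {
    // A BTreeMap keeps the output deterministic; SQL itself promises
    // no particular group order without ORDER BY.
    let mut groups: BTreeMap<&str, (usize, i64)> = BTreeMap::new();
    for row in rows {
        let entry = groups.entry(row.dept.as_str()).or_insert((0, 0));
        entry.0 += 1; // COUNT(*)
        entry.1 += row.salary; // SUM(salary)
    }
    groups
        .into_iter()
        .map(|(dept, (count, sum))| (dept.to_string(), count, sum))
        .collect()
}
```

With rows `("eng", 100)`, `("eng", 150)`, and `("ops", 90)`, the function returns `("eng", 2, 250)` followed by `("ops", 1, 90)`.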
In the REPL or script runner, use the `sql` command and end the statement with
`;`:
```text
sql SELECT c0 FROM Parent WHERE c1 = 'bob';
```
`fact`, `rule`, `schema`, `sql`, `query`, and `explain` commands may also span
multiple lines in `.ech` scripts as long as the final line ends with the normal
terminator.
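
One way such multi-line handling could work is sketched below. This is not the script runner's actual code: `terminator` and `split_commands` are hypothetical helpers, and the terminator for each keyword (`.`, `;`, or `?`) is inferred from the REPL examples in this README rather than from the parser.

```rust
/// Terminator expected for each multi-line-capable command keyword.
fn terminator(keyword: &str) -> Option<char> {
    match keyword {
        "fact" | "rule" | "schema" => Some('.'),
        "sql" => Some(';'),
        "query" | "explain" => Some('?'),
        _ => None, // e.g. `show facts`, `reset`, `help` are single-line
    }
}

/// Joins continuation lines until the command's terminator ends a line.
fn split_commands(script: &str) -> Vec<String> {
    let mut commands = Vec::new();
    // Terminator and text of a command that is still waiting for its last line.
    let mut pending: Option<(char, String)> = None;

    for line in script.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match pending.take() {
            Some((end, mut buf)) => {
                buf.push(' ');
                buf.push_str(line);
                if line.ends_with(end) {
                    commands.push(buf);
                } else {
                    pending = Some((end, buf));
                }
            }
            None => {
                let keyword = line.split_whitespace().next().unwrap_or("");
                match terminator(keyword) {
                    Some(end) if !line.ends_with(end) => pending = Some((end, line.to_string())),
                    _ => commands.push(line.to_string()),
                }
            }
        }
    }
    // An unterminated trailing command is kept as-is so the caller can report it.
    if let Some((_, buf)) = pending {
        commands.push(buf);
    }
    commands
}
```

A real implementation would also need to ignore terminator characters that appear inside string literals at the end of a line, which this sketch does not attempt.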
You can also register stable column names for a predicate-backed table in the
frontend before running SQL, including tables that currently have no facts:
```text
schema Parent(parent, child).
sql SELECT parent FROM Parent WHERE child = 'bob';
```
For multi-table queries, qualify column names with the table name:
```text
schema Parent(parent, child).
schema Ancestor(parent, child).
sql SELECT Parent.parent, Ancestor.child FROM Parent, Ancestor WHERE Parent.child = Ancestor.parent;
```
For self-joins or shorter qualification, use table aliases:
```text
schema Parent(parent, child).
sql SELECT p.parent, q.child FROM Parent AS p, Parent AS q WHERE p.child = q.parent;
```
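
Semantically, the aliased self-join above pairs every `Parent` row (as `p`) with every `Parent` row (as `q`) and keeps the pairs where `p.child = q.parent`. The sketch below expresses that as a nested loop over in-memory tuples; the `grandparents` function is hypothetical and is not a description of the join strategy the executor actually uses.

```rust
/// Evaluates `SELECT p.parent, q.child FROM Parent AS p, Parent AS q
/// WHERE p.child = q.parent` over `(parent, child)` tuples.
fn grandparents(parent: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    // Outer loop plays the role of alias `p`, inner loop the role of alias `q`.
    for (p_parent, p_child) in parent {
        for (q_parent, q_child) in parent {
            if p_child == q_parent {
                out.push((p_parent.to_string(), q_child.to_string()));
            }
        }
    }
    out
}
```

For the facts `Parent(alice, bob)` and `Parent(bob, carol)`, the only matching pair produces the row `("alice", "carol")`.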
Current behavior and limits:
- default column names are positional such as `c0`, `c1`
- stable names require explicit catalog registration or `schema ...` in the frontend
- single-table queries may also use the table name as a qualifier when no alias is present
- joins currently use comma-separated tables plus `WHERE` filtering
- multi-table queries require qualified column names such as `Parent.child`
- table aliases are supported via `FROM Parent AS p`
- `WHERE` supports `=`, `!=`/`<>`, `AND`, and `OR` (with standard precedence)
- `ORDER BY` supports output-column ordering with `ASC`/`DESC`
- `LIMIT` restricts the number of output rows
- literals include strings, integers, and `NULL`
- aggregates: `COUNT(*)`, `COUNT(col)`, `SUM`, `MIN`, `MAX`, `AVG`, with optional `GROUP BY`
- projection aliases require the `AS` keyword

Runnable SQL examples:
- `examples/scripts/sql_basic.ech`
- `examples/scripts/sql_join.ech`
- `examples/scripts/sql_self_join.ech`
- `examples/scripts/sql_order_by.ech`
- `examples/scripts/sql_filter_ops.ech`
### Development
For non-trivial changes, run:
```bash
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check
```
Benchmarks live under `benches/` and can be run with:
```bash
cargo bench
```
### Notes
This repository is still centered on a rule-engine core. The new SQL-related
modules are scaffolding for a broader query-engine direction, not a claim of
feature-complete SQL support.
### License
This project is licensed under the [BSD 3-Clause License](LICENSE).