Add the early version of query-ops implementation

Hassan Abedi 2026-06-03 11:48:33 +02:00
parent b31aa32747
commit cb07245bd9
11 changed files with 1105 additions and 14 deletions

crates/query-ops/README.md Normal file

@ -0,0 +1,135 @@
## Query Ops
This crate provides a small set of query operators that can be used to implement a simple query-plan executor.
The operators are: **atom scan**, **semijoin**, and **natural join**.
### Public API
| Item | Kind | Description |
|--------------------------------------------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `scan_atom(&Table, &AtomPattern) -> Relation` | function | Scans the table under the pattern and returns a binding relation with one column per distinct variable in first-occurrence order. Literal positions and repeated variables filter rows during the scan. |
| `semijoin(&Relation, &Relation) -> Relation` | function | Returns the rows of `left` whose values on the columns shared with `right` also appear in `right`. The output column list is the same as `left.columns`. |
| `natural_join(&Relation, &Relation) -> Relation` | function | Returns every pair of `left` and `right` rows that agree on shared columns. Each output row holds the columns of `left` followed by the non-shared columns of `right`. |
| `Table` | struct | Holds positional input rows of fixed arity and carries no column names. Construct it with `Table::new(arity)` or `Table::from_rows(arity, rows)`. |
| `AtomPattern` | struct | Specifies, for each table column, either a variable to bind or a literal value to match. The pattern is a `Vec<Term>` whose length must equal the table's arity. |
| `Term` | enum | Represents one position of an `AtomPattern`. A term is either `Var(String)` to bind the cell to a named variable, or `Lit(Value)` to require the cell to equal a given value. |
| `Relation` | struct | Holds rows over named columns and is the type produced by every operator. Construct it with `Relation::new(columns)` or `Relation::from_rows(columns, rows)`. Column names within a single relation must be unique. |
| `Value` | enum | Represents a single cell value stored in a `Table` or `Relation`. A value is either `Int(i64)` or `Str(String)`. |
Data types and their relationships:
<div align="center">
<picture>
<img alt="Types" src="docs/diagrams/types.svg" height="60%" width="60%">
</picture>
</div>
### Example
The rule below returns the authors of every bestseller along with the book's price.
It uses all three operators:
- `scan_atom` for the three input tables,
- `semijoin` to keep only authors of bestsellers,
- and `natural_join` to attach each book's price.
```prolog
Q(name, book, dollars) :- author(name, book), bestseller(book), price(book, dollars).
```
```rust
use query_ops::atom::{AtomPattern, Term, scan_atom};
use query_ops::join::{natural_join, semijoin};
use query_ops::table::Table;
use query_ops::value::Value;

fn s(x: &str) -> Value {
    Value::Str(x.to_string())
}

fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() {
    let author = Table::from_rows(
        2,
        vec![
            vec![s("Alice"), s("Foo")],
            vec![s("Bob"), s("Bar")],
            vec![s("Alice"), s("Baz")],
            vec![s("Carol"), s("Qux")],
        ],
    );
    let bestseller = Table::from_rows(1, vec![vec![s("Foo")], vec![s("Baz")]]);
    let price = Table::from_rows(
        2,
        vec![
            vec![s("Foo"), i(25)],
            vec![s("Bar"), i(15)],
            vec![s("Baz"), i(30)],
            vec![s("Qux"), i(20)],
        ],
    );

    let author_rel = scan_atom(
        &author,
        &AtomPattern {
            columns: vec![Term::Var("name".to_string()), Term::Var("book".to_string())],
        },
    );
    let bestseller_rel = scan_atom(
        &bestseller,
        &AtomPattern {
            columns: vec![Term::Var("book".to_string())],
        },
    );
    let price_rel = scan_atom(
        &price,
        &AtomPattern {
            columns: vec![Term::Var("book".to_string()), Term::Var("dollars".to_string())],
        },
    );

    let authors_of_bestsellers = semijoin(&author_rel, &bestseller_rel);
    let result = natural_join(&authors_of_bestsellers, &price_rel);

    assert_eq!(
        result.columns,
        vec!["name".to_string(), "book".to_string(), "dollars".to_string()],
    );
    assert_eq!(
        result.rows,
        vec![
            vec![s("Alice"), s("Foo"), i(25)],
            vec![s("Alice"), s("Baz"), i(30)],
        ],
    );
}
```
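
The example above binds only variables. The Public API table also says that literal positions and repeated variables filter rows during the scan. The block below is a minimal, standard-library-only sketch of that behaviour, not the crate's code: `scan_atom_sketch` is a hypothetical name, and `Value`, `Term`, and `Relation` are local stand-ins that mirror the descriptions in the table.

```rust
use std::collections::HashMap;

// Stand-in types that mirror the README's descriptions; not the crate's source.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

enum Term {
    Var(String),
    Lit(Value),
}

#[derive(Debug, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<Value>>,
}

// Hypothetical re-implementation of the scan semantics described in the table.
fn scan_atom_sketch(rows: &[Vec<Value>], pattern: &[Term]) -> Relation {
    // One output column per distinct variable, in first-occurrence order.
    let mut columns: Vec<String> = Vec::new();
    for term in pattern {
        if let Term::Var(name) = term {
            if !columns.contains(name) {
                columns.push(name.clone());
            }
        }
    }

    let mut out: Vec<Vec<Value>> = Vec::new();
    'rows: for row in rows {
        // Variable bindings collected so far for this row.
        let mut bound: HashMap<&str, &Value> = HashMap::new();
        for (term, cell) in pattern.iter().zip(row) {
            match term {
                // A literal position keeps the row only if the cell is equal.
                Term::Lit(v) => {
                    if v != cell {
                        continue 'rows;
                    }
                }
                // A repeated variable must see the same value every time.
                Term::Var(name) => {
                    if let Some(prev) = bound.insert(name.as_str(), cell) {
                        if prev != cell {
                            continue 'rows;
                        }
                    }
                }
            }
        }
        out.push(columns.iter().map(|c| bound[c.as_str()].clone()).collect());
    }
    Relation { columns, rows: out }
}

fn main() {
    let rows = vec![
        vec![Value::Int(1), Value::Int(1), Value::Str("a".into())],
        vec![Value::Int(1), Value::Int(2), Value::Str("a".into())],
        vec![Value::Int(3), Value::Int(3), Value::Str("b".into())],
    ];
    // Pattern (x, x, "a"): columns 0 and 1 must agree, column 2 must equal "a".
    let pattern = vec![
        Term::Var("x".into()),
        Term::Var("x".into()),
        Term::Lit(Value::Str("a".into())),
    ];
    let rel = scan_atom_sketch(&rows, &pattern);
    assert_eq!(rel.columns, vec!["x".to_string()]);
    assert_eq!(rel.rows, vec![vec![Value::Int(1)]]);
}
```

Only the first row survives: the second fails the repeated-variable check and the third fails the literal check. The sketch assumes every row has the same length as the pattern, which the real `Table` is described as enforcing through its arity.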
How it works:
<div align="center">
<picture>
<img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
</picture>
</div>
### Run the Tests
```sh
cargo test -p query-ops
```
### Notes
- **Tables versus relations:** A `Table` is positional (fixed arity with no column names), while a `Relation` is keyed by variable names. The atom
scan is the bridge that turns one into the other (see the example above), and every join after that operates on relations.
- **Joining is by column name:** `semijoin` and `natural_join` find shared columns by matching the strings in `Relation.columns`. Whether two
relations join on a column therefore depends on the variable name you chose in each `AtomPattern`. Picking the same `Term::Var(name)` in two
patterns is what makes them join on that column.
- **No projection operator yet:** `natural_join` always carries forward every column from both inputs, and `scan_atom` keeps every distinct variable
that appears in the pattern. There is no way to drop columns from a relation today, so a result may include more columns than the Datalog rule head
implies.
- **Bulk, not streaming:** Each operator materializes its full output as a new `Relation` and returns it. Operators compose by passing the result of
one as input to the next: `natural_join(&semijoin(&a, &b), &scan_atom(&t, &p))`.
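
The notes on joining by column name and on the missing projection can be illustrated with a small standard-library-only model. This is a sketch under stated assumptions, not the crate's implementation: `shared_columns`, `semijoin_sketch`, `natural_join_sketch`, and `project_sketch` are hypothetical names, the joins use plain nested loops, and `Value` and `Relation` are local stand-ins for the crate's types.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

#[derive(Clone, Debug, PartialEq)]
struct Relation {
    columns: Vec<String>,
    rows: Vec<Vec<Value>>,
}

// For each column name present in both relations: (index in left, index in right).
fn shared_columns(left: &Relation, right: &Relation) -> Vec<(usize, usize)> {
    let find = |name: &String| right.columns.iter().position(|c| c == name);
    left.columns.iter().enumerate().filter_map(|(li, n)| find(n).map(|ri| (li, ri))).collect()
}

// Keeps the rows of `left` that agree with some row of `right` on the shared columns.
fn semijoin_sketch(left: &Relation, right: &Relation) -> Relation {
    let shared = shared_columns(left, right);
    let rows = left
        .rows
        .iter()
        .filter(|l| right.rows.iter().any(|r| shared.iter().all(|&(li, ri)| l[li] == r[ri])))
        .cloned()
        .collect();
    Relation { columns: left.columns.clone(), rows }
}

// Output columns: all of `left`, then the non-shared columns of `right`.
fn natural_join_sketch(left: &Relation, right: &Relation) -> Relation {
    let shared = shared_columns(left, right);
    let extra: Vec<usize> = (0..right.columns.len())
        .filter(|ri| !shared.iter().any(|&(_, s)| s == *ri))
        .collect();
    let mut columns = left.columns.clone();
    columns.extend(extra.iter().map(|&ri| right.columns[ri].clone()));
    let mut rows = Vec::new();
    for l in &left.rows {
        for r in &right.rows {
            if shared.iter().all(|&(li, ri)| l[li] == r[ri]) {
                let mut row = l.clone();
                row.extend(extra.iter().map(|&ri| r[ri].clone()));
                rows.push(row);
            }
        }
    }
    Relation { columns, rows }
}

// Hypothetical helper, NOT part of the crate: keep only the named columns, in order.
fn project_sketch(rel: &Relation, keep: &[&str]) -> Relation {
    let idx: Vec<usize> = keep
        .iter()
        .map(|k| rel.columns.iter().position(|c| c.as_str() == *k).expect("column not found"))
        .collect();
    Relation {
        columns: keep.iter().map(|k| k.to_string()).collect(),
        rows: rel.rows.iter().map(|r| idx.iter().map(|&i| r[i].clone()).collect()).collect(),
    }
}

fn main() {
    let s = |x: &str| Value::Str(x.to_string());
    let author = Relation {
        columns: vec!["name".into(), "book".into()],
        rows: vec![vec![s("Alice"), s("Foo")], vec![s("Bob"), s("Bar")]],
    };
    let bestseller = Relation { columns: vec!["book".into()], rows: vec![vec![s("Foo")]] };
    let price = Relation {
        columns: vec!["book".into(), "dollars".into()],
        rows: vec![vec![s("Foo"), Value::Int(25)], vec![s("Bar"), Value::Int(15)]],
    };

    // Both joins line up on "book" purely because the column names match.
    let joined = natural_join_sketch(&semijoin_sketch(&author, &bestseller), &price);
    assert_eq!(joined.columns, vec!["name", "book", "dollars"]);
    assert_eq!(joined.rows, vec![vec![s("Alice"), s("Foo"), Value::Int(25)]]);

    // Dropping "book" afterwards is what a projection operator would do.
    let slim = project_sketch(&joined, &["name", "dollars"]);
    assert_eq!(slim.rows, vec![vec![s("Alice"), Value::Int(25)]]);
}
```

In this model, two relations with no column name in common share nothing, so `natural_join_sketch` degenerates to a cross product. Whether the crate behaves the same way in that case is not stated in this README. `project_sketch` does not remove duplicate rows.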


@ -0,0 +1,14 @@
#!/usr/bin/env bash
# You need to have Graphviz installed to run this script
# On Debian-based OSes, you can install it using: sudo apt-get install graphviz
# Directory containing .dot files. Defaults to the script's own directory so the
# script works regardless of the caller's working directory.
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ASSET_DIR=${1:-"${SCRIPT_DIR}"}
# Make figures from .dot files; skip the iteration when the glob matches nothing
for f in "${ASSET_DIR}"/*.dot; do
[ -e "$f" ] || continue
dot -Tsvg "$f" -o "${f%.dot}.svg"
done


@ -0,0 +1,60 @@
digraph QueryOpsTypes {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = TB
ranksep = 0.7;
nodesep = 0.7;
splines = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
penwidth = 1.2
]
table_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table</b> (struct)</td></tr>
<tr><td align="left" balign="left">arity: usize</td></tr>
<tr><td align="left" balign="left">rows: Vec&lt;Vec&lt;Value&gt;&gt;</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
relation_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Relation</b> (struct)</td></tr>
<tr><td align="left" balign="left">columns: Vec&lt;String&gt;</td></tr>
<tr><td align="left" balign="left">rows: Vec&lt;Vec&lt;Value&gt;&gt;</td></tr>
</table>>, fillcolor = "#ECEFF1", color = "#607D8B"]
atom_pattern_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>AtomPattern</b> (struct)</td></tr>
<tr><td align="left" balign="left">columns: Vec&lt;Term&gt;</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
term_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Term</b> (enum)</td></tr>
<tr><td align="left" balign="left">Var(String)</td></tr>
<tr><td align="left" balign="left">Lit(Value)</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
value_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Value</b> (enum)</td></tr>
<tr><td align="left" balign="left">Int(i64)</td></tr>
<tr><td align="left" balign="left">Str(String)</td></tr>
</table>>, fillcolor = "#FFF3E0", color = "#FF9800"]
// composition edges: arrow X -> Y reads "X contains Y"
atom_pattern_node -> term_node [label = "Vec<Term>"]
term_node -> value_node [label = "Lit(Value)"]
table_node -> value_node [label = "Vec<Vec<Value>>"]
relation_node -> value_node [label = "Vec<Vec<Value>>"]
}


@ -0,0 +1,85 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 12.2.1 (0)
-->
<!-- Title: QueryOpsTypes Pages: 1 -->
<svg width="584pt" height="391pt"
viewBox="0.00 0.00 583.50 391.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 387)">
<title>QueryOpsTypes</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-387 579.5,-387 579.5,4 -4,4"/>
<!-- table_node -->
<g id="node1" class="node">
<title>table_node</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M159.75,-253.5C159.75,-253.5 12,-253.5 12,-253.5 6,-253.5 0,-247.5 0,-241.5 0,-241.5 0,-170.5 0,-170.5 0,-164.5 6,-158.5 12,-158.5 12,-158.5 159.75,-158.5 159.75,-158.5 165.75,-158.5 171.75,-164.5 171.75,-170.5 171.75,-170.5 171.75,-241.5 171.75,-241.5 171.75,-247.5 165.75,-253.5 159.75,-253.5"/>
<text text-anchor="start" x="43.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table</text>
<text text-anchor="start" x="78.38" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="12" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">arity: usize</text>
<text text-anchor="start" x="12" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">rows: Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- value_node -->
<g id="node5" class="node">
<title>value_node</title>
<path fill="#fff3e0" stroke="#ff9800" stroke-width="1.5" d="M351.38,-95C351.38,-95 264.38,-95 264.38,-95 258.38,-95 252.38,-89 252.38,-83 252.38,-83 252.38,-12 252.38,-12 252.38,-6 258.38,0 264.38,0 264.38,0 351.38,0 351.38,0 357.38,0 363.38,-6 363.38,-12 363.38,-12 363.38,-83 363.38,-83 363.38,-89 357.38,-95 351.38,-95"/>
<text text-anchor="start" x="264.38" y="-74.7" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Value</text>
<text text-anchor="start" x="300.38" y="-74.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(enum)</text>
<text text-anchor="start" x="264.38" y="-44.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Int(i64)</text>
<text text-anchor="start" x="264.38" y="-15.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Str(String)</text>
</g>
<!-- table_node&#45;&gt;value_node -->
<g id="edge3" class="edge">
<title>table_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M152.48,-158.04C181.05,-137.9 214.32,-114.45 242.73,-94.42"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="244.53,-97.44 250.68,-88.82 240.49,-91.72 244.53,-97.44"/>
<text text-anchor="middle" x="240.66" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- relation_node -->
<g id="node2" class="node">
<title>relation_node</title>
<path fill="#eceff1" stroke="#607d8b" stroke-width="1.5" d="M381.75,-253.5C381.75,-253.5 234,-253.5 234,-253.5 228,-253.5 222,-247.5 222,-241.5 222,-241.5 222,-170.5 222,-170.5 222,-164.5 228,-158.5 234,-158.5 234,-158.5 381.75,-158.5 381.75,-158.5 387.75,-158.5 393.75,-164.5 393.75,-170.5 393.75,-170.5 393.75,-241.5 393.75,-241.5 393.75,-247.5 387.75,-253.5 381.75,-253.5"/>
<text text-anchor="start" x="256.5" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Relation</text>
<text text-anchor="start" x="309.75" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="234" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">columns: Vec&lt;String&gt;</text>
<text text-anchor="start" x="234" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">rows: Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- relation_node&#45;&gt;value_node -->
<g id="edge4" class="edge">
<title>relation_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M307.88,-158.04C307.88,-141.95 307.88,-123.74 307.88,-106.86"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="311.38,-107.24 307.88,-97.24 304.38,-107.24 311.38,-107.24"/>
<text text-anchor="middle" x="345" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- atom_pattern_node -->
<g id="node3" class="node">
<title>atom_pattern_node</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M563.5,-383C563.5,-383 432.25,-383 432.25,-383 426.25,-383 420.25,-377 420.25,-371 420.25,-371 420.25,-329 420.25,-329 420.25,-323 426.25,-317 432.25,-317 432.25,-317 563.5,-317 563.5,-317 569.5,-317 575.5,-323 575.5,-329 575.5,-329 575.5,-371 575.5,-371 575.5,-377 569.5,-383 563.5,-383"/>
<text text-anchor="start" x="432.25" y="-362.7" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">AtomPattern</text>
<text text-anchor="start" x="514" y="-362.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="432.25" y="-332.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">columns: Vec&lt;Term&gt;</text>
</g>
<!-- term_node -->
<g id="node4" class="node">
<title>term_node</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M539.88,-253.5C539.88,-253.5 455.88,-253.5 455.88,-253.5 449.88,-253.5 443.88,-247.5 443.88,-241.5 443.88,-241.5 443.88,-170.5 443.88,-170.5 443.88,-164.5 449.88,-158.5 455.88,-158.5 455.88,-158.5 539.88,-158.5 539.88,-158.5 545.88,-158.5 551.88,-164.5 551.88,-170.5 551.88,-170.5 551.88,-241.5 551.88,-241.5 551.88,-247.5 545.88,-253.5 539.88,-253.5"/>
<text text-anchor="start" x="455.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Term</text>
<text text-anchor="start" x="488.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(enum)</text>
<text text-anchor="start" x="455.88" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Var(String)</text>
<text text-anchor="start" x="455.88" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Lit(Value)</text>
</g>
<!-- atom_pattern_node&#45;&gt;term_node -->
<g id="edge1" class="edge">
<title>atom_pattern_node&#45;&gt;term_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M497.88,-316.78C497.88,-301.61 497.88,-283.04 497.88,-265.52"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="501.38,-265.73 497.88,-255.73 494.38,-265.73 501.38,-265.73"/>
<text text-anchor="middle" x="520.75" y="-283.45" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Term&gt;</text>
</g>
<!-- term_node&#45;&gt;value_node -->
<g id="edge2" class="edge">
<title>term_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M443.43,-160.15C421.36,-141.97 395.69,-120.83 372.66,-101.87"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="375,-99.25 365.05,-95.6 370.54,-104.65 375,-99.25"/>
<text text-anchor="middle" x="428.07" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Lit(Value)</text>
</g>
</g>
</svg>



@ -0,0 +1,122 @@
digraph QueryOpsHandPlan {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = LR
ranksep = 0.9;
nodesep = 0.7;
splines = true;
compound = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
labeldistance = 2.0,
penwidth = 1.2
]
subgraph cluster_inputs {
label = "Inputs (positional tables)"
style = "dashed"
color = "#888888"
fontcolor = "#555555"
margin = 18
author_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: author</b></td></tr>
<tr><td align="left" balign="left">• arity 2</td></tr>
<tr><td align="left" balign="left">• rows: (name, book)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
bestseller_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: bestseller</b></td></tr>
<tr><td align="left" balign="left">• arity 1</td></tr>
<tr><td align="left" balign="left">• rows: (book)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
price_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: price</b></td></tr>
<tr><td align="left" balign="left">• arity 2</td></tr>
<tr><td align="left" balign="left">• rows: (book, dollars)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
}
subgraph cluster_atoms {
label = "Atom Scans (scan_atom: Table × AtomPattern → Relation)"
style = "dashed"
color = "#9C27B0"
fontcolor = "#7B1FA2"
margin = 14
author_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>author_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var name, Var book]</td></tr>
<tr><td align="left" balign="left">cols: [name, book]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
bestseller_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>bestseller_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var book]</td></tr>
<tr><td align="left" balign="left">cols: [book]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
price_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>price_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var book, Var dollars]</td></tr>
<tr><td align="left" balign="left">cols: [book, dollars]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
}
subgraph cluster_joins {
label = "Joins (shared cols = matching column names)"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 14
semijoin_step [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>semijoin</b></td></tr>
<tr><td align="left" balign="left">authors of bestsellers</td></tr>
<tr><td align="left" balign="left">shared: book</td></tr>
<tr><td align="left" balign="left">cols: [name, book]</td></tr>
</table>>, fillcolor = "#E8F5E9", color = "#4CAF50"]
natural_join_step [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>natural_join</b></td></tr>
<tr><td align="left" balign="left">attach each book's price</td></tr>
<tr><td align="left" balign="left">shared: book</td></tr>
<tr><td align="left" balign="left">cols: [name, book, dollars]</td></tr>
</table>>, fillcolor = "#E8F5E9", color = "#4CAF50"]
}
subgraph cluster_output {
label = "Output (binding relation)"
style = "dashed"
color = "#888888"
fontcolor = "#555555"
margin = 18
result [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Q result</b></td></tr>
<tr><td align="left" balign="left">authors of bestsellers with each book's price</td></tr>
<tr><td align="left" balign="left">cols: [name, book, dollars]</td></tr>
</table>>, fillcolor = "#ECEFF1", color = "#607D8B"]
}
// Atom scans consume tables
author_table -> author_rel [color = "#2196F3"]
bestseller_table -> bestseller_rel [color = "#2196F3"]
price_table -> price_rel [color = "#2196F3"]
// semijoin narrows author_rel to bestseller authors
author_rel -> semijoin_step [label = "left", color = "#9C27B0"]
bestseller_rel -> semijoin_step [label = "right", color = "#9C27B0"]
// natural_join attaches price
semijoin_step -> natural_join_step [label = "left", color = "#4CAF50"]
price_rel -> natural_join_step [label = "right", color = "#9C27B0"]
// Final output
natural_join_step -> result [color = "#4CAF50"]
}


@ -0,0 +1,159 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 12.2.1 (0)
-->
<!-- Title: QueryOpsHandPlan Pages: 1 -->
<svg width="1482pt" height="471pt"
viewBox="0.00 0.00 1481.75 471.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 467)">
<title>QueryOpsHandPlan</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-467 1477.75,-467 1477.75,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_inputs</title>
<polygon fill="white" stroke="#888888" stroke-dasharray="5,2" points="8,-8 8,-455 198.5,-455 198.5,-8 8,-8"/>
<text text-anchor="middle" x="103.25" y="-437.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#555555">Inputs (positional tables)</text>
</g>
<g id="clust2" class="cluster">
<title>cluster_atoms</title>
<polygon fill="white" stroke="#9c27b0" stroke-dasharray="5,2" points="233.5,-12 233.5,-451 609.5,-451 609.5,-12 233.5,-12"/>
<text text-anchor="middle" x="421.5" y="-433.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#7b1fa2">Atom Scans &#160;(scan_atom: Table × AtomPattern → Relation)</text>
</g>
<g id="clust3" class="cluster">
<title>cluster_joins</title>
<polygon fill="white" stroke="#4caf50" stroke-dasharray="5,2" points="665.5,-141 665.5,-322 1106,-322 1106,-141 665.5,-141"/>
<text text-anchor="middle" x="885.75" y="-304.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#388e3c">Joins &#160;(shared cols = matching column names)</text>
</g>
<g id="clust4" class="cluster">
<title>cluster_output</title>
<polygon fill="white" stroke="#888888" stroke-dasharray="5,2" points="1141,-152 1141,-311 1465.75,-311 1465.75,-152 1141,-152"/>
<text text-anchor="middle" x="1303.38" y="-293.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#555555">Output (binding relation)</text>
</g>
<!-- author_table -->
<g id="node1" class="node">
<title>author_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M165.88,-408.12C165.88,-408.12 40.62,-408.12 40.62,-408.12 34.62,-408.12 28.62,-402.12 28.62,-396.12 28.62,-396.12 28.62,-325.88 28.62,-325.88 28.62,-319.88 34.62,-313.88 40.62,-313.88 40.62,-313.88 165.88,-313.88 165.88,-313.88 171.88,-313.88 177.88,-319.88 177.88,-325.88 177.88,-325.88 177.88,-396.12 177.88,-396.12 177.88,-402.12 171.88,-408.12 165.88,-408.12"/>
<text text-anchor="start" x="60.88" y="-387.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: author</text>
<text text-anchor="start" x="40.62" y="-358.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 2</text>
<text text-anchor="start" x="40.62" y="-329.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (name, book)</text>
</g>
<!-- author_rel -->
<g id="node4" class="node">
<title>author_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M509.12,-408.12C509.12,-408.12 332.88,-408.12 332.88,-408.12 326.88,-408.12 320.88,-402.12 320.88,-396.12 320.88,-396.12 320.88,-325.88 320.88,-325.88 320.88,-319.88 326.88,-313.88 332.88,-313.88 332.88,-313.88 509.12,-313.88 509.12,-313.88 515.12,-313.88 521.12,-319.88 521.12,-325.88 521.12,-325.88 521.12,-396.12 521.12,-396.12 521.12,-402.12 515.12,-408.12 509.12,-408.12"/>
<text text-anchor="start" x="388" y="-387.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">author_rel</text>
<text text-anchor="start" x="332.88" y="-358.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var name, Var book]</text>
<text text-anchor="start" x="332.88" y="-329.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book]</text>
</g>
<!-- author_table&#45;&gt;author_rel -->
<g id="edge1" class="edge">
<title>author_table&#45;&gt;author_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M178.28,-361C217.1,-361 265.45,-361 308.68,-361"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="308.62,-364.5 318.62,-361 308.62,-357.5 308.62,-364.5"/>
</g>
<!-- bestseller_table -->
<g id="node2" class="node">
<title>bestseller_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M156.12,-264.12C156.12,-264.12 50.38,-264.12 50.38,-264.12 44.38,-264.12 38.38,-258.12 38.38,-252.12 38.38,-252.12 38.38,-181.88 38.38,-181.88 38.38,-175.88 44.38,-169.88 50.38,-169.88 50.38,-169.88 156.12,-169.88 156.12,-169.88 162.12,-169.88 168.12,-175.88 168.12,-181.88 168.12,-181.88 168.12,-252.12 168.12,-252.12 168.12,-258.12 162.12,-264.12 156.12,-264.12"/>
<text text-anchor="start" x="50.38" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: bestseller</text>
<text text-anchor="start" x="50.38" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 1</text>
<text text-anchor="start" x="50.38" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (book)</text>
</g>
<!-- bestseller_rel -->
<g id="node5" class="node">
<title>bestseller_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M476.12,-264.12C476.12,-264.12 365.88,-264.12 365.88,-264.12 359.88,-264.12 353.88,-258.12 353.88,-252.12 353.88,-252.12 353.88,-181.88 353.88,-181.88 353.88,-175.88 359.88,-169.88 365.88,-169.88 365.88,-169.88 476.12,-169.88 476.12,-169.88 482.12,-169.88 488.12,-175.88 488.12,-181.88 488.12,-181.88 488.12,-252.12 488.12,-252.12 488.12,-258.12 482.12,-264.12 476.12,-264.12"/>
<text text-anchor="start" x="377.5" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">bestseller_rel</text>
<text text-anchor="start" x="365.88" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var book]</text>
<text text-anchor="start" x="365.88" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [book]</text>
</g>
<!-- bestseller_table&#45;&gt;bestseller_rel -->
<g id="edge2" class="edge">
<title>bestseller_table&#45;&gt;bestseller_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M168.53,-217C218.65,-217 288.47,-217 341.83,-217"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="341.82,-220.5 351.82,-217 341.82,-213.5 341.82,-220.5"/>
</g>
<!-- price_table -->
<g id="node3" class="node">
<title>price_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M168.5,-120.12C168.5,-120.12 38,-120.12 38,-120.12 32,-120.12 26,-114.12 26,-108.12 26,-108.12 26,-37.88 26,-37.88 26,-31.88 32,-25.88 38,-25.88 38,-25.88 168.5,-25.88 168.5,-25.88 174.5,-25.88 180.5,-31.88 180.5,-37.88 180.5,-37.88 180.5,-108.12 180.5,-108.12 180.5,-114.12 174.5,-120.12 168.5,-120.12"/>
<text text-anchor="start" x="65.75" y="-99.83" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: price</text>
<text text-anchor="start" x="38" y="-70.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 2</text>
<text text-anchor="start" x="38" y="-41.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (book, dollars)</text>
</g>
<!-- price_rel -->
<g id="node6" class="node">
<title>price_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M511.75,-120.12C511.75,-120.12 330.25,-120.12 330.25,-120.12 324.25,-120.12 318.25,-114.12 318.25,-108.12 318.25,-108.12 318.25,-37.88 318.25,-37.88 318.25,-31.88 324.25,-25.88 330.25,-25.88 330.25,-25.88 511.75,-25.88 511.75,-25.88 517.75,-25.88 523.75,-31.88 523.75,-37.88 523.75,-37.88 523.75,-108.12 523.75,-108.12 523.75,-114.12 517.75,-120.12 511.75,-120.12"/>
<text text-anchor="start" x="392.88" y="-99.83" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">price_rel</text>
<text text-anchor="start" x="330.25" y="-70.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var book, Var dollars]</text>
<text text-anchor="start" x="330.25" y="-41.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [book, dollars]</text>
</g>
<!-- price_table&#45;&gt;price_rel -->
<g id="edge3" class="edge">
<title>price_table&#45;&gt;price_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M180.68,-73C218.39,-73 264.62,-73 306.37,-73"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="306.2,-76.5 316.2,-73 306.2,-69.5 306.2,-76.5"/>
</g>
<!-- semijoin_step -->
<g id="node7" class="node">
<title>semijoin_step</title>
<path fill="#e8f5e9" stroke="#4caf50" stroke-width="1.5" d="M819.75,-278.62C819.75,-278.62 691.5,-278.62 691.5,-278.62 685.5,-278.62 679.5,-272.62 679.5,-266.62 679.5,-266.62 679.5,-167.38 679.5,-167.38 679.5,-161.38 685.5,-155.38 691.5,-155.38 691.5,-155.38 819.75,-155.38 819.75,-155.38 825.75,-155.38 831.75,-161.38 831.75,-167.38 831.75,-167.38 831.75,-266.62 831.75,-266.62 831.75,-272.62 825.75,-278.62 819.75,-278.62"/>
<text text-anchor="start" x="727.88" y="-258.32" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">semijoin</text>
<text text-anchor="start" x="691.5" y="-229.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">authors of bestsellers</text>
<text text-anchor="start" x="691.5" y="-200.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">shared: book</text>
<text text-anchor="start" x="691.5" y="-171.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book]</text>
</g>
<!-- author_rel&#45;&gt;semijoin_step -->
<g id="edge4" class="edge">
<title>author_rel&#45;&gt;semijoin_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M521.48,-324.79C550.11,-313.83 581.24,-301.4 609.5,-289 628.84,-280.51 649.32,-270.81 668.61,-261.33"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="670.15,-264.48 677.56,-256.91 667.04,-258.2 670.15,-264.48"/>
<text text-anchor="middle" x="637.5" y="-284.9" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">left</text>
</g>
<!-- bestseller_rel&#45;&gt;semijoin_step -->
<g id="edge5" class="edge">
<title>bestseller_rel&#45;&gt;semijoin_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M488.51,-217C539.93,-217 611.54,-217 667.53,-217"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="667.41,-220.5 677.41,-217 667.41,-213.5 667.41,-220.5"/>
<text text-anchor="middle" x="637.5" y="-221.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">right</text>
</g>
<!-- natural_join_step -->
<g id="node8" class="node">
<title>natural_join_step</title>
<path fill="#e8f5e9" stroke="#4caf50" stroke-width="1.5" d="M1080,-278.62C1080,-278.62 922.5,-278.62 922.5,-278.62 916.5,-278.62 910.5,-272.62 910.5,-266.62 910.5,-266.62 910.5,-167.38 910.5,-167.38 910.5,-161.38 916.5,-155.38 922.5,-155.38 922.5,-155.38 1080,-155.38 1080,-155.38 1086,-155.38 1092,-161.38 1092,-167.38 1092,-167.38 1092,-266.62 1092,-266.62 1092,-272.62 1086,-278.62 1080,-278.62"/>
<text text-anchor="start" x="963" y="-258.32" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">natural_join</text>
<text text-anchor="start" x="922.5" y="-229.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">attach each book&#39;s price</text>
<text text-anchor="start" x="922.5" y="-200.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">shared: book</text>
<text text-anchor="start" x="922.5" y="-171.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book, dollars]</text>
</g>
<!-- price_rel&#45;&gt;natural_join_step -->
<g id="edge7" class="edge">
<title>price_rel&#45;&gt;natural_join_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M523.91,-71.78C608.41,-73.58 730.63,-82.79 831.75,-116.5 855.71,-124.49 879.92,-136.28 902.24,-149"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="900.38,-151.97 910.78,-153.98 903.91,-145.92 900.38,-151.97"/>
<text text-anchor="middle" x="755.62" y="-121.45" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">right</text>
</g>
<!-- semijoin_step&#45;&gt;natural_join_step -->
<g id="edge6" class="edge">
<title>semijoin_step&#45;&gt;natural_join_step</title>
<path fill="none" stroke="#4caf50" stroke-width="1.2" d="M832.04,-217C853.1,-217 876.34,-217 898.65,-217"/>
<polygon fill="#4caf50" stroke="#4caf50" stroke-width="1.2" points="898.4,-220.5 908.4,-217 898.4,-213.5 898.4,-220.5"/>
<text text-anchor="middle" x="871.12" y="-221.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">left</text>
</g>
<!-- result -->
<g id="node9" class="node">
<title>result</title>
<path fill="#eceff1" stroke="#607d8b" stroke-width="1.5" d="M1435.75,-264.12C1435.75,-264.12 1171,-264.12 1171,-264.12 1165,-264.12 1159,-258.12 1159,-252.12 1159,-252.12 1159,-181.88 1159,-181.88 1159,-175.88 1165,-169.88 1171,-169.88 1171,-169.88 1435.75,-169.88 1435.75,-169.88 1441.75,-169.88 1447.75,-175.88 1447.75,-181.88 1447.75,-181.88 1447.75,-252.12 1447.75,-252.12 1447.75,-258.12 1441.75,-264.12 1435.75,-264.12"/>
<text text-anchor="start" x="1277.5" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Q result</text>
<text text-anchor="start" x="1171" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">authors of bestsellers with each book&#39;s price</text>
<text text-anchor="start" x="1171" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book, dollars]</text>
</g>
<!-- natural_join_step&#45;&gt;result -->
<g id="edge8" class="edge">
<title>natural_join_step&#45;&gt;result</title>
<path fill="none" stroke="#4caf50" stroke-width="1.2" d="M1092.3,-217C1109.6,-217 1128.17,-217 1146.86,-217"/>
<polygon fill="#4caf50" stroke="#4caf50" stroke-width="1.2" points="1146.69,-220.5 1156.69,-217 1146.69,-213.5 1146.69,-220.5"/>
</g>
</g>
</svg>


@ -7,6 +7,8 @@
//! self-loops). The output relation has one column per distinct variable, in
//! first-occurrence order.
use std::collections::HashMap;
use crate::{relation::Relation, table::Table, value::Value};
#[derive(Debug, Clone, PartialEq, Eq)]
@ -20,10 +22,169 @@ pub struct AtomPattern {
pub columns: Vec<Term>,
}
/// # Panics
/// Panics if `pattern.columns.len() != table.arity`.
#[must_use]
-pub fn scan_atom(_table: &Table, _pattern: &AtomPattern) -> Relation {
-todo!(
-"scan rows, filter by repeated-variable equality and literal equality, \
-project to one column per distinct variable in first-occurrence order"
-)
pub fn scan_atom(table: &Table, pattern: &AtomPattern) -> Relation {
assert_eq!(
pattern.columns.len(),
table.arity,
"pattern arity mismatch: pattern has {}, table has {}",
pattern.columns.len(),
table.arity,
);
let mut output_vars: Vec<String> = Vec::new();
let mut output_positions: Vec<usize> = Vec::new();
let mut equality_pairs: Vec<(usize, usize)> = Vec::new();
let mut literal_checks: Vec<(usize, &Value)> = Vec::new();
let mut first_position: HashMap<&str, usize> = HashMap::new();
for (i, term) in pattern.columns.iter().enumerate() {
match term {
Term::Var(name) => {
if let Some(&j) = first_position.get(name.as_str()) {
equality_pairs.push((j, i));
} else {
first_position.insert(name.as_str(), i);
output_vars.push(name.clone());
output_positions.push(i);
}
}
Term::Lit(value) => literal_checks.push((i, value)),
}
}
let mut output = Relation::new(output_vars);
'rows: for row in &table.rows {
for &(i, lit) in &literal_checks {
if &row[i] != lit {
continue 'rows;
}
}
for &(j, i) in &equality_pairs {
if row[i] != row[j] {
continue 'rows;
}
}
let projected: Vec<Value> = output_positions.iter().map(|&i| row[i].clone()).collect();
output.push(projected);
}
output
}
#[cfg(test)]
mod tests {
use super::*;
fn var(name: &str) -> Term {
Term::Var(name.to_string())
}
fn lit(value: i64) -> Term {
Term::Lit(Value::Int(value))
}
fn int(value: i64) -> Value {
Value::Int(value)
}
#[test]
fn repeated_variable_keeps_only_self_loops() {
let edge = Table::from_rows(
2,
vec![
vec![int(1), int(2)],
vec![int(2), int(2)],
vec![int(3), int(3)],
vec![int(1), int(1)],
],
);
let pattern = AtomPattern {
columns: vec![var("X"), var("X")],
};
let result = scan_atom(&edge, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(2)], vec![int(3)], vec![int(1)]]);
}
#[test]
fn literal_filters_rows_to_match() {
let edge = Table::from_rows(
2,
vec![
vec![int(1), int(2)],
vec![int(2), int(3)],
vec![int(1), int(4)],
],
);
let pattern = AtomPattern {
columns: vec![lit(1), var("Y")],
};
let result = scan_atom(&edge, &pattern);
assert_eq!(result.columns, vec!["Y".to_string()]);
assert_eq!(result.rows, vec![vec![int(2)], vec![int(4)]]);
}
#[test]
fn distinct_variables_project_in_first_occurrence_order() {
let triples = Table::from_rows(
3,
vec![vec![int(1), int(2), int(3)], vec![int(4), int(5), int(6)]],
);
let pattern = AtomPattern {
columns: vec![var("A"), var("B"), var("C")],
};
let result = scan_atom(&triples, &pattern);
assert_eq!(
result.columns,
vec!["A".to_string(), "B".to_string(), "C".to_string()],
);
assert_eq!(
result.rows,
vec![vec![int(1), int(2), int(3)], vec![int(4), int(5), int(6)]],
);
}
#[test]
fn variable_repeated_three_times_requires_all_equal() {
let triples = Table::from_rows(
3,
vec![
vec![int(1), int(1), int(1)],
vec![int(1), int(1), int(2)],
vec![int(2), int(2), int(2)],
vec![int(1), int(2), int(1)],
],
);
let pattern = AtomPattern {
columns: vec![var("X"), var("X"), var("X")],
};
let result = scan_atom(&triples, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(1)], vec![int(2)]]);
}
#[test]
fn literal_filter_repeated_var_and_projection_combine() {
// Pattern: [Lit(1), Var("X"), Lit(2), Var("X")].
// Keep rows where col0 == 1, col2 == 2, and col1 == col3.
// Output is one column [X], bound to col1 (the first occurrence).
let table = Table::from_rows(
4,
vec![
vec![int(1), int(7), int(2), int(7)],
vec![int(1), int(7), int(2), int(8)],
vec![int(0), int(7), int(2), int(7)],
vec![int(1), int(7), int(3), int(7)],
vec![int(1), int(9), int(2), int(9)],
],
);
let pattern = AtomPattern {
columns: vec![lit(1), var("X"), lit(2), var("X")],
};
let result = scan_atom(&table, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(7)], vec![int(9)]]);
}
}
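
The scan above precomputes literal checks and equality pairs once per pattern. As a cross-check of the same semantics, here is a minimal row-at-a-time sketch that tracks bindings in a map instead. It uses only the standard library; `MiniTerm`, `match_row`, and `scan_rows` are hypothetical names for illustration and are not part of the crate, and cells are simplified to `i64`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Term`, specialised to `i64` cells.
#[derive(Debug, Clone, PartialEq, Eq)]
enum MiniTerm {
    Var(&'static str),
    Lit(i64),
}

/// Matches one row against the pattern. Returns the projected bindings in
/// first-occurrence order, or `None` if a literal or a repeated variable
/// disagrees with the row.
fn match_row(row: &[i64], pattern: &[MiniTerm]) -> Option<Vec<i64>> {
    assert_eq!(row.len(), pattern.len(), "pattern arity mismatch");
    let mut bound: HashMap<&str, i64> = HashMap::new();
    let mut projected = Vec::new();
    for (cell, term) in row.iter().zip(pattern) {
        match term {
            // A literal position must equal the cell exactly.
            MiniTerm::Lit(expected) => {
                if cell != expected {
                    return None;
                }
            }
            // A variable binds on first occurrence and filters on repeats.
            MiniTerm::Var(name) => match bound.get(*name) {
                Some(previous) if previous != cell => return None,
                Some(_) => {}
                None => {
                    bound.insert(*name, *cell);
                    projected.push(*cell);
                }
            },
        }
    }
    Some(projected)
}

/// Applies `match_row` to every row, keeping the rows that match.
fn scan_rows(rows: &[Vec<i64>], pattern: &[MiniTerm]) -> Vec<Vec<i64>> {
    rows.iter().filter_map(|row| match_row(row, pattern)).collect()
}
```

The per-row map costs an allocation for each row, which is presumably why the precomputed form is preferable in the operator itself; the sketch is only meant to make the filtering rules easy to read.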


@ -9,17 +9,212 @@
//! emitting one row with the union of columns. Output column order is
//! `left.columns` followed by `right.columns` minus the shared ones.
-use crate::relation::Relation;
-#[must_use]
-pub fn semijoin(_left: &Relation, _right: &Relation) -> Relation {
-todo!("hash `right` on shared columns, probe with `left`, keep matching left rows")
use std::collections::{HashMap, HashSet};
use crate::{relation::Relation, value::Value};
fn shared_columns(left: &Relation, right: &Relation) -> Vec<(usize, usize)> {
left.columns
.iter()
.enumerate()
.filter_map(|(li, name)| {
right
.columns
.iter()
.position(|rname| rname == name)
.map(|ri| (li, ri))
})
.collect()
}
fn project<'a>(row: &'a [Value], indices: impl IntoIterator<Item = &'a usize>) -> Vec<Value> {
indices.into_iter().map(|&i| row[i].clone()).collect()
}
#[must_use]
-pub fn natural_join(_left: &Relation, _right: &Relation) -> Relation {
-todo!(
-"hash one side on shared columns, probe with the other, emit \
-left ++ (right \\ shared) for every match"
-)
pub fn semijoin(left: &Relation, right: &Relation) -> Relation {
let shared = shared_columns(left, right);
let left_keys: Vec<usize> = shared.iter().map(|&(li, _)| li).collect();
let right_keys: Vec<usize> = shared.iter().map(|&(_, ri)| ri).collect();
let mut right_set: HashSet<Vec<Value>> = HashSet::new();
for row in &right.rows {
right_set.insert(project(row, &right_keys));
}
let mut output = Relation::new(left.columns.clone());
for row in &left.rows {
if right_set.contains(&project(row, &left_keys)) {
output.push(row.clone());
}
}
output
}
#[must_use]
pub fn natural_join(left: &Relation, right: &Relation) -> Relation {
let shared = shared_columns(left, right);
let left_keys: Vec<usize> = shared.iter().map(|&(li, _)| li).collect();
let right_keys: Vec<usize> = shared.iter().map(|&(_, ri)| ri).collect();
let shared_right: HashSet<usize> = right_keys.iter().copied().collect();
let right_only: Vec<usize> = (0..right.columns.len())
.filter(|i| !shared_right.contains(i))
.collect();
let mut output_columns = left.columns.clone();
for &i in &right_only {
output_columns.push(right.columns[i].clone());
}
let mut right_index: HashMap<Vec<Value>, Vec<&Vec<Value>>> = HashMap::new();
for row in &right.rows {
right_index
.entry(project(row, &right_keys))
.or_default()
.push(row);
}
let mut output = Relation::new(output_columns);
for left_row in &left.rows {
let key = project(left_row, &left_keys);
let Some(matches) = right_index.get(&key) else {
continue;
};
for right_row in matches {
let mut joined = left_row.clone();
for &i in &right_only {
joined.push(right_row[i].clone());
}
output.push(joined);
}
}
output
}
#[cfg(test)]
mod tests {
use super::*;
fn col(name: &str) -> String {
name.to_string()
}
fn int(value: i64) -> Value {
Value::Int(value)
}
#[test]
fn semijoin_keeps_left_rows_matched_on_shared_column() {
let left = Relation::from_rows(
vec![col("X"), col("Y")],
vec![
vec![int(1), int(10)],
vec![int(2), int(20)],
vec![int(3), int(30)],
],
);
let right = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(3)]]);
let result = semijoin(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y")]);
assert_eq!(
result.rows,
vec![vec![int(1), int(10)], vec![int(3), int(30)]],
);
}
#[test]
fn semijoin_does_not_duplicate_left_rows_when_right_has_duplicates() {
let left = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(2)]]);
let right = Relation::from_rows(
vec![col("X"), col("Y")],
vec![
vec![int(1), int(100)],
vec![int(1), int(101)],
vec![int(2), int(200)],
],
);
let result = semijoin(&left, &right);
assert_eq!(result.columns, vec![col("X")]);
assert_eq!(result.rows, vec![vec![int(1)], vec![int(2)]]);
}
#[test]
fn natural_join_emits_union_of_columns_on_match() {
let left = Relation::from_rows(
vec![col("X"), col("Y")],
vec![vec![int(1), int(10)], vec![int(2), int(20)]],
);
let right = Relation::from_rows(
vec![col("Y"), col("Z")],
vec![
vec![int(10), int(100)],
vec![int(20), int(200)],
vec![int(20), int(201)],
],
);
let result = natural_join(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y"), col("Z")]);
assert_eq!(
result.rows,
vec![
vec![int(1), int(10), int(100)],
vec![int(2), int(20), int(200)],
vec![int(2), int(20), int(201)],
],
);
}
#[test]
fn natural_join_with_no_shared_columns_is_cartesian_product() {
let left = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(2)]]);
let right = Relation::from_rows(vec![col("Y")], vec![vec![int(10)], vec![int(20)]]);
let result = natural_join(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y")]);
assert_eq!(
result.rows,
vec![
vec![int(1), int(10)],
vec![int(1), int(20)],
vec![int(2), int(10)],
vec![int(2), int(20)],
],
);
}
#[test]
fn semijoin_returns_empty_when_either_side_is_empty() {
let nonempty = Relation::from_rows(vec![col("X")], vec![vec![int(1)]]);
let empty = Relation::from_rows(vec![col("X")], vec![]);
let r1 = semijoin(&empty, &nonempty);
assert_eq!(r1.columns, vec![col("X")]);
assert!(r1.rows.is_empty());
let r2 = semijoin(&nonempty, &empty);
assert_eq!(r2.columns, vec![col("X")]);
assert!(r2.rows.is_empty());
let r3 = semijoin(&empty, &empty);
assert_eq!(r3.columns, vec![col("X")]);
assert!(r3.rows.is_empty());
}
#[test]
fn natural_join_returns_empty_when_either_side_is_empty() {
let nonempty = Relation::from_rows(vec![col("X")], vec![vec![int(1)]]);
let empty = Relation::from_rows(vec![col("X")], vec![]);
let r1 = natural_join(&empty, &nonempty);
assert_eq!(r1.columns, vec![col("X")]);
assert!(r1.rows.is_empty());
let r2 = natural_join(&nonempty, &empty);
assert_eq!(r2.columns, vec![col("X")]);
assert!(r2.rows.is_empty());
let r3 = natural_join(&empty, &empty);
assert_eq!(r3.columns, vec![col("X")]);
assert!(r3.rows.is_empty());
}
}
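
The output header rule for `natural_join` (all of `left.columns`, then the columns of `right` that are not shared) can be checked in isolation. The following is a small standard-library sketch over plain string slices; `shared_pairs` and `joined_columns` are hypothetical helper names, not crate functions, and column names are assumed unique within each side.

```rust
/// Pairs `(left_index, right_index)` for every column name present on both
/// sides, in left-column order. Assumes names are unique within each side.
fn shared_pairs(left: &[&str], right: &[&str]) -> Vec<(usize, usize)> {
    left.iter()
        .enumerate()
        .filter_map(|(li, name)| {
            right
                .iter()
                .position(|rname| rname == name)
                .map(|ri| (li, ri))
        })
        .collect()
}

/// Header of a natural join: every left column, followed by the right
/// columns that are not shared, in their original right-side order.
fn joined_columns<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let shared = shared_pairs(left, right);
    let mut out: Vec<&'a str> = left.to_vec();
    for (ri, name) in right.iter().enumerate() {
        // Skip right columns that already appear on the left.
        if !shared.iter().any(|&(_, r)| r == ri) {
            out.push(*name);
        }
    }
    out
}
```

With no shared names the header is simply the concatenation of both sides, which matches the Cartesian-product case exercised by the tests above.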


@ -3,6 +3,12 @@
//! Every operator in this crate (after the initial atom scan) consumes and
//! produces [`Relation`]s. Column names are variable names; a value at column
//! `i` of a row is the value bound to variable `columns[i]` in that solution.
//!
//! Column names within a single relation must be unique. Constructors enforce
//! this invariant; downstream operators rely on it when matching shared columns
//! across two relations.
use std::collections::HashSet;
use crate::value::Value;
@ -12,15 +18,46 @@ pub struct Relation {
pub rows: Vec<Vec<Value>>,
}
fn assert_unique_columns(columns: &[String]) {
let mut seen: HashSet<&str> = HashSet::with_capacity(columns.len());
for name in columns {
assert!(
seen.insert(name.as_str()),
"duplicate column name in relation: {name}",
);
}
}
impl Relation {
/// # Panics
/// Panics if `columns` contains a duplicate name.
#[must_use]
pub fn new(columns: Vec<String>) -> Self {
assert_unique_columns(&columns);
Self {
columns,
rows: Vec::new(),
}
}
/// # Panics
/// Panics if `columns` contains a duplicate name, or if any row's length
/// differs from `columns.len()`.
#[must_use]
pub fn from_rows(columns: Vec<String>, rows: Vec<Vec<Value>>) -> Self {
assert_unique_columns(&columns);
let arity = columns.len();
for (i, row) in rows.iter().enumerate() {
assert_eq!(
row.len(),
arity,
"row {i} arity mismatch: expected {arity}, got {}",
row.len(),
);
}
Self { columns, rows }
}
/// # Panics
/// Panics if `row.len() != self.columns.len()`.
pub fn push(&mut self, row: Vec<Value>) {
@ -34,3 +71,20 @@ impl Relation {
self.rows.push(row);
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic(expected = "duplicate column name")]
fn from_rows_rejects_duplicate_column_names() {
let _ = Relation::from_rows(vec!["X".to_string(), "X".to_string()], vec![]);
}
#[test]
#[should_panic(expected = "duplicate column name")]
fn new_rejects_duplicate_column_names() {
let _ = Relation::new(vec!["X".to_string(), "X".to_string()]);
}
}


@ -20,6 +20,21 @@ impl Table {
}
}
/// # Panics
/// Panics if any row's length differs from `arity`.
#[must_use]
pub fn from_rows(arity: usize, rows: Vec<Vec<Value>>) -> Self {
for (i, row) in rows.iter().enumerate() {
assert_eq!(
row.len(),
arity,
"row {i} arity mismatch: expected {arity}, got {}",
row.len(),
);
}
Self { arity, rows }
}
/// # Panics
/// Panics if `row.len() != self.arity`.
pub fn push(&mut self, row: Vec<Value>) {


@ -0,0 +1,91 @@
//! Hand-written query plan composed from `scan_atom`, `semijoin`, and `natural_join`.
//!
//! Schema:
//! - `author(name, book)`: who wrote each book
//! - `bestseller(book)`: the set of bestseller titles
//! - `price(book, dollars)`: price of each book
//!
//! Rule:
//! - `Q(name, book, dollars) :- author(name, book), bestseller(book), price(book, dollars).`
//! ("Authors of bestsellers along with each book's price.")
//!
//! The plan first scans each input table, then narrows `author` to authors of
//! bestsellers via a semijoin against `bestseller`, then attaches each book's
//! price via a natural join against `price`.
use query_ops::atom::{AtomPattern, Term, scan_atom};
use query_ops::join::{natural_join, semijoin};
use query_ops::table::Table;
use query_ops::value::Value;
fn s(x: &str) -> Value {
Value::Str(x.to_string())
}
fn i(x: i64) -> Value {
Value::Int(x)
}
#[test]
fn authors_of_bestsellers_with_price() {
let author = Table::from_rows(
2,
vec![
vec![s("Alice"), s("Foo")],
vec![s("Bob"), s("Bar")],
vec![s("Alice"), s("Baz")],
vec![s("Carol"), s("Qux")],
],
);
let bestseller = Table::from_rows(1, vec![vec![s("Foo")], vec![s("Baz")]]);
let price = Table::from_rows(
2,
vec![
vec![s("Foo"), i(25)],
vec![s("Bar"), i(15)],
vec![s("Baz"), i(30)],
vec![s("Qux"), i(20)],
],
);
let author_rel = scan_atom(
&author,
&AtomPattern {
columns: vec![Term::Var("name".to_string()), Term::Var("book".to_string())],
},
);
let bestseller_rel = scan_atom(
&bestseller,
&AtomPattern {
columns: vec![Term::Var("book".to_string())],
},
);
let price_rel = scan_atom(
&price,
&AtomPattern {
columns: vec![
Term::Var("book".to_string()),
Term::Var("dollars".to_string()),
],
},
);
let authors_of_bestsellers = semijoin(&author_rel, &bestseller_rel);
let result = natural_join(&authors_of_bestsellers, &price_rel);
assert_eq!(
result.columns,
vec![
"name".to_string(),
"book".to_string(),
"dollars".to_string()
],
);
assert_eq!(
result.rows,
vec![
vec![s("Alice"), s("Foo"), i(25)],
vec![s("Alice"), s("Baz"), i(30)],
],
);
}
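
For comparison, the same rule can be evaluated directly on plain tuples without the crate's operators. The sketch below is an assumption-laden illustration, not the crate's plan: it fuses the semijoin and the natural join into a single pass over `author`, uses only the standard library, and the function name is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Evaluates `Q(name, book, dollars) :- author(name, book), bestseller(book),
/// price(book, dollars)` over plain tuples. Hypothetical helper, std only.
fn authors_of_bestsellers_with_price<'a>(
    author: &[(&'a str, &'a str)],
    bestseller: &[&'a str],
    price: &[(&'a str, i64)],
) -> Vec<(&'a str, &'a str, i64)> {
    // Semijoin side: the set of bestseller titles to probe against.
    let bestsellers: HashSet<&str> = bestseller.iter().copied().collect();

    // Natural-join side: index prices by book, allowing several per book.
    let mut prices: HashMap<&str, Vec<i64>> = HashMap::new();
    for &(book, dollars) in price {
        prices.entry(book).or_default().push(dollars);
    }

    let mut out = Vec::new();
    for &(name, book) in author {
        // Drop authors whose book is not a bestseller.
        if !bestsellers.contains(book) {
            continue;
        }
        // Emit one output row per matching price row.
        if let Some(matches) = prices.get(book) {
            for &dollars in matches {
                out.push((name, book, dollars));
            }
        }
    }
    out
}
```

Rows come out in `author` order, which is also the order the test above expects from the operator-based plan.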