4 Commits

Author SHA1 Message Date
Hassan Abedi
1c3b693ded Add the early version of query-ops implementation 2026-06-03 12:16:26 +02:00
Hassan Abedi
b31aa32747 Add a note file about the findings from CozoDB and LMDB projects 2026-06-03 12:16:26 +02:00
Hassan Abedi
1c34368da6 Improve scaffolding for query-ops crate 2026-06-03 12:16:26 +02:00
Hassan Abedi
765689b66e Add scaffolding for query-ops crate 2026-06-03 12:16:22 +02:00
17 changed files with 1381 additions and 1 deletions

4
Cargo.lock generated

@ -282,6 +282,10 @@ dependencies = [
"unicode-ident",
]
[[package]]
name = "query-ops"
version = "0.1.0"
[[package]]
name = "quote"
version = "1.0.45"


@ -4,7 +4,7 @@ This demo shows how to store and read data from Geomerge.
The demo:
- 1. loads the compiled [`paths.json`](../../external/geomerge/crates/geomerge/tests/data/paths.json) schema,
+ 1. loads the compiled [`paths.json`](https://git.sgai.uk/vincent_liu/geomerge/-/raw/main/crates/geomerge/tests/data/paths.json) schema,
2. creates a Geomerge store,
3. inserts a small graph dataset in one transaction,
4. reads the inserted edge back,


@ -0,0 +1,11 @@
[package]
name = "query-ops"
version = "0.1.0"
edition.workspace = true
license.workspace = true
rust-version.workspace = true
[lints]
workspace = true
[dependencies]

119
crates/query-ops/README.md Normal file

@ -0,0 +1,119 @@
## Query Ops
This crate provides a small set of query operators for building a simple query-plan executor.
The operators are atom scan (`scan_atom`), semijoin (`semijoin`), and natural join (`natural_join`).
### Public API
| Item | Type | Description |
|--------------------------------------------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `scan_atom(&Table, &AtomPattern) -> Relation` | function | Scans the table under the pattern and returns a binding relation with one column per distinct variable in first-occurrence order. Literal positions and repeated variables filter rows during the scan. |
| `semijoin(&Relation, &Relation) -> Relation` | function | Returns the rows of `left` whose values on the columns shared with `right` also appear in `right`. The output column list is the same as `left.columns`. |
| `natural_join(&Relation, &Relation) -> Relation` | function | Returns every pair of `left` and `right` rows that agree on shared columns. Each output row holds the columns of `left` followed by the non-shared columns of `right`. |
| `Table` | struct | Holds positional input rows of fixed arity and carries no column names. Construct it with `Table::new(arity)` or `Table::from_rows(arity, rows)`. |
| `AtomPattern` | struct | Specifies, for each table column, either a variable to bind or a literal value to match. The pattern is a `Vec<Term>` whose length must equal the table's arity. |
| `Term` | enum | Represents one position of an `AtomPattern`. A term is either `Var(String)` to bind the cell to a named variable, or `Lit(Value)` to require the cell to equal a given value. |
| `Relation` | struct | Holds rows over named columns and is the type produced by every operator. Construct it with `Relation::new(columns)` or `Relation::from_rows(columns, rows)`. Column names within a single relation must be unique. |
| `Value` | enum | Represents a single cell value stored in a `Table` or `Relation`. A value is either `Int(i64)` or `Str(String)`. |
<div align="center">
<picture>
<img alt="Types" src="docs/diagrams/types.svg" height="50%" width="50%">
</picture>
</div>
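
The scan semantics described in the table (one output column per distinct variable, with literals and repeated variables acting as filters) can be illustrated with a small standalone sketch. This is not the crate's implementation: `scan_sketch` is a hypothetical stand-in that takes plain slices instead of `Table` and `AtomPattern`, the `Value` and `Term` enums are redefined locally so the snippet compiles on its own, and skipping rows whose length differs from the pattern is an assumption.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

#[derive(Clone, Debug)]
enum Term {
    Var(String),
    Lit(Value),
}

/// Hypothetical stand-in for `scan_atom`; returns `(columns, rows)`.
fn scan_sketch(rows: &[Vec<Value>], pattern: &[Term]) -> (Vec<String>, Vec<Vec<Value>>) {
    // One output column per distinct variable, in first-occurrence order.
    let mut columns: Vec<String> = Vec::new();
    for term in pattern {
        if let Term::Var(name) = term {
            if !columns.contains(name) {
                columns.push(name.clone());
            }
        }
    }

    let mut out: Vec<Vec<Value>> = Vec::new();
    'rows: for row in rows {
        // Assumption: rows whose length differs from the pattern are skipped.
        if row.len() != pattern.len() {
            continue;
        }
        // One binding slot per output column.
        let mut bound: Vec<Option<Value>> = vec![None; columns.len()];
        for (cell, term) in row.iter().zip(pattern) {
            match term {
                // A literal position must equal the cell.
                Term::Lit(v) => {
                    if cell != v {
                        continue 'rows;
                    }
                }
                // A repeated variable must see the same value every time.
                Term::Var(name) => {
                    let idx = columns.iter().position(|c| c == name).unwrap();
                    match bound[idx].clone() {
                        Some(prev) => {
                            if prev != *cell {
                                continue 'rows;
                            }
                        }
                        None => bound[idx] = Some(cell.clone()),
                    }
                }
            }
        }
        // Every slot is filled here because each column comes from a pattern variable.
        out.push(bound.into_iter().map(|v| v.unwrap()).collect());
    }
    (columns, out)
}
```

For instance, the pattern `[Var x, Var x, Lit 1]` keeps only rows whose first two cells are equal and whose third cell is `Int(1)`, and the result has the single column `x`.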
### Example
The rule below returns each author of a bestseller together with that book's title and price.
It uses all three operators:
- `scan_atom` for the three input tables,
- `semijoin` to keep only authors of bestsellers,
- and `natural_join` to attach each book's price.
```text
Q(name, book, dollars) :- author(name, book), bestseller(book), price(book, dollars).
```
```rust
use query_ops::atom::{AtomPattern, Term, scan_atom};
use query_ops::join::{natural_join, semijoin};
use query_ops::table::Table;
use query_ops::value::Value;

fn s(x: &str) -> Value {
    Value::Str(x.to_string())
}

fn i(x: i64) -> Value {
    Value::Int(x)
}

fn main() {
    let author = Table::from_rows(
        2,
        vec![
            vec![s("Alice"), s("Foo")],
            vec![s("Bob"), s("Bar")],
            vec![s("Alice"), s("Baz")],
            vec![s("Carol"), s("Qux")],
        ],
    );
    let bestseller = Table::from_rows(1, vec![vec![s("Foo")], vec![s("Baz")]]);
    let price = Table::from_rows(
        2,
        vec![
            vec![s("Foo"), i(25)],
            vec![s("Bar"), i(15)],
            vec![s("Baz"), i(30)],
            vec![s("Qux"), i(20)],
        ],
    );

    let author_rel = scan_atom(
        &author,
        &AtomPattern {
            columns: vec![Term::Var("name".to_string()), Term::Var("book".to_string())],
        },
    );
    let bestseller_rel = scan_atom(
        &bestseller,
        &AtomPattern {
            columns: vec![Term::Var("book".to_string())],
        },
    );
    let price_rel = scan_atom(
        &price,
        &AtomPattern {
            columns: vec![Term::Var("book".to_string()), Term::Var("dollars".to_string())],
        },
    );

    let authors_of_bestsellers = semijoin(&author_rel, &bestseller_rel);
    let result = natural_join(&authors_of_bestsellers, &price_rel);

    assert_eq!(
        result.columns,
        vec!["name".to_string(), "book".to_string(), "dollars".to_string()],
    );
    assert_eq!(
        result.rows,
        vec![
            vec![s("Alice"), s("Foo"), i(25)],
            vec![s("Alice"), s("Baz"), i(30)],
        ],
    );
}
```
How it works:
<div align="center">
<picture>
<img alt="Workflow" src="docs/diagrams/workflow.svg" height="90%" width="90%">
</picture>
</div>
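
To make the join semantics concrete, here is a minimal nested-loop sketch of both joins, matching columns by name as the README describes. It is an illustration under assumptions, not the crate's code: `Rel`, `shared`, `semijoin_sketch`, and `natural_join_sketch` are hypothetical names, the types are redefined locally so the snippet is self-contained, and the real operators may use a different strategy (for example hashing) or treat the no-shared-columns case differently.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for `Relation`: rows over named columns.
#[derive(Clone, Debug, PartialEq)]
struct Rel {
    columns: Vec<String>,
    rows: Vec<Vec<Value>>,
}

/// Index pairs `(left_idx, right_idx)` of columns that share a name.
fn shared(left: &Rel, right: &Rel) -> Vec<(usize, usize)> {
    left.columns
        .iter()
        .enumerate()
        .filter_map(|(i, name)| {
            right.columns.iter().position(|c| c == name).map(|j| (i, j))
        })
        .collect()
}

/// Keeps the left rows that have at least one matching right row.
fn semijoin_sketch(left: &Rel, right: &Rel) -> Rel {
    let keys = shared(left, right);
    let rows = left
        .rows
        .iter()
        .filter(|l| {
            right
                .rows
                .iter()
                .any(|r| keys.iter().all(|&(i, j)| l[i] == r[j]))
        })
        .cloned()
        .collect();
    Rel { columns: left.columns.clone(), rows }
}

/// Pairs every left row with every right row that agrees on the shared columns.
fn natural_join_sketch(left: &Rel, right: &Rel) -> Rel {
    let keys = shared(left, right);
    // Right-hand columns that are not shared are appended to the output.
    let extra: Vec<usize> = (0..right.columns.len())
        .filter(|j| !keys.iter().any(|&(_, kj)| kj == *j))
        .collect();
    let mut columns = left.columns.clone();
    columns.extend(extra.iter().map(|&j| right.columns[j].clone()));

    let mut rows = Vec::new();
    for l in &left.rows {
        for r in &right.rows {
            if keys.iter().all(|&(i, j)| l[i] == r[j]) {
                let mut row = l.clone();
                row.extend(extra.iter().map(|&j| r[j].clone()));
                rows.push(row);
            }
        }
    }
    Rel { columns, rows }
}
```

The sketch compares every pair of rows, so it costs time proportional to the product of the two row counts. That is acceptable for illustrating the semantics but would be a poor fit for large inputs.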
### Test
```sh
cargo test -p query-ops
```


@ -0,0 +1,14 @@
#!/usr/bin/env bash
# You need to have Graphviz installed to run this script
# On Debian-based OSes, you can install it using: sudo apt-get install graphviz
# Directory containing .dot files. Defaults to the script's own directory so the
# script works regardless of the caller's working directory.
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ASSET_DIR=${1:-"${SCRIPT_DIR}"}
# Make figures from .dot files
for f in "${ASSET_DIR}"/*.dot; do
    # Skip the unexpanded pattern when no .dot file matches
    [ -e "$f" ] || continue
    dot -Tsvg "$f" -o "${f%.dot}.svg"
done


@ -0,0 +1,60 @@
digraph QueryOpsTypes {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = TB
ranksep = 0.7;
nodesep = 0.7;
splines = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
penwidth = 1.2
]
table_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table</b> (struct)</td></tr>
<tr><td align="left" balign="left">arity: usize</td></tr>
<tr><td align="left" balign="left">rows: Vec&lt;Vec&lt;Value&gt;&gt;</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
relation_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Relation</b> (struct)</td></tr>
<tr><td align="left" balign="left">columns: Vec&lt;String&gt;</td></tr>
<tr><td align="left" balign="left">rows: Vec&lt;Vec&lt;Value&gt;&gt;</td></tr>
</table>>, fillcolor = "#ECEFF1", color = "#607D8B"]
atom_pattern_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>AtomPattern</b> (struct)</td></tr>
<tr><td align="left" balign="left">columns: Vec&lt;Term&gt;</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
term_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Term</b> (enum)</td></tr>
<tr><td align="left" balign="left">Var(String)</td></tr>
<tr><td align="left" balign="left">Lit(Value)</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
value_node [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Value</b> (enum)</td></tr>
<tr><td align="left" balign="left">Int(i64)</td></tr>
<tr><td align="left" balign="left">Str(String)</td></tr>
</table>>, fillcolor = "#FFF3E0", color = "#FF9800"]
// composition edges: arrow X -> Y reads "X contains Y"
atom_pattern_node -> term_node [label = "Vec<Term>"]
term_node -> value_node [label = "Lit(Value)"]
table_node -> value_node [label = "Vec<Vec<Value>>"]
relation_node -> value_node [label = "Vec<Vec<Value>>"]
}


@ -0,0 +1,85 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 12.2.1 (0)
-->
<!-- Title: QueryOpsTypes Pages: 1 -->
<svg width="584pt" height="391pt"
viewBox="0.00 0.00 583.50 391.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 387)">
<title>QueryOpsTypes</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-387 579.5,-387 579.5,4 -4,4"/>
<!-- table_node -->
<g id="node1" class="node">
<title>table_node</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M159.75,-253.5C159.75,-253.5 12,-253.5 12,-253.5 6,-253.5 0,-247.5 0,-241.5 0,-241.5 0,-170.5 0,-170.5 0,-164.5 6,-158.5 12,-158.5 12,-158.5 159.75,-158.5 159.75,-158.5 165.75,-158.5 171.75,-164.5 171.75,-170.5 171.75,-170.5 171.75,-241.5 171.75,-241.5 171.75,-247.5 165.75,-253.5 159.75,-253.5"/>
<text text-anchor="start" x="43.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table</text>
<text text-anchor="start" x="78.38" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="12" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">arity: usize</text>
<text text-anchor="start" x="12" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">rows: Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- value_node -->
<g id="node5" class="node">
<title>value_node</title>
<path fill="#fff3e0" stroke="#ff9800" stroke-width="1.5" d="M351.38,-95C351.38,-95 264.38,-95 264.38,-95 258.38,-95 252.38,-89 252.38,-83 252.38,-83 252.38,-12 252.38,-12 252.38,-6 258.38,0 264.38,0 264.38,0 351.38,0 351.38,0 357.38,0 363.38,-6 363.38,-12 363.38,-12 363.38,-83 363.38,-83 363.38,-89 357.38,-95 351.38,-95"/>
<text text-anchor="start" x="264.38" y="-74.7" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Value</text>
<text text-anchor="start" x="300.38" y="-74.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(enum)</text>
<text text-anchor="start" x="264.38" y="-44.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Int(i64)</text>
<text text-anchor="start" x="264.38" y="-15.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Str(String)</text>
</g>
<!-- table_node&#45;&gt;value_node -->
<g id="edge3" class="edge">
<title>table_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M152.48,-158.04C181.05,-137.9 214.32,-114.45 242.73,-94.42"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="244.53,-97.44 250.68,-88.82 240.49,-91.72 244.53,-97.44"/>
<text text-anchor="middle" x="240.66" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- relation_node -->
<g id="node2" class="node">
<title>relation_node</title>
<path fill="#eceff1" stroke="#607d8b" stroke-width="1.5" d="M381.75,-253.5C381.75,-253.5 234,-253.5 234,-253.5 228,-253.5 222,-247.5 222,-241.5 222,-241.5 222,-170.5 222,-170.5 222,-164.5 228,-158.5 234,-158.5 234,-158.5 381.75,-158.5 381.75,-158.5 387.75,-158.5 393.75,-164.5 393.75,-170.5 393.75,-170.5 393.75,-241.5 393.75,-241.5 393.75,-247.5 387.75,-253.5 381.75,-253.5"/>
<text text-anchor="start" x="256.5" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Relation</text>
<text text-anchor="start" x="309.75" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="234" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">columns: Vec&lt;String&gt;</text>
<text text-anchor="start" x="234" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">rows: Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- relation_node&#45;&gt;value_node -->
<g id="edge4" class="edge">
<title>relation_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M307.88,-158.04C307.88,-141.95 307.88,-123.74 307.88,-106.86"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="311.38,-107.24 307.88,-97.24 304.38,-107.24 311.38,-107.24"/>
<text text-anchor="middle" x="345" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Vec&lt;Value&gt;&gt;</text>
</g>
<!-- atom_pattern_node -->
<g id="node3" class="node">
<title>atom_pattern_node</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M563.5,-383C563.5,-383 432.25,-383 432.25,-383 426.25,-383 420.25,-377 420.25,-371 420.25,-371 420.25,-329 420.25,-329 420.25,-323 426.25,-317 432.25,-317 432.25,-317 563.5,-317 563.5,-317 569.5,-317 575.5,-323 575.5,-329 575.5,-329 575.5,-371 575.5,-371 575.5,-377 569.5,-383 563.5,-383"/>
<text text-anchor="start" x="432.25" y="-362.7" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">AtomPattern</text>
<text text-anchor="start" x="514" y="-362.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(struct)</text>
<text text-anchor="start" x="432.25" y="-332.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00">columns: Vec&lt;Term&gt;</text>
</g>
<!-- term_node -->
<g id="node4" class="node">
<title>term_node</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M539.88,-253.5C539.88,-253.5 455.88,-253.5 455.88,-253.5 449.88,-253.5 443.88,-247.5 443.88,-241.5 443.88,-241.5 443.88,-170.5 443.88,-170.5 443.88,-164.5 449.88,-158.5 455.88,-158.5 455.88,-158.5 539.88,-158.5 539.88,-158.5 545.88,-158.5 551.88,-164.5 551.88,-170.5 551.88,-170.5 551.88,-241.5 551.88,-241.5 551.88,-247.5 545.88,-253.5 539.88,-253.5"/>
<text text-anchor="start" x="455.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Term</text>
<text text-anchor="start" x="488.88" y="-233.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00"> &#160;(enum)</text>
<text text-anchor="start" x="455.88" y="-203.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Var(String)</text>
<text text-anchor="start" x="455.88" y="-174.2" font-family="Helvetica,Arial,sans-serif" font-size="14.00">Lit(Value)</text>
</g>
<!-- atom_pattern_node&#45;&gt;term_node -->
<g id="edge1" class="edge">
<title>atom_pattern_node&#45;&gt;term_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M497.88,-316.78C497.88,-301.61 497.88,-283.04 497.88,-265.52"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="501.38,-265.73 497.88,-255.73 494.38,-265.73 501.38,-265.73"/>
<text text-anchor="middle" x="520.75" y="-283.45" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Vec&lt;Term&gt;</text>
</g>
<!-- term_node&#45;&gt;value_node -->
<g id="edge2" class="edge">
<title>term_node&#45;&gt;value_node</title>
<path fill="none" stroke="#333333" stroke-width="1.2" d="M443.43,-160.15C421.36,-141.97 395.69,-120.83 372.66,-101.87"/>
<polygon fill="#333333" stroke="#333333" stroke-width="1.2" points="375,-99.25 365.05,-95.6 370.54,-104.65 375,-99.25"/>
<text text-anchor="middle" x="428.07" y="-124.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">Lit(Value)</text>
</g>
</g>
</svg>



@ -0,0 +1,122 @@
digraph QueryOpsHandPlan {
fontname = "Helvetica,Arial,sans-serif"
layout = dot
rankdir = LR
ranksep = 0.9;
nodesep = 0.7;
splines = true;
compound = true;
bgcolor = "white"
node [
fontname = "Helvetica,Arial,sans-serif",
shape = box,
style = "filled,rounded",
color = "#555555",
fillcolor = "white",
penwidth = 1.5
]
edge [
fontname = "Helvetica,Arial,sans-serif",
color = "#333333",
fontsize = 9,
fontcolor = "#555555",
labeldistance = 2.0,
penwidth = 1.2
]
subgraph cluster_inputs {
label = "Inputs (positional tables)"
style = "dashed"
color = "#888888"
fontcolor = "#555555"
margin = 18
author_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: author</b></td></tr>
<tr><td align="left" balign="left">• arity 2</td></tr>
<tr><td align="left" balign="left">• rows: (name, book)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
bestseller_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: bestseller</b></td></tr>
<tr><td align="left" balign="left">• arity 1</td></tr>
<tr><td align="left" balign="left">• rows: (book)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
price_table [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Table: price</b></td></tr>
<tr><td align="left" balign="left">• arity 2</td></tr>
<tr><td align="left" balign="left">• rows: (book, dollars)</td></tr>
</table>>, fillcolor = "#E8F4FD", color = "#2196F3"]
}
subgraph cluster_atoms {
label = "Atom Scans (scan_atom: Table × AtomPattern → Relation)"
style = "dashed"
color = "#9C27B0"
fontcolor = "#7B1FA2"
margin = 14
author_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>author_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var name, Var book]</td></tr>
<tr><td align="left" balign="left">cols: [name, book]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
bestseller_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>bestseller_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var book]</td></tr>
<tr><td align="left" balign="left">cols: [book]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
price_rel [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>price_rel</b></td></tr>
<tr><td align="left" balign="left">pattern: [Var book, Var dollars]</td></tr>
<tr><td align="left" balign="left">cols: [book, dollars]</td></tr>
</table>>, fillcolor = "#F3E5F5", color = "#9C27B0"]
}
subgraph cluster_joins {
label = "Joins (shared cols = matching column names)"
style = "dashed"
color = "#4CAF50"
fontcolor = "#388E3C"
margin = 14
semijoin_step [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>semijoin</b></td></tr>
<tr><td align="left" balign="left">authors of bestsellers</td></tr>
<tr><td align="left" balign="left">shared: book</td></tr>
<tr><td align="left" balign="left">cols: [name, book]</td></tr>
</table>>, fillcolor = "#E8F5E9", color = "#4CAF50"]
natural_join_step [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>natural_join</b></td></tr>
<tr><td align="left" balign="left">attach each book's price</td></tr>
<tr><td align="left" balign="left">shared: book</td></tr>
<tr><td align="left" balign="left">cols: [name, book, dollars]</td></tr>
</table>>, fillcolor = "#E8F5E9", color = "#4CAF50"]
}
subgraph cluster_output {
label = "Output (binding relation)"
style = "dashed"
color = "#888888"
fontcolor = "#555555"
margin = 18
result [label = <<table border="0" cellborder="0" cellspacing="0" cellpadding="4">
<tr><td align="center"><b>Q result</b></td></tr>
<tr><td align="left" balign="left">authors of bestsellers with each book's price</td></tr>
<tr><td align="left" balign="left">cols: [name, book, dollars]</td></tr>
</table>>, fillcolor = "#ECEFF1", color = "#607D8B"]
}
// Atom scans consume tables
author_table -> author_rel [color = "#2196F3"]
bestseller_table -> bestseller_rel [color = "#2196F3"]
price_table -> price_rel [color = "#2196F3"]
// semijoin narrows author_rel to bestseller authors
author_rel -> semijoin_step [label = "left", color = "#9C27B0"]
bestseller_rel -> semijoin_step [label = "right", color = "#9C27B0"]
// natural_join attaches price
semijoin_step -> natural_join_step [label = "left", color = "#4CAF50"]
price_rel -> natural_join_step [label = "right", color = "#9C27B0"]
// Final output
natural_join_step -> result [color = "#4CAF50"]
}


@ -0,0 +1,159 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 12.2.1 (0)
-->
<!-- Title: QueryOpsHandPlan Pages: 1 -->
<svg width="1482pt" height="471pt"
viewBox="0.00 0.00 1481.75 471.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 467)">
<title>QueryOpsHandPlan</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-467 1477.75,-467 1477.75,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_inputs</title>
<polygon fill="white" stroke="#888888" stroke-dasharray="5,2" points="8,-8 8,-455 198.5,-455 198.5,-8 8,-8"/>
<text text-anchor="middle" x="103.25" y="-437.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#555555">Inputs (positional tables)</text>
</g>
<g id="clust2" class="cluster">
<title>cluster_atoms</title>
<polygon fill="white" stroke="#9c27b0" stroke-dasharray="5,2" points="233.5,-12 233.5,-451 609.5,-451 609.5,-12 233.5,-12"/>
<text text-anchor="middle" x="421.5" y="-433.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#7b1fa2">Atom Scans &#160;(scan_atom: Table × AtomPattern → Relation)</text>
</g>
<g id="clust3" class="cluster">
<title>cluster_joins</title>
<polygon fill="white" stroke="#4caf50" stroke-dasharray="5,2" points="665.5,-141 665.5,-322 1106,-322 1106,-141 665.5,-141"/>
<text text-anchor="middle" x="885.75" y="-304.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#388e3c">Joins &#160;(shared cols = matching column names)</text>
</g>
<g id="clust4" class="cluster">
<title>cluster_output</title>
<polygon fill="white" stroke="#888888" stroke-dasharray="5,2" points="1141,-152 1141,-311 1465.75,-311 1465.75,-152 1141,-152"/>
<text text-anchor="middle" x="1303.38" y="-293.7" font-family="Helvetica,Arial,sans-serif" font-size="14.00" fill="#555555">Output (binding relation)</text>
</g>
<!-- author_table -->
<g id="node1" class="node">
<title>author_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M165.88,-408.12C165.88,-408.12 40.62,-408.12 40.62,-408.12 34.62,-408.12 28.62,-402.12 28.62,-396.12 28.62,-396.12 28.62,-325.88 28.62,-325.88 28.62,-319.88 34.62,-313.88 40.62,-313.88 40.62,-313.88 165.88,-313.88 165.88,-313.88 171.88,-313.88 177.88,-319.88 177.88,-325.88 177.88,-325.88 177.88,-396.12 177.88,-396.12 177.88,-402.12 171.88,-408.12 165.88,-408.12"/>
<text text-anchor="start" x="60.88" y="-387.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: author</text>
<text text-anchor="start" x="40.62" y="-358.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 2</text>
<text text-anchor="start" x="40.62" y="-329.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (name, book)</text>
</g>
<!-- author_rel -->
<g id="node4" class="node">
<title>author_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M509.12,-408.12C509.12,-408.12 332.88,-408.12 332.88,-408.12 326.88,-408.12 320.88,-402.12 320.88,-396.12 320.88,-396.12 320.88,-325.88 320.88,-325.88 320.88,-319.88 326.88,-313.88 332.88,-313.88 332.88,-313.88 509.12,-313.88 509.12,-313.88 515.12,-313.88 521.12,-319.88 521.12,-325.88 521.12,-325.88 521.12,-396.12 521.12,-396.12 521.12,-402.12 515.12,-408.12 509.12,-408.12"/>
<text text-anchor="start" x="388" y="-387.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">author_rel</text>
<text text-anchor="start" x="332.88" y="-358.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var name, Var book]</text>
<text text-anchor="start" x="332.88" y="-329.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book]</text>
</g>
<!-- author_table&#45;&gt;author_rel -->
<g id="edge1" class="edge">
<title>author_table&#45;&gt;author_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M178.28,-361C217.1,-361 265.45,-361 308.68,-361"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="308.62,-364.5 318.62,-361 308.62,-357.5 308.62,-364.5"/>
</g>
<!-- bestseller_table -->
<g id="node2" class="node">
<title>bestseller_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M156.12,-264.12C156.12,-264.12 50.38,-264.12 50.38,-264.12 44.38,-264.12 38.38,-258.12 38.38,-252.12 38.38,-252.12 38.38,-181.88 38.38,-181.88 38.38,-175.88 44.38,-169.88 50.38,-169.88 50.38,-169.88 156.12,-169.88 156.12,-169.88 162.12,-169.88 168.12,-175.88 168.12,-181.88 168.12,-181.88 168.12,-252.12 168.12,-252.12 168.12,-258.12 162.12,-264.12 156.12,-264.12"/>
<text text-anchor="start" x="50.38" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: bestseller</text>
<text text-anchor="start" x="50.38" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 1</text>
<text text-anchor="start" x="50.38" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (book)</text>
</g>
<!-- bestseller_rel -->
<g id="node5" class="node">
<title>bestseller_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M476.12,-264.12C476.12,-264.12 365.88,-264.12 365.88,-264.12 359.88,-264.12 353.88,-258.12 353.88,-252.12 353.88,-252.12 353.88,-181.88 353.88,-181.88 353.88,-175.88 359.88,-169.88 365.88,-169.88 365.88,-169.88 476.12,-169.88 476.12,-169.88 482.12,-169.88 488.12,-175.88 488.12,-181.88 488.12,-181.88 488.12,-252.12 488.12,-252.12 488.12,-258.12 482.12,-264.12 476.12,-264.12"/>
<text text-anchor="start" x="377.5" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">bestseller_rel</text>
<text text-anchor="start" x="365.88" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var book]</text>
<text text-anchor="start" x="365.88" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [book]</text>
</g>
<!-- bestseller_table&#45;&gt;bestseller_rel -->
<g id="edge2" class="edge">
<title>bestseller_table&#45;&gt;bestseller_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M168.53,-217C218.65,-217 288.47,-217 341.83,-217"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="341.82,-220.5 351.82,-217 341.82,-213.5 341.82,-220.5"/>
</g>
<!-- price_table -->
<g id="node3" class="node">
<title>price_table</title>
<path fill="#e8f4fd" stroke="#2196f3" stroke-width="1.5" d="M168.5,-120.12C168.5,-120.12 38,-120.12 38,-120.12 32,-120.12 26,-114.12 26,-108.12 26,-108.12 26,-37.88 26,-37.88 26,-31.88 32,-25.88 38,-25.88 38,-25.88 168.5,-25.88 168.5,-25.88 174.5,-25.88 180.5,-31.88 180.5,-37.88 180.5,-37.88 180.5,-108.12 180.5,-108.12 180.5,-114.12 174.5,-120.12 168.5,-120.12"/>
<text text-anchor="start" x="65.75" y="-99.83" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Table: price</text>
<text text-anchor="start" x="38" y="-70.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• arity 2</text>
<text text-anchor="start" x="38" y="-41.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">• rows: (book, dollars)</text>
</g>
<!-- price_rel -->
<g id="node6" class="node">
<title>price_rel</title>
<path fill="#f3e5f5" stroke="#9c27b0" stroke-width="1.5" d="M511.75,-120.12C511.75,-120.12 330.25,-120.12 330.25,-120.12 324.25,-120.12 318.25,-114.12 318.25,-108.12 318.25,-108.12 318.25,-37.88 318.25,-37.88 318.25,-31.88 324.25,-25.88 330.25,-25.88 330.25,-25.88 511.75,-25.88 511.75,-25.88 517.75,-25.88 523.75,-31.88 523.75,-37.88 523.75,-37.88 523.75,-108.12 523.75,-108.12 523.75,-114.12 517.75,-120.12 511.75,-120.12"/>
<text text-anchor="start" x="392.88" y="-99.83" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">price_rel</text>
<text text-anchor="start" x="330.25" y="-70.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">pattern: [Var book, Var dollars]</text>
<text text-anchor="start" x="330.25" y="-41.58" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [book, dollars]</text>
</g>
<!-- price_table&#45;&gt;price_rel -->
<g id="edge3" class="edge">
<title>price_table&#45;&gt;price_rel</title>
<path fill="none" stroke="#2196f3" stroke-width="1.2" d="M180.68,-73C218.39,-73 264.62,-73 306.37,-73"/>
<polygon fill="#2196f3" stroke="#2196f3" stroke-width="1.2" points="306.2,-76.5 316.2,-73 306.2,-69.5 306.2,-76.5"/>
</g>
<!-- semijoin_step -->
<g id="node7" class="node">
<title>semijoin_step</title>
<path fill="#e8f5e9" stroke="#4caf50" stroke-width="1.5" d="M819.75,-278.62C819.75,-278.62 691.5,-278.62 691.5,-278.62 685.5,-278.62 679.5,-272.62 679.5,-266.62 679.5,-266.62 679.5,-167.38 679.5,-167.38 679.5,-161.38 685.5,-155.38 691.5,-155.38 691.5,-155.38 819.75,-155.38 819.75,-155.38 825.75,-155.38 831.75,-161.38 831.75,-167.38 831.75,-167.38 831.75,-266.62 831.75,-266.62 831.75,-272.62 825.75,-278.62 819.75,-278.62"/>
<text text-anchor="start" x="727.88" y="-258.32" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">semijoin</text>
<text text-anchor="start" x="691.5" y="-229.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">authors of bestsellers</text>
<text text-anchor="start" x="691.5" y="-200.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">shared: book</text>
<text text-anchor="start" x="691.5" y="-171.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book]</text>
</g>
<!-- author_rel&#45;&gt;semijoin_step -->
<g id="edge4" class="edge">
<title>author_rel&#45;&gt;semijoin_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M521.48,-324.79C550.11,-313.83 581.24,-301.4 609.5,-289 628.84,-280.51 649.32,-270.81 668.61,-261.33"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="670.15,-264.48 677.56,-256.91 667.04,-258.2 670.15,-264.48"/>
<text text-anchor="middle" x="637.5" y="-284.9" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">left</text>
</g>
<!-- bestseller_rel&#45;&gt;semijoin_step -->
<g id="edge5" class="edge">
<title>bestseller_rel&#45;&gt;semijoin_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M488.51,-217C539.93,-217 611.54,-217 667.53,-217"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="667.41,-220.5 677.41,-217 667.41,-213.5 667.41,-220.5"/>
<text text-anchor="middle" x="637.5" y="-221.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">right</text>
</g>
<!-- natural_join_step -->
<g id="node8" class="node">
<title>natural_join_step</title>
<path fill="#e8f5e9" stroke="#4caf50" stroke-width="1.5" d="M1080,-278.62C1080,-278.62 922.5,-278.62 922.5,-278.62 916.5,-278.62 910.5,-272.62 910.5,-266.62 910.5,-266.62 910.5,-167.38 910.5,-167.38 910.5,-161.38 916.5,-155.38 922.5,-155.38 922.5,-155.38 1080,-155.38 1080,-155.38 1086,-155.38 1092,-161.38 1092,-167.38 1092,-167.38 1092,-266.62 1092,-266.62 1092,-272.62 1086,-278.62 1080,-278.62"/>
<text text-anchor="start" x="963" y="-258.32" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">natural_join</text>
<text text-anchor="start" x="922.5" y="-229.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">attach each book&#39;s price</text>
<text text-anchor="start" x="922.5" y="-200.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">shared: book</text>
<text text-anchor="start" x="922.5" y="-171.07" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book, dollars]</text>
</g>
<!-- price_rel&#45;&gt;natural_join_step -->
<g id="edge7" class="edge">
<title>price_rel&#45;&gt;natural_join_step</title>
<path fill="none" stroke="#9c27b0" stroke-width="1.2" d="M523.91,-71.78C608.41,-73.58 730.63,-82.79 831.75,-116.5 855.71,-124.49 879.92,-136.28 902.24,-149"/>
<polygon fill="#9c27b0" stroke="#9c27b0" stroke-width="1.2" points="900.38,-151.97 910.78,-153.98 903.91,-145.92 900.38,-151.97"/>
<text text-anchor="middle" x="755.62" y="-121.45" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">right</text>
</g>
<!-- semijoin_step&#45;&gt;natural_join_step -->
<g id="edge6" class="edge">
<title>semijoin_step&#45;&gt;natural_join_step</title>
<path fill="none" stroke="#4caf50" stroke-width="1.2" d="M832.04,-217C853.1,-217 876.34,-217 898.65,-217"/>
<polygon fill="#4caf50" stroke="#4caf50" stroke-width="1.2" points="898.4,-220.5 908.4,-217 898.4,-213.5 898.4,-220.5"/>
<text text-anchor="middle" x="871.12" y="-221.95" font-family="Helvetica,Arial,sans-serif" font-size="9.00" fill="#555555">left</text>
</g>
<!-- result -->
<g id="node9" class="node">
<title>result</title>
<path fill="#eceff1" stroke="#607d8b" stroke-width="1.5" d="M1435.75,-264.12C1435.75,-264.12 1171,-264.12 1171,-264.12 1165,-264.12 1159,-258.12 1159,-252.12 1159,-252.12 1159,-181.88 1159,-181.88 1159,-175.88 1165,-169.88 1171,-169.88 1171,-169.88 1435.75,-169.88 1435.75,-169.88 1441.75,-169.88 1447.75,-175.88 1447.75,-181.88 1447.75,-181.88 1447.75,-252.12 1447.75,-252.12 1447.75,-258.12 1441.75,-264.12 1435.75,-264.12"/>
<text text-anchor="start" x="1277.5" y="-243.82" font-family="Helvetica,Arial,sans-serif" font-weight="bold" font-size="14.00">Q result</text>
<text text-anchor="start" x="1171" y="-214.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">authors of bestsellers with each book&#39;s price</text>
<text text-anchor="start" x="1171" y="-185.57" font-family="Helvetica,Arial,sans-serif" font-size="14.00">cols: [name, book, dollars]</text>
</g>
<!-- natural_join_step&#45;&gt;result -->
<g id="edge8" class="edge">
<title>natural_join_step&#45;&gt;result</title>
<path fill="none" stroke="#4caf50" stroke-width="1.2" d="M1092.3,-217C1109.6,-217 1128.17,-217 1146.86,-217"/>
<polygon fill="#4caf50" stroke="#4caf50" stroke-width="1.2" points="1146.69,-220.5 1156.69,-217 1146.69,-213.5 1146.69,-220.5"/>
</g>
</g>
</svg>


@ -0,0 +1,190 @@
//! Atom operator: scan a [`Table`] under an [`AtomPattern`] and return a
//! binding [`Relation`].
//!
//! An atom pattern specifies, for each table column, either a variable to bind
//! or a literal that the cell must equal. A variable appearing in more than one
//! column forces those cells to be equal (so `Edge(X, X)` keeps only
//! self-loops). The output relation has one column per distinct variable, in
//! first-occurrence order.
use std::collections::HashMap;

use crate::{relation::Relation, table::Table, value::Value};

/// One position of an [`AtomPattern`]: a variable to bind or a literal to match.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Term {
    Var(String),
    Lit(Value),
}

/// A pattern with one [`Term`] per table column.
#[derive(Debug, Clone)]
pub struct AtomPattern {
    pub columns: Vec<Term>,
}

/// Scans `table` and returns one row per table row that satisfies `pattern`,
/// projected onto the pattern's distinct variables in first-occurrence order.
///
/// # Panics
/// Panics if `pattern.columns.len() != table.arity`.
#[must_use]
pub fn scan_atom(table: &Table, pattern: &AtomPattern) -> Relation {
    assert_eq!(
        pattern.columns.len(),
        table.arity,
        "pattern arity mismatch: pattern has {}, table has {}",
        pattern.columns.len(),
        table.arity,
    );

    // Compile the pattern once: which positions to project, which pairs of
    // positions must hold equal cells, and which positions must equal a literal.
    let mut output_vars: Vec<String> = Vec::new();
    let mut output_positions: Vec<usize> = Vec::new();
    let mut equality_pairs: Vec<(usize, usize)> = Vec::new();
    let mut literal_checks: Vec<(usize, &Value)> = Vec::new();
    let mut first_position: HashMap<&str, usize> = HashMap::new();
    for (i, term) in pattern.columns.iter().enumerate() {
        match term {
            Term::Var(name) => {
                if let Some(&j) = first_position.get(name.as_str()) {
                    // Repeated variable: position `i` must equal its first occurrence `j`.
                    equality_pairs.push((j, i));
                } else {
                    first_position.insert(name.as_str(), i);
                    output_vars.push(name.clone());
                    output_positions.push(i);
                }
            }
            Term::Lit(value) => literal_checks.push((i, value)),
        }
    }

    let mut output = Relation::new(output_vars);
    'rows: for row in &table.rows {
        for &(i, lit) in &literal_checks {
            if &row[i] != lit {
                continue 'rows;
            }
        }
        for &(j, i) in &equality_pairs {
            if row[i] != row[j] {
                continue 'rows;
            }
        }
        let projected: Vec<Value> = output_positions.iter().map(|&i| row[i].clone()).collect();
        output.push(projected);
    }
    output
}
#[cfg(test)]
mod tests {
use super::*;
fn var(name: &str) -> Term {
Term::Var(name.to_string())
}
fn lit(value: i64) -> Term {
Term::Lit(Value::Int(value))
}
fn int(value: i64) -> Value {
Value::Int(value)
}
#[test]
fn repeated_variable_keeps_only_self_loops() {
let edge = Table::from_rows(
2,
vec![
vec![int(1), int(2)],
vec![int(2), int(2)],
vec![int(3), int(3)],
vec![int(1), int(1)],
],
);
let pattern = AtomPattern {
columns: vec![var("X"), var("X")],
};
let result = scan_atom(&edge, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(2)], vec![int(3)], vec![int(1)]]);
}
#[test]
fn literal_filters_rows_to_match() {
let edge = Table::from_rows(
2,
vec![
vec![int(1), int(2)],
vec![int(2), int(3)],
vec![int(1), int(4)],
],
);
let pattern = AtomPattern {
columns: vec![lit(1), var("Y")],
};
let result = scan_atom(&edge, &pattern);
assert_eq!(result.columns, vec!["Y".to_string()]);
assert_eq!(result.rows, vec![vec![int(2)], vec![int(4)]]);
}
#[test]
fn distinct_variables_project_in_first_occurrence_order() {
let triples = Table::from_rows(
3,
vec![vec![int(1), int(2), int(3)], vec![int(4), int(5), int(6)]],
);
let pattern = AtomPattern {
columns: vec![var("A"), var("B"), var("C")],
};
let result = scan_atom(&triples, &pattern);
assert_eq!(
result.columns,
vec!["A".to_string(), "B".to_string(), "C".to_string()],
);
assert_eq!(
result.rows,
vec![vec![int(1), int(2), int(3)], vec![int(4), int(5), int(6)]],
);
}
#[test]
fn variable_repeated_three_times_requires_all_equal() {
let triples = Table::from_rows(
3,
vec![
vec![int(1), int(1), int(1)],
vec![int(1), int(1), int(2)],
vec![int(2), int(2), int(2)],
vec![int(1), int(2), int(1)],
],
);
let pattern = AtomPattern {
columns: vec![var("X"), var("X"), var("X")],
};
let result = scan_atom(&triples, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(1)], vec![int(2)]]);
}
#[test]
fn literal_filter_repeated_var_and_projection_combine() {
// Pattern: [Lit(1), Var("X"), Lit(2), Var("X")].
// Keep rows where col0 == 1, col2 == 2, and col1 == col3.
// Output is one column [X], bound to col1 (the first occurrence).
let table = Table::from_rows(
4,
vec![
vec![int(1), int(7), int(2), int(7)],
vec![int(1), int(7), int(2), int(8)],
vec![int(0), int(7), int(2), int(7)],
vec![int(1), int(7), int(3), int(7)],
vec![int(1), int(9), int(2), int(9)],
],
);
let pattern = AtomPattern {
columns: vec![lit(1), var("X"), lit(2), var("X")],
};
let result = scan_atom(&table, &pattern);
assert_eq!(result.columns, vec!["X".to_string()]);
assert_eq!(result.rows, vec![vec![int(7)], vec![int(9)]]);
}
}


@ -0,0 +1,220 @@
//! Semijoin and natural join over binding relations.
//!
//! Both operators join on the shared column names of their inputs (the
//! "overlapping variables" in Datalog terms).
//!
//! - [`semijoin`] keeps rows of `left` whose shared-column values appear in
//! `right`. Output columns are `left.columns` unchanged.
//! - [`natural_join`] keeps every pair `(l, r)` that agrees on shared columns,
//! emitting one row with the union of columns. Output column order is
//! `left.columns` followed by `right.columns` minus the shared ones.
use std::collections::{HashMap, HashSet};

use crate::{relation::Relation, value::Value};

/// Returns `(left_index, right_index)` for every column name present in both
/// relations, in `left` column order.
fn shared_columns(left: &Relation, right: &Relation) -> Vec<(usize, usize)> {
    left.columns
        .iter()
        .enumerate()
        .filter_map(|(li, name)| {
            right
                .columns
                .iter()
                .position(|rname| rname == name)
                .map(|ri| (li, ri))
        })
        .collect()
}

/// Clones the cells of `row` at the given `indices`, in order.
fn project<'a>(row: &'a [Value], indices: impl IntoIterator<Item = &'a usize>) -> Vec<Value> {
    indices.into_iter().map(|&i| row[i].clone()).collect()
}

/// Keeps the rows of `left` whose shared-column values appear in `right`.
///
/// If the relations share no columns, every left row is kept when `right` is
/// non-empty and none is kept when `right` is empty.
#[must_use]
pub fn semijoin(left: &Relation, right: &Relation) -> Relation {
    let shared = shared_columns(left, right);
    let left_keys: Vec<usize> = shared.iter().map(|&(li, _)| li).collect();
    let right_keys: Vec<usize> = shared.iter().map(|&(_, ri)| ri).collect();

    // Build side: the set of distinct key tuples seen in `right`.
    let mut right_set: HashSet<Vec<Value>> = HashSet::new();
    for row in &right.rows {
        right_set.insert(project(row, &right_keys));
    }

    // Probe side: keep each left row whose key tuple is in the set.
    let mut output = Relation::new(left.columns.clone());
    for row in &left.rows {
        if right_set.contains(&project(row, &left_keys)) {
            output.push(row.clone());
        }
    }
    output
}

/// Joins every pair of rows that agree on the shared columns.
///
/// Output columns are `left.columns` followed by the columns of `right` that
/// are not shared. With no shared columns the result is the Cartesian product.
#[must_use]
pub fn natural_join(left: &Relation, right: &Relation) -> Relation {
    let shared = shared_columns(left, right);
    let left_keys: Vec<usize> = shared.iter().map(|&(li, _)| li).collect();
    let right_keys: Vec<usize> = shared.iter().map(|&(_, ri)| ri).collect();
    let shared_right: HashSet<usize> = right_keys.iter().copied().collect();
    let right_only: Vec<usize> = (0..right.columns.len())
        .filter(|i| !shared_right.contains(i))
        .collect();

    let mut output_columns = left.columns.clone();
    for &i in &right_only {
        output_columns.push(right.columns[i].clone());
    }

    // Build side: index the rows of `right` by their key tuple.
    let mut right_index: HashMap<Vec<Value>, Vec<&Vec<Value>>> = HashMap::new();
    for row in &right.rows {
        right_index
            .entry(project(row, &right_keys))
            .or_default()
            .push(row);
    }

    // Probe side: emit one output row per matching (left, right) pair.
    let mut output = Relation::new(output_columns);
    for left_row in &left.rows {
        let key = project(left_row, &left_keys);
        let Some(matches) = right_index.get(&key) else {
            continue;
        };
        for right_row in matches {
            let mut joined = left_row.clone();
            for &i in &right_only {
                joined.push(right_row[i].clone());
            }
            output.push(joined);
        }
    }
    output
}
#[cfg(test)]
mod tests {
use super::*;
fn col(name: &str) -> String {
name.to_string()
}
fn int(value: i64) -> Value {
Value::Int(value)
}
#[test]
fn semijoin_keeps_left_rows_matched_on_shared_column() {
let left = Relation::from_rows(
vec![col("X"), col("Y")],
vec![
vec![int(1), int(10)],
vec![int(2), int(20)],
vec![int(3), int(30)],
],
);
let right = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(3)]]);
let result = semijoin(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y")]);
assert_eq!(
result.rows,
vec![vec![int(1), int(10)], vec![int(3), int(30)]],
);
}
#[test]
fn semijoin_does_not_duplicate_left_rows_when_right_has_duplicates() {
let left = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(2)]]);
let right = Relation::from_rows(
vec![col("X"), col("Y")],
vec![
vec![int(1), int(100)],
vec![int(1), int(101)],
vec![int(2), int(200)],
],
);
let result = semijoin(&left, &right);
assert_eq!(result.columns, vec![col("X")]);
assert_eq!(result.rows, vec![vec![int(1)], vec![int(2)]]);
}
#[test]
fn natural_join_emits_union_of_columns_on_match() {
let left = Relation::from_rows(
vec![col("X"), col("Y")],
vec![vec![int(1), int(10)], vec![int(2), int(20)]],
);
let right = Relation::from_rows(
vec![col("Y"), col("Z")],
vec![
vec![int(10), int(100)],
vec![int(20), int(200)],
vec![int(20), int(201)],
],
);
let result = natural_join(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y"), col("Z")]);
assert_eq!(
result.rows,
vec![
vec![int(1), int(10), int(100)],
vec![int(2), int(20), int(200)],
vec![int(2), int(20), int(201)],
],
);
}
#[test]
fn natural_join_with_no_shared_columns_is_cartesian_product() {
let left = Relation::from_rows(vec![col("X")], vec![vec![int(1)], vec![int(2)]]);
let right = Relation::from_rows(vec![col("Y")], vec![vec![int(10)], vec![int(20)]]);
let result = natural_join(&left, &right);
assert_eq!(result.columns, vec![col("X"), col("Y")]);
assert_eq!(
result.rows,
vec![
vec![int(1), int(10)],
vec![int(1), int(20)],
vec![int(2), int(10)],
vec![int(2), int(20)],
],
);
}
#[test]
fn semijoin_returns_empty_when_either_side_is_empty() {
let nonempty = Relation::from_rows(vec![col("X")], vec![vec![int(1)]]);
let empty = Relation::from_rows(vec![col("X")], vec![]);
let r1 = semijoin(&empty, &nonempty);
assert_eq!(r1.columns, vec![col("X")]);
assert!(r1.rows.is_empty());
let r2 = semijoin(&nonempty, &empty);
assert_eq!(r2.columns, vec![col("X")]);
assert!(r2.rows.is_empty());
let r3 = semijoin(&empty, &empty);
assert_eq!(r3.columns, vec![col("X")]);
assert!(r3.rows.is_empty());
}
#[test]
fn natural_join_returns_empty_when_either_side_is_empty() {
let nonempty = Relation::from_rows(vec![col("X")], vec![vec![int(1)]]);
let empty = Relation::from_rows(vec![col("X")], vec![]);
let r1 = natural_join(&empty, &nonempty);
assert_eq!(r1.columns, vec![col("X")]);
assert!(r1.rows.is_empty());
let r2 = natural_join(&nonempty, &empty);
assert_eq!(r2.columns, vec![col("X")]);
assert!(r2.rows.is_empty());
let r3 = natural_join(&empty, &empty);
assert_eq!(r3.columns, vec![col("X")]);
assert!(r3.rows.is_empty());
}
}


@ -0,0 +1,23 @@
//! Physical operators for a small query-plan executor.
//!
//! Three operators are in scope:
//!
//! - [`atom::scan_atom`] scans a [`table::Table`] under an
//! [`atom::AtomPattern`], filtering for repeated-variable equality and
//! literal equality, and outputs a binding [`relation::Relation`].
//! - [`join::semijoin`] keeps rows of one relation whose shared-column values
//! appear in another.
//! - [`join::natural_join`] combines rows that agree on shared columns,
//! emitting the union of their columns.
//!
//! Operators compose by function application; a "query plan written by hand"
//! is just an expression like
//! `natural_join(&semijoin(&a, &b), &scan_atom(&t, &p))`.
//!
//! Integration with an external query-plan IR is out of scope.
pub mod atom;
pub mod join;
pub mod relation;
pub mod table;
pub mod value;


@ -0,0 +1,90 @@
//! Binding relations: rows over named (variable) columns.
//!
//! Every operator in this crate (after the initial atom scan) consumes and
//! produces [`Relation`]s. Column names are variable names; a value at column
//! `i` of a row is the value bound to variable `columns[i]` in that solution.
//!
//! Column names within a single relation must be unique. Constructors enforce
//! this invariant; downstream operators rely on it when matching shared columns
//! across two relations.
use std::collections::HashSet;
use crate::value::Value;
#[derive(Debug, Clone)]
pub struct Relation {
pub columns: Vec<String>,
pub rows: Vec<Vec<Value>>,
}
fn assert_unique_columns(columns: &[String]) {
let mut seen: HashSet<&str> = HashSet::with_capacity(columns.len());
for name in columns {
assert!(
seen.insert(name.as_str()),
"duplicate column name in relation: {name}",
);
}
}
impl Relation {
/// # Panics
/// Panics if `columns` contains a duplicate name.
#[must_use]
pub fn new(columns: Vec<String>) -> Self {
assert_unique_columns(&columns);
Self {
columns,
rows: Vec::new(),
}
}
/// # Panics
/// Panics if `columns` contains a duplicate name, or if any row's length
/// differs from `columns.len()`.
#[must_use]
pub fn from_rows(columns: Vec<String>, rows: Vec<Vec<Value>>) -> Self {
assert_unique_columns(&columns);
let arity = columns.len();
for (i, row) in rows.iter().enumerate() {
assert_eq!(
row.len(),
arity,
"row {i} arity mismatch: expected {arity}, got {}",
row.len(),
);
}
Self { columns, rows }
}
/// # Panics
/// Panics if `row.len() != self.columns.len()`.
pub fn push(&mut self, row: Vec<Value>) {
assert_eq!(
row.len(),
self.columns.len(),
"row arity mismatch: expected {}, got {}",
self.columns.len(),
row.len(),
);
self.rows.push(row);
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic(expected = "duplicate column name")]
fn from_rows_rejects_duplicate_column_names() {
let _ = Relation::from_rows(vec!["X".to_string(), "X".to_string()], vec![]);
}
#[test]
#[should_panic(expected = "duplicate column name")]
fn new_rejects_duplicate_column_names() {
let _ = Relation::new(vec!["X".to_string(), "X".to_string()]);
}
}


@ -0,0 +1,50 @@
//! Raw input relations with positional columns.
//!
//! Tables are the input to atom scans. They carry no column names: positions
//! are matched against an [`AtomPattern`](crate::atom::AtomPattern).
use crate::value::Value;
#[derive(Debug, Clone)]
pub struct Table {
pub arity: usize,
pub rows: Vec<Vec<Value>>,
}
impl Table {
#[must_use]
pub fn new(arity: usize) -> Self {
Self {
arity,
rows: Vec::new(),
}
}
/// # Panics
/// Panics if any row's length differs from `arity`.
#[must_use]
pub fn from_rows(arity: usize, rows: Vec<Vec<Value>>) -> Self {
for (i, row) in rows.iter().enumerate() {
assert_eq!(
row.len(),
arity,
"row {i} arity mismatch: expected {arity}, got {}",
row.len(),
);
}
Self { arity, rows }
}
/// # Panics
/// Panics if `row.len() != self.arity`.
pub fn push(&mut self, row: Vec<Value>) {
assert_eq!(
row.len(),
self.arity,
"row arity mismatch: expected {}, got {}",
self.arity,
row.len(),
);
self.rows.push(row);
}
}


@ -0,0 +1,7 @@
//! Cell values shared by tables and binding relations.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Value {
Int(i64),
Str(String),
}


@ -0,0 +1,91 @@
//! Hand-written query plan composed from `scan_atom`, `semijoin`, and `natural_join`.
//!
//! Schema:
//! - `author(name, book)`: who wrote each book
//! - `bestseller(book)`: the set of bestseller titles
//! - `price(book, dollars)`: price of each book
//!
//! Rule:
//! - `Q(name, book, dollars) :- author(name, book), bestseller(book), price(book, dollars).`
//! ("Authors of bestsellers along with each book's price.")
//!
//! The plan first scans each input table, then narrows `author` to authors of
//! bestsellers via a semijoin against `bestseller`, then attaches each book's
//! price via a natural join against `price`.
use query_ops::atom::{AtomPattern, Term, scan_atom};
use query_ops::join::{natural_join, semijoin};
use query_ops::table::Table;
use query_ops::value::Value;
fn s(x: &str) -> Value {
Value::Str(x.to_string())
}
fn i(x: i64) -> Value {
Value::Int(x)
}
#[test]
fn authors_of_bestsellers_with_price() {
let author = Table::from_rows(
2,
vec![
vec![s("Alice"), s("Foo")],
vec![s("Bob"), s("Bar")],
vec![s("Alice"), s("Baz")],
vec![s("Carol"), s("Qux")],
],
);
let bestseller = Table::from_rows(1, vec![vec![s("Foo")], vec![s("Baz")]]);
let price = Table::from_rows(
2,
vec![
vec![s("Foo"), i(25)],
vec![s("Bar"), i(15)],
vec![s("Baz"), i(30)],
vec![s("Qux"), i(20)],
],
);
let author_rel = scan_atom(
&author,
&AtomPattern {
columns: vec![Term::Var("name".to_string()), Term::Var("book".to_string())],
},
);
let bestseller_rel = scan_atom(
&bestseller,
&AtomPattern {
columns: vec![Term::Var("book".to_string())],
},
);
let price_rel = scan_atom(
&price,
&AtomPattern {
columns: vec![
Term::Var("book".to_string()),
Term::Var("dollars".to_string()),
],
},
);
let authors_of_bestsellers = semijoin(&author_rel, &bestseller_rel);
let result = natural_join(&authors_of_bestsellers, &price_rel);
assert_eq!(
result.columns,
vec![
"name".to_string(),
"book".to_string(),
"dollars".to_string()
],
);
assert_eq!(
result.rows,
vec![
vec![s("Alice"), s("Foo"), i(25)],
vec![s("Alice"), s("Baz"), i(30)],
],
);
}


@ -0,0 +1,135 @@
## Cozo and LMDB Findings
Sources inspected: the Cozo source tree at `github.com/cozodb/cozo`, the LMDB source tree at `github.com/LMDB/lmdb`, and the `heed` Rust binding at `github.com/meilisearch/heed`.
File paths in this note are relative to the root of the named project's source tree.
The aim was to understand how a working Datalog engine (Cozo) implements joins and what a low-level key-value substrate (LMDB) provides that makes those joins cheap.
This note summarizes the design lessons and the practical implications for the `query-ops` crate in this playground.
### Summary
Cozo is an embedded Datalog database written in Rust.
It does not have a separate semijoin operator.
Instead, it has one inner-join operator that picks between two strategies based on how each relation is stored: an index-nested-loop strategy that uses ordered range scans over the substrate, and a fallback that materializes one side into a sorted vector and probes it.
Semijoin behavior, when needed, emerges from a separate rewrite step called the magic-sets transformation, which converts semijoin-shaped pruning into regular inner joins against derived relations.
LMDB is a memory-mapped, ordered key-value store with a B+ tree on disk.
It exposes a small set of cursor primitives that support prefix iteration, range iteration, and exact-key lookup.
These primitives are exactly what an index-nested-loop join needs: seek to a key prefix, then iterate forward while the prefix matches.
The combined lesson is that a good join does not require a clever operator.
It requires the relation to be stored with the join columns at the front of the key, so that the substrate's ordered iteration can do the join itself.
### Cozo
#### What It Is
Cozo is a Datalog database with multiple swappable storage backends, including an in-memory store, SQLite, RocksDB, sled, and TiKV.
The execution engine speaks a single narrow storage trait whose surface is essentially `get`, `put`, `range_iter`, and `prefix_iter` over byte keys.
Each backend implements that trait.
The trait definition lives at `cozo-core/src/storage/mod.rs` in the Cozo source tree.
#### Join Behavior
The relational algebra at `cozo-core/src/query/ra.rs` in the Cozo source tree defines a single join operator named `InnerJoin`.
At execution time it chooses between two strategies based on a check called `join_is_prefix`:
- prefix join: for each tuple from the left side, the engine builds a byte prefix from the join columns and calls `prefix_iter` on the right relation.
The substrate yields all matching tuples in key order.
No hash table is built.
This path is taken whenever the right side's join columns are stored as the prefix of its key.
- materialized join: used when the join columns are not a key prefix.
The right side is read fully into a sorted, deduplicated vector, reordered so the join columns come first, then walked with a `starts_with(prefix)` check.
This is the build-and-probe family, but with a sorted vector instead of a hash map.
The choice is made entirely on whether the join columns sit at the front of the stored key.
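The prefix-join strategy can be sketched in a few lines, assuming an in-memory `BTreeMap` stands in for the ordered substrate. The `Store` alias, `prefix_scan`, and `prefix_join` are hypothetical names for illustration and are not Cozo's API; the byte-key layout is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an ordered store: byte keys to byte values.
type Store = BTreeMap<Vec<u8>, Vec<u8>>;

/// Yield every entry whose key starts with `prefix`, in key order.
/// Seeks to the first key >= `prefix`, then walks forward while the prefix matches.
fn prefix_scan<'a>(
    store: &'a Store,
    prefix: &'a [u8],
) -> impl Iterator<Item = (&'a Vec<u8>, &'a Vec<u8>)> + 'a {
    store
        .range(prefix.to_vec()..)
        .take_while(move |(key, _)| key.starts_with(prefix))
}

/// For each left key, probe the right store by prefix. No hash table is built:
/// the ordered iteration of the store does the matching.
fn prefix_join(left_keys: &[Vec<u8>], right: &Store) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut out = Vec::new();
    for left_key in left_keys {
        for (right_key, _value) in prefix_scan(right, left_key) {
            out.push((left_key.clone(), right_key.clone()));
        }
    }
    out
}
```

The cost per left tuple is one seek plus one step per match, which is why the strategy only applies when the join columns lead the stored key.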
#### No Semijoin Operator
A search of the Cozo source for `semijoin` or `semi_join` returns nothing.
Semijoin behavior comes from the magic-sets transformation at `cozo-core/src/query/magic.rs` in the Cozo source tree.
This pass rewrites each rule so that body atoms get joined against an auxiliary "magic" relation whose contents encode the binding patterns supplied by the rule's callers.
The net effect is the same as semijoining body atoms against caller-supplied filters, but the implementation is a logical rewrite, not a runtime operator.
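The equivalence this relies on can be illustrated on its own. The sketch below is not the magic-sets transformation; it only shows, under simplified assumptions, that an inner join against a deduplicated derived relation of keys yields the same rows as a semijoin. The tuple shapes and the `derive_keys` and `join_with_derived` names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Derived "filter" relation: the distinct join-key values of `right`.
fn derive_keys(right: &[(i64, i64)]) -> BTreeSet<i64> {
    right.iter().map(|&(key, _)| key).collect()
}

/// Inner join of `left` against the derived key relation.
/// The derived relation has no duplicates and contributes no extra columns,
/// so each left row appears at most once: the result equals the semijoin of
/// `left` with the original `right`.
fn join_with_derived(
    left: &[(i64, &'static str)],
    keys: &BTreeSet<i64>,
) -> Vec<(i64, &'static str)> {
    left.iter()
        .filter(|(key, _)| keys.contains(key))
        .copied()
        .collect()
}
```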
#### No Auto-Maintained Secondary Indexes
Cozo does not maintain secondary indexes automatically.
If you want to query a relation by a column order different from how it was declared, you declare a second relation with the columns reordered and keep its contents synchronized at insert time.
A covering index is just another stored relation.
The decision of which column order to store comes from how you expect to query the data, not from the engine.
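As a rough sketch of that discipline, assume a hypothetical `edge(src, dst)` relation that must be queried from either end. The `EdgeStore` type below is illustrative only: it keeps two ordered copies with different column orders and updates both on every insert.

```rust
use std::collections::BTreeSet;

/// Hypothetical `edge(src, dst)` relation stored twice, once per column order.
#[derive(Default)]
struct EdgeStore {
    by_src: BTreeSet<(u32, u32)>, // tuples are (src, dst)
    by_dst: BTreeSet<(u32, u32)>, // tuples are (dst, src)
}

impl EdgeStore {
    /// Every insert writes both copies; nothing keeps them in sync otherwise.
    fn insert(&mut self, src: u32, dst: u32) {
        self.by_src.insert((src, dst));
        self.by_dst.insert((dst, src));
    }

    /// Range scan over the copy whose leading column is `src`.
    fn successors(&self, src: u32) -> Vec<u32> {
        self.by_src
            .range((src, u32::MIN)..=(src, u32::MAX))
            .map(|&(_, dst)| dst)
            .collect()
    }

    /// Range scan over the copy whose leading column is `dst`.
    fn predecessors(&self, dst: u32) -> Vec<u32> {
        self.by_dst
            .range((dst, u32::MIN)..=(dst, u32::MAX))
            .map(|&(_, src)| src)
            .collect()
    }
}
```

The price is doubled storage and a second write per insert, paid in exchange for an ordered scan in both directions.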
### LMDB
#### What It Is
LMDB is a single-file, memory-mapped, ordered key-value store.
It uses a B+ tree on disk and exposes reads as zero-copy byte slices that point directly into the mmap.
It supports a single writer at a time and many concurrent readers, and it uses shadow paging for MVCC, which means commits are atomic without a write-ahead log.
#### Cursor Primitives
A cursor in LMDB is a position inside the B+ tree.
The full set of cursor operations is defined by the `MDB_cursor_op` enum in `libraries/liblmdb/lmdb.h` in the LMDB source tree.
The operations relevant to join work are:
- `MDB_SET_RANGE`: position at the first key greater than or equal to a given key.
This is the seek primitive that makes prefix scans possible.
- `MDB_NEXT`: advance one step forward in key order.
Combined with `MDB_SET_RANGE` and a per-step prefix check, this gives you ordered range iteration.
- `MDB_SET` and `MDB_SET_KEY`: exact-key positioning, used for point lookups.
- `MDB_FIRST` and `MDB_LAST`: positional endpoints.
For databases opened with the `MDB_DUPSORT` flag, one key can carry multiple sorted values, and additional operations apply: `MDB_GET_BOTH`, `MDB_NEXT_DUP`, `MDB_FIRST_DUP`.
This is useful when a relation is encoded as "key = join columns, duplicate values = remaining columns": the sorted duplicates stored under one key are exactly the rows that match that join key.
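The seek-then-step pattern can be modelled without LMDB at all. The toy cursor below only mimics the semantics of `MDB_SET_RANGE` and `MDB_NEXT` over a sorted slice; `ToyCursor`, `set_range`, `next_entry`, and `scan_prefix` are hypothetical names and do not correspond to the LMDB or `heed` APIs.

```rust
/// Toy cursor over entries sorted by key. It imitates LMDB cursor semantics
/// for illustration and makes no LMDB calls.
struct ToyCursor<'a> {
    entries: &'a [(Vec<u8>, Vec<u8>)],
    pos: usize,
}

impl<'a> ToyCursor<'a> {
    fn new(entries: &'a [(Vec<u8>, Vec<u8>)]) -> Self {
        Self { entries, pos: 0 }
    }

    /// Like `MDB_SET_RANGE`: position at the first key >= `key`.
    fn set_range(&mut self, key: &[u8]) -> Option<&'a (Vec<u8>, Vec<u8>)> {
        self.pos = self.entries.partition_point(|(k, _)| k.as_slice() < key);
        self.entries.get(self.pos)
    }

    /// Like `MDB_NEXT`: advance one entry in key order.
    fn next_entry(&mut self) -> Option<&'a (Vec<u8>, Vec<u8>)> {
        self.pos += 1;
        self.entries.get(self.pos)
    }
}

/// Collect the values of all entries whose key starts with `prefix`.
fn scan_prefix<'a>(cursor: &mut ToyCursor<'a>, prefix: &[u8]) -> Vec<&'a [u8]> {
    let mut out = Vec::new();
    let mut current = cursor.set_range(prefix);
    while let Some((key, value)) = current {
        if !key.starts_with(prefix) {
            break; // keys are sorted, so the prefix range has ended
        }
        out.push(value.as_slice());
        current = cursor.next_entry();
    }
    out
}
```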
#### Rust Binding
`heed` is the idiomatic Rust binding for LMDB.
It wraps the cursor operations as `RoCursor` and `RwCursor` and returns key and value byte slices tied to the transaction lifetime, so reads remain zero-copy.
Meilisearch uses `heed` in production, so the binding is well exercised.
### LMDB Versus RocksDB
Both LMDB and RocksDB are ordered key-value stores with prefix and range scans, but their internal designs lead to different operational profiles.
LMDB highlights:
- B+ tree on disk, memory mapped
- Single writer at a time, many concurrent readers
- Zero-copy reads from the mmap
- Copy-on-write pages rather than an append-only log; freed pages are tracked and reused, but the file does not shrink on its own
- File size grows up to a configured `mapsize`
- No background compaction
- Manual reclaim with `mdb_copy -c`, which writes a compacted copy of the environment
RocksDB highlights:
- Log-structured merge tree
- Multiple concurrent writers
- Background compaction
- Higher write throughput at the cost of write amplification
- Reads may traverse multiple levels with bloom-filter checks
- Engine manages its own disk layout
For a read-heavy prototype with batch inserts, LMDB is the closer fit: predictable read costs, cheap range scans, and zero-copy probes.
RocksDB earns its overhead when sustained write throughput is the bottleneck.
### Practical Implications
The current `query-ops` crate works on in-memory rows (`Vec<Vec<Value>>`) and implements semijoin and natural join with a transient hash built over the right-hand relation.
The Cozo design suggests a clear upgrade path once a real substrate is added.
Short term: keep the in-memory operator and build a transient hash on the smaller side.
This is correct, easy to test, and easy to reason about.
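A minimal sketch of choosing the build side by size, assuming both inputs are simplified to `(key, payload)` pairs of `i64`. The `hash_join` function below is hypothetical and is not the crate's `natural_join`; it only illustrates that the output column order can stay fixed while the build side varies.

```rust
use std::collections::HashMap;

/// Join `left(key, a)` with `right(key, b)` into `(key, a, b)`, hashing
/// whichever input has fewer rows and probing with the other.
fn hash_join(left: &[(i64, i64)], right: &[(i64, i64)]) -> Vec<(i64, i64, i64)> {
    let build_left = left.len() <= right.len();
    let (build, probe) = if build_left { (left, right) } else { (right, left) };

    // Build: index the smaller input by key.
    let mut index: HashMap<i64, Vec<i64>> = HashMap::new();
    for &(key, payload) in build {
        index.entry(key).or_default().push(payload);
    }

    // Probe: stream the larger input and emit one row per match.
    let mut out = Vec::new();
    for &(key, probe_payload) in probe {
        if let Some(matches) = index.get(&key) {
            for &build_payload in matches {
                // Restore the (key, left, right) column order regardless of
                // which side was hashed.
                out.push(if build_left {
                    (key, build_payload, probe_payload)
                } else {
                    (key, probe_payload, build_payload)
                });
            }
        }
    }
    out
}
```

Row order follows the probe side, so it changes when the build side flips; tests that compare exact row order would need to sort first.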
Medium term: when relations move into a substrate like LMDB, encode each relation so that the join columns sit at the prefix of the key, or use a `DUPSORT` database where the duplicate values carry the remaining columns.
At that point the join operator becomes a cursor pattern (`MDB_SET_RANGE` followed by `MDB_NEXT` while the prefix matches), and the separate hash-building step disappears.
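One possible key encoding is sketched below, assuming every column is an `i64`; the function names are hypothetical. Each integer is written big-endian with its sign bit flipped so that byte-wise key order agrees with numeric order, and the join columns are written first so that all rows for one join value share a byte prefix. Variable-length values such as strings would need a terminator or length scheme that this sketch does not cover.

```rust
/// Order-preserving encoding of an `i64`: flip the sign bit, then write
/// big-endian, so unsigned byte-wise comparison agrees with numeric order.
fn encode_i64(value: i64) -> [u8; 8] {
    ((value as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Build a key with the join columns first and the remaining columns after.
fn encode_key(join_cols: &[i64], rest: &[i64]) -> Vec<u8> {
    join_cols
        .iter()
        .chain(rest)
        .flat_map(|&v| encode_i64(v))
        .collect()
}

/// The byte prefix shared by every key with these join-column values.
fn join_prefix(join_cols: &[i64]) -> Vec<u8> {
    encode_key(join_cols, &[])
}
```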
Index discipline: if a relation needs to be joined two different ways, store it twice with different prefix orders.
There is no clever-indexing shortcut in either Cozo or LMDB, and trying to invent one is unlikely to be worth the cost.
The takeaway is that the operator surface in `query-ops` is fine for an in-memory prototype, but the substrate decision is the load-bearing one for performance.
We do not need to design around it now, but the natural successor to the current operators is a key-encoding discipline rather than a more elaborate operator implementation.
### Changelog
- **June 2, 2026** -- Initial version of this document.