useful-notes/hassan/011-geolog-intermediate-representation.md

199 lines
8.0 KiB
Markdown
Raw Normal View History

# Geolog Intermediate Representation
This note summarizes the intermediate representation currently implemented in the Geolog repo on branch `ir-draft1`:
https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads
2026-03-30 10:45:39 +02:00
All file paths below are repo-relative, not local workspace paths. The main sources are `geolog-lang/src/Geolog/IR.hs` and
`geolog-lang/src/Geolog/Lower.hs`.
## Where the IR sits
The current pipeline is:
1. Source text is lexed and parsed.
2. The elaborator builds a typed core representation.
3. `lowerTop` lowers that elaborated theory into `I.FlatTheory`.
4. Tests pretty-print the lowered form as `table ...` and `law ...` declarations.
The lowering entry point is `geolog-lang/src/Geolog/Lower.hs`, via `lowerTop`, which calls `lower`, then `lowerEntry`, then `finishTheory`.
## Glossary
- `IR`: Intermediate representation. A simpler form used between elaboration and later evaluation.
- `lowering`: The step that converts elaborated Geolog code into the IR.
- `Path`: A qualified name like `M.mul`.
- `ColType`: The type of one table column.
- `Table`: A lowered relational object with columns and maybe a primary key.
- `RelTable`: A table used for a relation or predicate.
- `FunTable`: A table used for something function-like, where input columns determine an output.
- `Term`: A small IR expression, such as a variable, literal, or projection.
- `Atom`: One table occurrence with concrete arguments.
- `Prop`: A logical formula built from atoms and connectives like equality and conjunction.
- `Law`: A quantified rule over IR propositions.
- `FlatTheory`: The final lowered result: a set of tables plus laws.
- `foreignKeys`: Generated laws saying a row depends on other rows existing.
- `total`: Generated laws saying a function row must exist when its inputs exist.
- `PiColumn`: A column introduced as part of a table's explicit shape.
- `JoinColumn`: A column introduced while following a reference or join.
## Core IR data types
The IR is intentionally relational. It turns theory structure into tables and rules.
- `Path = [QName]`: a hierarchical name like `["M","mul"]`, pretty-printed as `M.mul`.
2026-03-30 10:45:39 +02:00
- `ColType` is the type of a column. `EntityType Path` means the column refers to rows from another entity/table, `PrimType PrimInt | PrimString`
covers builtin scalars, and `Tuple [(QName, ColType)]` supports structured values.
- `Table` contains `columns :: [ColType]` and `primaryKey :: Maybe [Int]`.
- `Term` supports literals, variables, projections like `x.field`, and record construction.
- `Atom` contains `table :: Path`, `rowId :: Maybe Term`, and `values :: Map Int Term`.
- `Prop` and `Law` provide a small logic language for atoms, equality, conjunction, disjunction, and universally quantified Horn-style laws.
- `FlatTheory` is the final lowered package with `tables :: Map Path Table` and `laws :: Map Path Law`.
See `geolog-lang/src/Geolog/IR.hs` for the concrete definitions.
## What lowering does
Lowering walks the elaborated theory and turns it into tables plus rules.
### 1. Records become path prefixes
Nested theory fields extend the current path root. If a record field `mul` is inside `M`, the lowered path becomes `M.mul`.
This happens in `lowerTheoryFields` in `geolog-lang/src/Geolog/Lower.hs`.
### 2. Relations become `RelTable`
When lowering sees `C.VU C.QueryU`, it creates a relational table with the currently accumulated `PiColumn`s and no primary key:
- `kind = RelTable`
- `columns = piColumnsOf ?columns`
- `total = Nothing`
This is the representation for proposition-like relations.
See `lowerTheoryTy` in `geolog-lang/src/Geolog/Lower.hs`.
### 3. Functions/entities become `FunTable`
When lowering finds something function-like that does not reduce to a simpler theory shape, it emits a `FunTable`.
At the end of lowering, function tables get a primary key consisting of all columns except the final result column. That matches the examples:
- `M.mul = [M.car, M.car, M.car, primary key(0,1)]`
- `D.mul = [..., primary key(0,1,2,3)]`
This logic is in `finishTheory` in `geolog-lang/src/Geolog/Lower.hs`.
### 4. Dependent structure becomes extra columns
Lowering is not just flattening syntax. It also turns shared context into explicit columns.
2026-03-30 10:45:39 +02:00
`liftOnShared` copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like
`D.mul` contain both `M.car` arguments and displayed-car columns.
See `liftOnShared` in `geolog-lang/src/Geolog/Lower.hs`.
### 5. References generate join information
When a local path is used as:
- an attribute, lowering creates a `PiColumn`
- a value, lowering creates a `JoinColumn`
In both cases it also accumulates foreign-key-style atoms in `?fkeys`. Those atoms are later assembled into explicit laws.
See `lowerAttrNeutral` and `lowerArgNeutral` in `geolog-lang/src/Geolog/Lower.hs`.
## Why there are `foreignKeys` and `total` laws
Each lowered table may carry generated laws.
### `foreignKeys`
`makeFkeys` says: if a row exists in table `p`, then the rows it depends on must also exist.
This is how dependent references become relational constraints. A row in `D.mul` implies the existence of corresponding rows in `D.car` and `M.mul`.
See `makeFkeys` in `geolog-lang/src/Geolog/Lower.hs`.
### `total`
`makeTotal` is only generated for function tables. It says: if the prerequisite input rows exist, then a row in the function table must exist.
So `M.mul.total` expresses totality of multiplication, while `D.mul.total` expresses totality relative to the displayed structure.
See `makeTotal` in `geolog-lang/src/Geolog/Lower.hs`.
## Reading the printed IR
The pretty-printer renders `FlatTheory` as a sequence of `table` and `law` declarations. For example, the golden output for `magma` contains:
```text
table M.mul = [M.car, M.car, M.car, primary key(0,1)]
law M.mul.total = forall (a : M.car) (b : M.car). T |- M.mul(a, b)
```
and:
```text
table D.mul = [M.car, M.car, D.car, D.car, D.car, primary key(0,1,2,3)]
```
This means:
- the first columns are inputs or lifted context
- the last column is the produced value/row
- the primary key marks the determining inputs
- `foreignKeys` laws enforce referenced rows
- `total` laws enforce existence of output rows for function tables
Concrete examples live in:
- `geolog-lang/test/magma.output`
- `geolog-lang/test/paths.output`
## Practical takeaway
The implemented IR is best understood as a relationalized, flattened theory:
- names become paths
- dependencies become explicit columns
- references become foreign-key-style laws
- total functions become tables plus totality laws
2026-03-30 10:45:39 +02:00
So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped form that later evaluation code can
consume.
2026-03-30 10:41:21 +02:00
## Status
The IR looks good as a first compiler pass, but not yet as a complete runtime model.
What already looks solid:
- It matches the direction described in the other notes: theories become tables plus laws.
- It already lowers nontrivial dependent structure into a relational form.
- It makes important constraints explicit, especially `foreignKeys` and `total`.
- It gives Geolog a plausible backend-neutral compiled form between elaboration and evaluation.
What still looks missing:
- explicit runtime state for the chase
- fresh witness generation for existential conclusions
- branch or world tracking for disjunction
- equality-class or term-merging machinery
- provenance and scheduling information
So the current IR is best seen as a lowered theory IR, not yet a full chase-runtime IR.
2026-03-30 10:45:39 +02:00
If the project keeps this design, the natural next step is probably a second layer above `FlatTheory` for runtime execution state. That layer would
hold current facts, branch identity, fresh-name allocation, equality information, and evaluation metadata.
2026-03-30 10:41:21 +02:00
## Note
2026-03-30 10:45:39 +02:00
`docs/lecture3.tex` has a `Query IR` section header but no content yet, so the code above is currently the most concrete description of the actual
intermediate representation.
## Changelog
* **Mar 30, 2026** -- First version created.