Add a note on current version of Geolog's IR

This commit is contained in:
Hassan Abedi 2026-03-30 09:51:57 +02:00
parent a8fc42943d
commit bbe9653a78
2 changed files with 148 additions and 1 deletions

3
.gitignore vendored
View File

@ -41,4 +41,5 @@ target
## Misc
*.log
tmp/
temp/
temp/
.claude/

View File

@ -0,0 +1,146 @@
# Geolog Intermediate Representation
This note summarizes the intermediate representation currently implemented in the Geolog repo on branch `ir-draft1`:
https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads
All file paths below are repo-relative, not local workspace paths. The main sources are `geolog-lang/src/Geolog/IR.hs` and `geolog-lang/src/Geolog/Lower.hs`.
## Where the IR sits
The current pipeline is:
1. Source text is lexed and parsed.
2. The elaborator builds a typed core representation.
3. `lowerTop` lowers that elaborated theory into `I.FlatTheory`.
4. Tests pretty-print the lowered form as `table ...` and `law ...` declarations.
The lowering entry point is `geolog-lang/src/Geolog/Lower.hs`, via `lowerTop`, which calls `lower`, then `lowerEntry`, then `finishTheory`.
## Core IR data types
The IR is intentionally relational.
- `Path = [QName]`: a hierarchical name like `["M","mul"]`, pretty-printed as `M.mul`.
- `ColType` is the type of a column. `EntityType Path` means the column refers to rows from another entity/table, `PrimType PrimInt | PrimString` covers builtin scalars, and `Tuple [(QName, ColType)]` supports structured values.
- `Table` contains `columns :: [ColType]` and `primaryKey :: Maybe [Int]`.
- `Term` supports literals, variables, projections like `x.field`, and record construction.
- `Atom` contains `table :: Path`, `rowId :: Maybe Term`, and `values :: Map Int Term`.
- `Prop` and `Law` provide a small logic language for atoms, equality, conjunction, disjunction, and universally quantified Horn-style laws.
- `FlatTheory` is the final lowered package with `tables :: Map Path Table` and `laws :: Map Path Law`.
See `geolog-lang/src/Geolog/IR.hs` for the concrete definitions.
## What lowering does
Lowering walks an elaborated theory and converts type-theoretic structure into tables plus integrity laws.
### 1. Records become path prefixes
Nested theory fields extend the current path root. If a record field `mul` is inside `M`, the lowered path becomes `M.mul`.
This happens in `lowerTheoryFields` in `geolog-lang/src/Geolog/Lower.hs`.
### 2. Relations become `RelTable`
When lowering sees `C.VU C.QueryU`, it creates a relational table with the currently accumulated `PiColumn`s and no primary key:
- `kind = RelTable`
- `columns = piColumnsOf ?columns`
- `total = Nothing`
This is the representation for proposition-like relations.
See `lowerTheoryTy` in `geolog-lang/src/Geolog/Lower.hs`.
### 3. Functions/entities become `FunTable`
When lowering encounters an abstract decoded type that does not reduce to a more concrete theory shape, it emits a `FunTable`.
At the end of lowering, function tables get a primary key consisting of all columns except the final result column. That matches the examples:
- `M.mul = [M.car, M.car, M.car, primary key(0,1)]`
- `D.mul = [..., primary key(0,1,2,3)]`
This logic is in `finishTheory` in `geolog-lang/src/Geolog/Lower.hs`.
### 4. Dependent structure becomes extra columns
The interesting part is that lowering is not just flattening syntax. It also lambda-lifts shared context into explicit columns.
`liftOnShared` copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like `D.mul` contain both `M.car` arguments and displayed-car columns.
See `liftOnShared` in `geolog-lang/src/Geolog/Lower.hs`.
### 5. References generate join information
When a local path is used as:
- an attribute, lowering creates a `PiColumn`
- a value, lowering creates a `JoinColumn`
In both cases it also accumulates foreign-key-style atoms in `?fkeys`. Those atoms are later assembled into explicit laws.
See `lowerAttrNeutral` and `lowerArgNeutral` in `geolog-lang/src/Geolog/Lower.hs`.
## Why there are `foreignKeys` and `total` laws
Each lowered table may carry generated laws.
### `foreignKeys`
`makeFkeys` says: if a row exists in table `p`, then the rows it depends on must also exist.
Operationally, this is how dependent references are reified as relational constraints. A row in `D.mul` implies the existence of corresponding rows in `D.car` and `M.mul`.
See `makeFkeys` in `geolog-lang/src/Geolog/Lower.hs`.
### `total`
`makeTotal` is only generated for function tables. It says: if the prerequisite input rows exist, then a row in the function table must exist.
So `M.mul.total` expresses totality of multiplication, while `D.mul.total` expresses totality relative to the displayed structure.
See `makeTotal` in `geolog-lang/src/Geolog/Lower.hs`.
## Reading the printed IR
The pretty-printer renders `FlatTheory` as a sequence of `table` and `law` declarations. For example, the golden output for `magma` contains:
```text
table M.mul = [M.car, M.car, M.car, primary key(0,1)]
law M.mul.total = forall (a : M.car) (b : M.car). T |- M.mul(a, b)
```
and:
```text
table D.mul = [M.car, M.car, D.car, D.car, D.car, primary key(0,1,2,3)]
```
This means:
- the first columns are inputs or lifted context
- the last column is the produced value/row
- the primary key marks the determining inputs
- `foreignKeys` laws enforce referenced rows
- `total` laws enforce existence of output rows for function tables
Concrete examples live in:
- `geolog-lang/test/magma.output`
- `geolog-lang/test/paths.output`
## Practical takeaway
The implemented IR is best understood as a relationalized, flattened theory:
- names become paths
- dependencies become explicit columns
- references become foreign-key-style laws
- total functions become tables plus totality laws
So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped semantics that downstream query/evaluation machinery can consume.
## Note
`docs/lecture3.tex` has a `Query IR` section header but no content yet, so the code above is currently the most concrete description of the actual intermediate representation.