From bbe9653a780fc9d5fca9cb1402ea95f5c7f6f40c Mon Sep 17 00:00:00 2001 From: Hassan Abedi Date: Mon, 30 Mar 2026 09:51:57 +0200 Subject: [PATCH] Add a note on current version of Geolog's IR --- .gitignore | 3 +- .../011-geolog-intermediate-representation.md | 146 ++++++++++++++++++ 2 files changed, 148 insertions(+), 1 deletion(-) create mode 100644 hassan/011-geolog-intermediate-representation.md diff --git a/.gitignore b/.gitignore index 665774d..0471c6b 100644 --- a/.gitignore +++ b/.gitignore @@ -41,4 +41,5 @@ target ## Misc *.log tmp/ -temp/ \ No newline at end of file +temp/ +.claude/ diff --git a/hassan/011-geolog-intermediate-representation.md b/hassan/011-geolog-intermediate-representation.md new file mode 100644 index 0000000..3e25f8e --- /dev/null +++ b/hassan/011-geolog-intermediate-representation.md @@ -0,0 +1,146 @@ +# Geolog Intermediate Representation + +This note summarizes the intermediate representation currently implemented in the Geolog repo on branch `ir-draft1`: +https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads + +All file paths below are repo-relative, not local workspace paths. The main sources are `geolog-lang/src/Geolog/IR.hs` and `geolog-lang/src/Geolog/Lower.hs`. + +## Where the IR sits + +The current pipeline is: + +1. Source text is lexed and parsed. +2. The elaborator builds a typed core representation. +3. `lowerTop` lowers that elaborated theory into `I.FlatTheory`. +4. Tests pretty-print the lowered form as `table ...` and `law ...` declarations. + +The lowering entry point is `geolog-lang/src/Geolog/Lower.hs`, via `lowerTop`, which calls `lower`, then `lowerEntry`, then `finishTheory`. + +## Core IR data types + +The IR is intentionally relational. + +- `Path = [QName]`: a hierarchical name like `["M","mul"]`, pretty-printed as `M.mul`. +- `ColType` is the type of a column. `EntityType Path` means the column refers to rows from another entity/table, `PrimType PrimInt | PrimString` covers builtin scalars, and `Tuple [(QName, ColType)]` supports structured values. +- `Table` contains `columns :: [ColType]` and `primaryKey :: Maybe [Int]`. +- `Term` supports literals, variables, projections like `x.field`, and record construction. +- `Atom` contains `table :: Path`, `rowId :: Maybe Term`, and `values :: Map Int Term`. +- `Prop` and `Law` provide a small logic language for atoms, equality, conjunction, disjunction, and universally quantified Horn-style laws. +- `FlatTheory` is the final lowered package with `tables :: Map Path Table` and `laws :: Map Path Law`. + +See `geolog-lang/src/Geolog/IR.hs` for the concrete definitions. + +## What lowering does + +Lowering walks an elaborated theory and converts type-theoretic structure into tables plus integrity laws. + +### 1. Records become path prefixes + +Nested theory fields extend the current path root. If a record field `mul` is inside `M`, the lowered path becomes `M.mul`. + +This happens in `lowerTheoryFields` in `geolog-lang/src/Geolog/Lower.hs`. + +### 2. Relations become `RelTable` + +When lowering sees `C.VU C.QueryU`, it creates a relational table with the currently accumulated `PiColumn`s and no primary key: + +- `kind = RelTable` +- `columns = piColumnsOf ?columns` +- `total = Nothing` + +This is the representation for proposition-like relations. + +See `lowerTheoryTy` in `geolog-lang/src/Geolog/Lower.hs`. + +### 3. Functions/entities become `FunTable` + +When lowering encounters an abstract decoded type that does not reduce to a more concrete theory shape, it emits a `FunTable`. + +At the end of lowering, function tables get a primary key consisting of all columns except the final result column. That matches the examples: + +- `M.mul = [M.car, M.car, M.car, primary key(0,1)]` +- `D.mul = [..., primary key(0,1,2,3)]` + +This logic is in `finishTheory` in `geolog-lang/src/Geolog/Lower.hs`. + +### 4. Dependent structure becomes extra columns + +The interesting part is that lowering is not just flattening syntax. It also lambda-lifts shared context into explicit columns. + +`liftOnShared` copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like `D.mul` contain both `M.car` arguments and displayed-car columns. + +See `liftOnShared` in `geolog-lang/src/Geolog/Lower.hs`. + +### 5. References generate join information + +When a local path is used as: + +- an attribute, lowering creates a `PiColumn` +- a value, lowering creates a `JoinColumn` + +In both cases it also accumulates foreign-key-style atoms in `?fkeys`. Those atoms are later assembled into explicit laws. + +See `lowerAttrNeutral` and `lowerArgNeutral` in `geolog-lang/src/Geolog/Lower.hs`. + +## Why there are `foreignKeys` and `total` laws + +Each lowered table may carry generated laws. + +### `foreignKeys` + +`makeFkeys` says: if a row exists in table `p`, then the rows it depends on must also exist. + +Operationally, this is how dependent references are reified as relational constraints. A row in `D.mul` implies the existence of corresponding rows in `D.car` and `M.mul`. + +See `makeFkeys` in `geolog-lang/src/Geolog/Lower.hs`. + +### `total` + +`makeTotal` is only generated for function tables. It says: if the prerequisite input rows exist, then a row in the function table must exist. + +So `M.mul.total` expresses totality of multiplication, while `D.mul.total` expresses totality relative to the displayed structure. + +See `makeTotal` in `geolog-lang/src/Geolog/Lower.hs`. + +## Reading the printed IR + +The pretty-printer renders `FlatTheory` as a sequence of `table` and `law` declarations. For example, the golden output for `magma` contains: + +```text +table M.mul = [M.car, M.car, M.car, primary key(0,1)] +law M.mul.total = forall (a : M.car) (b : M.car). T |- M.mul(a, b) +``` + +and: + +```text +table D.mul = [M.car, M.car, D.car, D.car, D.car, primary key(0,1,2,3)] +``` + +This means: + +- the first columns are inputs or lifted context +- the last column is the produced value/row +- the primary key marks the determining inputs +- `foreignKeys` laws enforce referenced rows +- `total` laws enforce existence of output rows for function tables + +Concrete examples live in: + +- `geolog-lang/test/magma.output` +- `geolog-lang/test/paths.output` + +## Practical takeaway + +The implemented IR is best understood as a relationalized, flattened theory: + +- names become paths +- dependencies become explicit columns +- references become foreign-key-style laws +- total functions become tables plus totality laws + +So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped semantics that downstream query/evaluation machinery can consume. + +## Note + +`docs/lecture3.tex` has a `Query IR` section header but no content yet, so the code above is currently the most concrete description of the actual intermediate representation.