useful-notes/hassan/011-geolog-intermediate-representation.md
2026-03-30 10:10:06 +02:00

5.8 KiB

Geolog Intermediate Representation

This note summarizes the intermediate representation currently implemented in the Geolog repo on branch ir-draft1: https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads

All file paths below are repo-relative, not local workspace paths. The main sources are geolog-lang/src/Geolog/IR.hs and geolog-lang/src/Geolog/Lower.hs.

Where the IR sits

The current pipeline is:

  1. Source text is lexed and parsed.
  2. The elaborator builds a typed core representation.
  3. lowerTop lowers that elaborated theory into I.FlatTheory.
  4. Tests pretty-print the lowered form as table ... and law ... declarations.

The lowering entry point is geolog-lang/src/Geolog/Lower.hs, via lowerTop, which calls lower, then lowerEntry, then finishTheory.

Core IR data types

The IR is intentionally relational.

  • Path = [QName]: a hierarchical name like ["M","mul"], pretty-printed as M.mul.
  • ColType is the type of a column. EntityType Path means the column refers to rows from another entity/table, PrimType PrimInt | PrimString covers builtin scalars, and Tuple [(QName, ColType)] supports structured values.
  • Table contains columns :: [ColType] and primaryKey :: Maybe [Int].
  • Term supports literals, variables, projections like x.field, and record construction.
  • Atom contains table :: Path, rowId :: Maybe Term, and values :: Map Int Term.
  • Prop and Law provide a small logic language for atoms, equality, conjunction, disjunction, and universally quantified Horn-style laws.
  • FlatTheory is the final lowered package with tables :: Map Path Table and laws :: Map Path Law.

See geolog-lang/src/Geolog/IR.hs for the concrete definitions.

What lowering does

Lowering walks an elaborated theory and converts type-theoretic structure into tables plus integrity laws.

1. Records become path prefixes

Nested theory fields extend the current path root. If a record field mul is inside M, the lowered path becomes M.mul.

This happens in lowerTheoryFields in geolog-lang/src/Geolog/Lower.hs.

2. Relations become RelTable

When lowering sees C.VU C.QueryU, it creates a relational table with the currently accumulated PiColumns and no primary key:

  • kind = RelTable
  • columns = piColumnsOf ?columns
  • total = Nothing

This is the representation for proposition-like relations.

See lowerTheoryTy in geolog-lang/src/Geolog/Lower.hs.

3. Functions/entities become FunTable

When lowering encounters an abstract decoded type that does not reduce to a more concrete theory shape, it emits a FunTable.

At the end of lowering, function tables get a primary key consisting of all columns except the final result column. That matches the examples:

  • M.mul = [M.car, M.car, M.car, primary key(0,1)]
  • D.mul = [..., primary key(0,1,2,3)]

This logic is in finishTheory in geolog-lang/src/Geolog/Lower.hs.

4. Dependent structure becomes extra columns

The interesting part is that lowering is not just flattening syntax. It also lambda-lifts shared context into explicit columns.

liftOnShared copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like D.mul contain both M.car arguments and displayed-car columns.

See liftOnShared in geolog-lang/src/Geolog/Lower.hs.

5. References generate join information

When a local path is used as:

  • an attribute, lowering creates a PiColumn
  • a value, lowering creates a JoinColumn

In both cases it also accumulates foreign-key-style atoms in ?fkeys. Those atoms are later assembled into explicit laws.

See lowerAttrNeutral and lowerArgNeutral in geolog-lang/src/Geolog/Lower.hs.

Why there are foreignKeys and total laws

Each lowered table may carry generated laws.

foreignKeys

makeFkeys says: if a row exists in table p, then the rows it depends on must also exist.

Operationally, this is how dependent references are reified as relational constraints. A row in D.mul implies the existence of corresponding rows in D.car and M.mul.

See makeFkeys in geolog-lang/src/Geolog/Lower.hs.

total

makeTotal is only generated for function tables. It says: if the prerequisite input rows exist, then a row in the function table must exist.

So M.mul.total expresses totality of multiplication, while D.mul.total expresses totality relative to the displayed structure.

See makeTotal in geolog-lang/src/Geolog/Lower.hs.

Reading the printed IR

The pretty-printer renders FlatTheory as a sequence of table and law declarations. For example, the golden output for magma contains:

table M.mul = [M.car, M.car, M.car, primary key(0,1)]
law M.mul.total = forall (a : M.car) (b : M.car). T |- M.mul(a, b)

and:

table D.mul = [M.car, M.car, D.car, D.car, D.car, primary key(0,1,2,3)]

This means:

  • the first columns are inputs or lifted context
  • the last column is the produced value/row
  • the primary key marks the determining inputs
  • foreignKeys laws enforce referenced rows
  • total laws enforce existence of output rows for function tables

Concrete examples live in:

  • geolog-lang/test/magma.output
  • geolog-lang/test/paths.output

Practical takeaway

The implemented IR is best understood as a relationalized, flattened theory:

  • names become paths
  • dependencies become explicit columns
  • references become foreign-key-style laws
  • total functions become tables plus totality laws

So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped semantics that downstream query/evaluation machinery can consume.

Note

docs/lecture3.tex has a Query IR section header but no content yet, so the code above is currently the most concrete description of the actual intermediate representation.