useful-notes/hassan/011-geolog-intermediate-representation.md
2026-03-30 10:41:21 +02:00

7.9 KiB

Geolog Intermediate Representation

This note summarizes the intermediate representation currently implemented in the Geolog repo on branch ir-draft1: https://git.sgai.uk/creators/geolog/-/tree/ir-draft1?ref_type=heads

All file paths below are repo-relative, not local workspace paths. The main sources are geolog-lang/src/Geolog/IR.hs and geolog-lang/src/Geolog/Lower.hs.

Where the IR sits

The current pipeline is:

  1. Source text is lexed and parsed.
  2. The elaborator builds a typed core representation.
  3. lowerTop lowers that elaborated theory into I.FlatTheory.
  4. Tests pretty-print the lowered form as table ... and law ... declarations.

The lowering entry point is geolog-lang/src/Geolog/Lower.hs, via lowerTop, which calls lower, then lowerEntry, then finishTheory.

Glossary

  • IR: Intermediate representation. A simpler form used between elaboration and later evaluation.
  • lowering: The step that converts elaborated Geolog code into the IR.
  • Path: A qualified name like M.mul.
  • ColType: The type of one table column.
  • Table: A lowered relational object with columns and maybe a primary key.
  • RelTable: A table used for a relation or predicate.
  • FunTable: A table used for something function-like, where input columns determine an output.
  • Term: A small IR expression, such as a variable, literal, or projection.
  • Atom: One table occurrence with concrete arguments.
  • Prop: A logical formula built from atoms and connectives like equality and conjunction.
  • Law: A quantified rule over IR propositions.
  • FlatTheory: The final lowered result: a set of tables plus laws.
  • foreignKeys: Generated laws saying a row depends on other rows existing.
  • total: Generated laws saying a function row must exist when its inputs exist.
  • PiColumn: A column introduced as part of a table's explicit shape.
  • JoinColumn: A column introduced while following a reference or join.

Core IR data types

The IR is intentionally relational. It turns theory structure into tables and rules.

  • Path = [QName]: a hierarchical name like ["M","mul"], pretty-printed as M.mul.
  • ColType is the type of a column. EntityType Path means the column refers to rows from another entity/table, PrimType PrimInt | PrimString covers builtin scalars, and Tuple [(QName, ColType)] supports structured values.
  • Table contains columns :: [ColType] and primaryKey :: Maybe [Int].
  • Term supports literals, variables, projections like x.field, and record construction.
  • Atom contains table :: Path, rowId :: Maybe Term, and values :: Map Int Term.
  • Prop and Law provide a small logic language for atoms, equality, conjunction, disjunction, and universally quantified Horn-style laws.
  • FlatTheory is the final lowered package with tables :: Map Path Table and laws :: Map Path Law.

See geolog-lang/src/Geolog/IR.hs for the concrete definitions.

What lowering does

Lowering walks the elaborated theory and turns it into tables plus rules.

1. Records become path prefixes

Nested theory fields extend the current path root. If a record field mul is inside M, the lowered path becomes M.mul.

This happens in lowerTheoryFields in geolog-lang/src/Geolog/Lower.hs.

2. Relations become RelTable

When lowering sees C.VU C.QueryU, it creates a relational table with the currently accumulated PiColumns and no primary key:

  • kind = RelTable
  • columns = piColumnsOf ?columns
  • total = Nothing

This is the representation for proposition-like relations.

See lowerTheoryTy in geolog-lang/src/Geolog/Lower.hs.

3. Functions/entities become FunTable

When lowering finds something function-like that does not reduce to a simpler theory shape, it emits a FunTable.

At the end of lowering, function tables get a primary key consisting of all columns except the final result column. That matches the examples:

  • M.mul = [M.car, M.car, M.car, primary key(0,1)]
  • D.mul = [..., primary key(0,1,2,3)]

This logic is in finishTheory in geolog-lang/src/Geolog/Lower.hs.

4. Dependent structure becomes extra columns

Lowering is not just flattening syntax. It also turns shared context into explicit columns.

liftOnShared copies shared outer variables into the current row shape so that dependencies become relationally explicit. This is why tables like D.mul contain both M.car arguments and displayed-car columns.

See liftOnShared in geolog-lang/src/Geolog/Lower.hs.

5. References generate join information

When a local path is used as:

  • an attribute, lowering creates a PiColumn
  • a value, lowering creates a JoinColumn

In both cases it also accumulates foreign-key-style atoms in ?fkeys. Those atoms are later assembled into explicit laws.

See lowerAttrNeutral and lowerArgNeutral in geolog-lang/src/Geolog/Lower.hs.

Why there are foreignKeys and total laws

Each lowered table may carry generated laws.

foreignKeys

makeFkeys says: if a row exists in table p, then the rows it depends on must also exist.

This is how dependent references become relational constraints. A row in D.mul implies the existence of corresponding rows in D.car and M.mul.

See makeFkeys in geolog-lang/src/Geolog/Lower.hs.

total

makeTotal is only generated for function tables. It says: if the prerequisite input rows exist, then a row in the function table must exist.

So M.mul.total expresses totality of multiplication, while D.mul.total expresses totality relative to the displayed structure.

See makeTotal in geolog-lang/src/Geolog/Lower.hs.

Reading the printed IR

The pretty-printer renders FlatTheory as a sequence of table and law declarations. For example, the golden output for magma contains:

table M.mul = [M.car, M.car, M.car, primary key(0,1)]
law M.mul.total = forall (a : M.car) (b : M.car). T |- M.mul(a, b)

and:

table D.mul = [M.car, M.car, D.car, D.car, D.car, primary key(0,1,2,3)]

This means:

  • the first columns are inputs or lifted context
  • the last column is the produced value/row
  • the primary key marks the determining inputs
  • foreignKeys laws enforce referenced rows
  • total laws enforce existence of output rows for function tables

Concrete examples live in:

  • geolog-lang/test/magma.output
  • geolog-lang/test/paths.output

Practical takeaway

The implemented IR is best understood as a relationalized, flattened theory:

  • names become paths
  • dependencies become explicit columns
  • references become foreign-key-style laws
  • total functions become tables plus totality laws

So the IR is not just an AST for queries. It is the bridge from elaborated dependent syntax to a database-shaped form that later evaluation code can consume.

Status

The IR looks good as a first compiler pass, but not yet as a complete runtime model.

What already looks solid:

  • It matches the direction described in the other notes: theories become tables plus laws.
  • It already lowers nontrivial dependent structure into a relational form.
  • It makes important constraints explicit, especially foreignKeys and total.
  • It gives Geolog a plausible backend-neutral compiled form between elaboration and evaluation.

What still looks missing:

  • explicit runtime state for the chase
  • fresh witness generation for existential conclusions
  • branch or world tracking for disjunction
  • equality-class or term-merging machinery
  • provenance and scheduling information

So the current IR is best seen as a lowered theory IR, not yet a full chase-runtime IR.

If the project keeps this design, the natural next step is probably a second layer above FlatTheory for runtime execution state. That layer would hold current facts, branch identity, fresh-name allocation, equality information, and evaluation metadata.

Note

docs/lecture3.tex has a Query IR section header but no content yet, so the code above is currently the most concrete description of the actual intermediate representation.