An overview of the building module's role and abilities.
- §1. Prerequisites
- §2. Introduction
- §3. Large-scale architecture
- §5. Medium-scale blueprints
- §7. Small-scale masonry
§1. Prerequisites. The building module is a part of the Inform compiler toolset. It is presented as a literate program or "web". Before diving in:
- (a) It helps to have some experience of reading webs: see inweb for more.
- (b) The module is written in C, in fact ANSI C99, but this is disguised by the fact that it uses some extension syntaxes provided by the inweb literate programming tool, making it a dialect of C called InC. See inweb for full details, but essentially: it's C without predeclarations or header files, and where functions have names like Tags::add_by_name rather than just add_by_name.
- (c) This module uses other modules drawn from the compiler (see structure), and also uses a module of utility functions called foundation. For more, see A Brief Guide to Foundation (in foundation).
§2. Introduction. This module is essentially middleware. It acts as a bridge to the low-level functions in the bytecode module, allowing them to be used with much greater ease and consistency.
This module needs plenty of working data, and stashes that data inside the inter_tree structure it is working on: in a component of that structure called a building_site. Whereas the main data in an inter_tree affects the meaning of the tree, i.e., makes a difference as to what program the tree represents, the contents of the building_site component are only used to make it, and are ignored by the final code-generator.
§3. Large-scale architecture. An inter tree is fundamentally a set of resources stored in a nested set of inter_package boxes.
- ● The following resources are stored at the root level (i.e., not inside of
any package) and nowhere else:
- ● Package type declarations. See LargeScale::package_type.
- ● Primitive declarations. See Inter Primitives. Again, Inter can in principle support a variety of different "instruction sets", but this module presents a single standardised instruction set.
- ● Compiler pragmas. These are marginal tweaks on a platform-by-platform basis and use of them is minimal, but see LargeScale::emit_pragma.
- ● Everything else is inside a single top-level package called main, which has package type _plain.
- ● main contains only packages, and of only two types:
- ● "Modules", which are packages of type _module. These occur nowhere else in the tree.
- ● "Linkages", which are packages of type _linkage. These occur nowhere else in the tree.
- ● inform7 compiles the material in each compilation unit to a module
named for that unit. That is:
- ● The module source_text contains material from the main source text.
- ● Each extension included produces a module, named, for example, locksmith_by_emily_short.
- ● Each kit produces a module, named after it. Any Inter tree produced by inform7 will always contain the module BasicInformKit, for example.
- ● inform7 generates an additional module called generic, holding generic definitions — material which is the same regardless of what is being compiled.
- ● inform7 generates an additional module called completion, holding resources put together from across different compilation units.1
- ● inter generates an additional module called synoptic, made during linking, which contains resources collated from or cross-referencing everything else.
- ● Modules contain only further packages, called "submodules" and with the package type _submodule. The Inform tools use a standard set of names for such submodules: for example, in any module the resources defining its global variables are in a submodule called variables. (If it defines no variables, the submodule will not be present.)
- ● There are just two different linkages — packages with special contents
and which the linking steps of pipeline treat differently from modules.
- ● architecture has no subpackages, and contains only constant definitions, drawn from a fixed and limited set. These definitions depend on, and indeed express, the target architecture: for example, WORDSIZE, the number of bytes per word, is defined here. Symbols here behave uniquely in linking: when two trees are linked together, they will each have an architecture package, and symbols in them will simply be identified with each other. Thus the WORDSIZE defined in the main Inform 7 tree will be considered the same symbol as the WORDSIZE defined in the tree for BasicInformKit.
- ● connectors has no subpackages and no resources other than symbols. It holds plugs and sockets enabling the Inter tree to be linked with other Inter trees; during linking, these are removed when their purposes has been served, so that after a successful link, connectors will always be empty.
See Large-Scale Structure for the code which builds all of the above packages (though not their contents).
1 Ideally completion would not exist, and everything in it would be made as part of synoptic during linking, but at present this is too difficult. ↩
§4. Inter code is a nested tree of boxes, inter_packages, which contain Inter code defining various resources, cross-referenced by inter_symbols.
But this tree cannot be magically made all at once. For much of the run of a tool like inform7, a partly-built tree will exist, and this introduces many potential race conditions — where, for example, a call to function F cannot be made until F itself has been made, and so on.
We also want to avoid bugs where one part of the compiler thinks that F will live in one place, and another part thinks it is somewhere else.
To that end, we use a flexible way to describe naming and positioning conventions for Inter resources (such as our hypothetical F). In this system, a package_request stands for a package which may or may not already exist; and an inter_name, similarly, is a symbol which may or may not exist yet. This enables tools like inform7 to build up elaborate if shadowy worlds of references to tree positions which will be filled in later.
DEFINITELY MADE PERHAPS NOT YET MADE PACKAGE inter_package //package_request// SYMBOL inter_symbol //inter_name//
So, for example, a package_request can represent /main/synoptic/kinds either before or after that package has been built. At some point the package ceases to be virtual and comes into being: this is called "incarnation". But code in inform7 using package requests never needs to know when this takes place, and will function equally well before or after — so, no race conditions.
And similarly for inter_name, which it would perhaps be more consistent to call a symbol_request. But "iname" is now a term used almost ubiquitously across inform7 and inter, and it doesn't seem worth renaming it now.
§5. Medium-scale blueprints. The above systems make nested packages and symbols within them, but not the actual content of these boxes, or the definitions which the symbols refer to. In short, the actual Inter code.
The straightforward way to compile some Inter code is to make calls to functions in Producing Inter, which provide a straightforward if low-level API. For example:
inter_name *iname = HierarchyLocations::iname(I, CCOUNT_PROPERTY_HL); Produce::numeric_constant(I, iname, K_value, x);
Note that we do not need to say where this code will go. Producing Inter looks at the iname, works out what package request it should go into, incarnates that into a real inter_package if necessary, then incarnates the iname into a real inter_symbol if necessary; and finally emits a CONSTANT_IST in the relevant package, an instruction which defines the symbol.
And similarly for emitting code inside a function body, though then it is necessary first to say what function (which can be done by calling Produce::function_body with the iname for that function). For example:
Produce::inv_primitive(I, RETURN_BIP); Produce::down(I); Produce::val(I, K_value, InterValuePairs::number(1)); Produce::up(I);
§6. But that is a laborious sort of notation for what, in a C-like language, would be written just as return 1. It would be very painful to have to implement kits such as BasicInformKit that way. Instead, we write them in a notation which is very close indeed2 to Inform 6 syntax.3
This means we need to provide what amounts to a pocket Inform-6-to-Inter compiler, and we do that in this module, using a data structure called an inter_schema — in effect, an annotated syntax tree — to represent the results of parsing Inform 6 notation. For example, this:
inter_schema *sch = ParsingSchemas::from_text(I"return true;", where); EmitInterSchemas::emit(I, ..., sch, ...);
generates Inter code equivalent to the example above.4 But the real power of the system comes from:
- (a) The ability to handle much larger passages of I6 notation - for example, a function body 10K long — in an acceptably speed-efficient way; and
- (b) The ability to subsctitute values in for placeholders.
As an example of (b), an inter_schema is how inform7 compiles so-called inline phrase definitions such as:
To say (L - a list of values) in brace notation: (- LIST_OF_TY_Say({-by-reference:L}, 1); -).
Here, the text LIST_OF_TY_Say({-by-reference:L}, 1); is passed through to ParsingSchemas::from_text to make a schema. When the phrase is invoked, EmitInterSchemas::emit is used to generate Inter code from it; and a reference to the list passed to the invocation as the token L is substituted for the braced clause {-by-reference:L}.5 Schemas are also used as convenient shorthand in the compiler to express how to, for example, post-increment a property value.
2 Some antique syntaxes, such as for loops broken with semicolons not colons, are missing; so are some hardly-used directives; and the superclass :: operator; and built-in compiler symbols relevant only to particular virtual machines, such as #g$self, are not there. But really, you will never notice they are gone. ↩
3 Using Inform 6 notation was very convenient in the years 2004-17, when Inform generated only I6 code: it became more problematic in 2018, when Inter instructions were needed instead, and much of this module was written as a response. ↩
4 Skipping over some of the arguments to the emission function, which basically tell us how to resolve identifier names into variables, arrays, and so on. ↩
5 These braced placeholders are, of course, not Inform 6 notation, and represent an extension of the I6 syntax. ↩
§7. Small-scale masonry. Finally, there are also times when we want to compile explicit code, one Inter instruction at a time, and for this the Produce API is provided.
This API keeps track of the current write position inside each tree (using the code_insertion_point system), and then provides functions which call down into bytecode for us, making use of that write position. So, for example, we can write:
Produce::inv_primitive(I, RETURN_BIP); Produce::down(I); Produce::val(I, K_value, InterValuePairs::number(17)); Produce::up(I);
to produce the Inter code:
inv !return val K_unchecked 17
Note the use of Produce::down and Produce::up to step up and down the hierarchy: these functions are always called in matching ways.
§8. The pipeline module makes heavy use of the Produce API. Surprising, inform7 calls it in only a few places — but in fact that is because it provides still another middleware layer on top. See Emit (in runtime). But it's really only a very thin layer, allowing the caller not to have to pass the I argument to every call (because it will always be the Inter tree being compiled by inform7). Despite appearances, then, Produce makes all of the Inter instructions generated inside either inter or inform7.