An overview of how Inweb works, with links to all of its important functions.


§1. Prerequisites. This page is to help readers to get their bearings in the source code for Inweb, which is a literate program or "web". Before diving in:

§2. Working out what to do, and what to do it to. Inweb is a C program, so it begins at main, in Program Control. PC works out where Inweb is installed, then calls Configuration, which reads the command line options.

The user's choices are stored in an inweb_instructions object, and Inweb is put into one of four modes: TANGLE_MODE, WEAVE_MODE, ANALYSE_MODE, or TRANSLATE_MODE. Inweb never changes mode: once set, it remains for the rest of the run. Inweb also acts on only one main web in any run, unless in TRANSLATE_MODE, in which case none.
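
The "set once, never changed" rule can be pictured with a small sketch. Everything here is hypothetical (the sk_ names, the struct layout, and the policy of flagging a conflict are assumptions made for illustration, not the real inweb_instructions code); only the switch spellings -weave and -tangle are borrowed from Inweb's command line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names throughout: this only mirrors the idea of a mode that
   is chosen once from the command line and then never changes. */
enum sk_mode { SK_NO_MODE, SK_TANGLE_MODE, SK_WEAVE_MODE, SK_ANALYSE_MODE, SK_TRANSLATE_MODE };

struct sk_instructions {
    enum sk_mode mode;   /* set at most once */
    int mode_conflict;   /* nonzero if two switches asked for different modes */
};

/* Record the mode implied by a switch. A later, different mode does not
   overwrite the first: it is flagged as a conflict instead (an assumption). */
void sk_set_mode(struct sk_instructions *ins, enum sk_mode m) {
    assert(ins != NULL);
    if (m == SK_NO_MODE) return;
    if (ins->mode == SK_NO_MODE) ins->mode = m;
    else if (ins->mode != m) ins->mode_conflict = 1;
}

/* Map a switch to a mode; unrecognised switches imply no mode. */
enum sk_mode sk_mode_for_switch(const char *sw) {
    assert(sw != NULL);
    if (strcmp(sw, "-tangle") == 0) return SK_TANGLE_MODE;
    if (strcmp(sw, "-weave") == 0) return SK_WEAVE_MODE;
    return SK_NO_MODE;
}
```

Calling sk_set_mode once per command-line switch leaves the first mode in place, which is the property the rest of the program relies on.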

Once it has worked through the command line, Configuration also calls Colonies::load to read the colony file, if one was given (see Making Weaves into Websites), and uses this to preset some settings: see Configuration::member_and_colony.

All errors in configuration are sent to Errors::fatal, from whose bourne no traveller returns.

§3. Program Control then resumes, calling Main::follow_instructions to act on the inweb_instructions object. If the user did specify a web to work on, PC then goes through three stages to understand it.

First, PC calls Reader::load_web to read the metadata of the web: that is, its title and author, how it breaks down into chapters and sections, and what modules it imports. The real work is done by the Foundation library function WebMetadata::get, which returns a web_md object, providing details such as its declared author and title (see Bibliographic Data for Webs (in foundation)), and also references to a chapter_md for each chapter, and a section_md for each section. There is always at least one chapter_md, each of which has at least one section_md. The "range text" for each chapter and section is set here, which affects leafnames used in woven websites. The optional build.txt file for a web is read by BuildFiles::read, and the semantic version number is determined at BuildFiles::deduce_semver.

Where a web imports a module, as for instance the eastertide example does, WebMetadata::get creates a module object for each import. In any event, it also creates a module called "(main)" to represent the main, non-imported, part of the overall program. Each module object also refers to the chapter_md and section_md objects.

The result of Reader::load_web is an object called a web, which expands on the metadata considerably. If W is a web, W->md produces its web_md metadata, but W also has numerous other fields.

§4. After loading, the second stage is to call Reader::read_web. Whereas loading was rapid and involved looking only at the contents page, reading takes longer and means extracting every line of commentary or code. Just as the loader wrapped the web_md in a larger web object, so too the reader wraps each chapter_md in a chapter, and each section_md in a section.

Inweb syntax is heavily line-based, and every line of every section file (except the Contents page) becomes a source_line. In the end, then, Inweb has built a four-level hierarchy on top of the more basic three-level hierarchy produced by foundation:

INWEB        web     ---->  chapter     ---->  section     ---->   source_line
              |                |                  |
FOUNDATION   web_md  ---->  chapter_md  ---->  section_md
             module
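
To make the two layers concrete, here is a minimal sketch in C. The struct and field names are simplified stand-ins invented for this illustration (only the md pointer echoes the W->md convention mentioned above); the real web, chapter and section objects carry many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the metadata layer (hypothetical fields). */
struct sk_section_md { const char *title; };
struct sk_chapter_md { const char *title; };
struct sk_web_md     { const char *title; };

/* The wrapping layer: each object points down to its metadata via md. */
struct sk_source_line { const char *text; struct sk_source_line *next; };
struct sk_section { struct sk_section_md *md; struct sk_source_line *first_line; struct sk_section *next; };
struct sk_chapter { struct sk_chapter_md *md; struct sk_section *first_section; struct sk_chapter *next; };
struct sk_web     { struct sk_web_md *md; struct sk_chapter *first_chapter; };

/* Walk web -> chapter -> section -> source_line and count the lines. */
int sk_count_lines(const struct sk_web *W) {
    assert(W != NULL);
    int n = 0;
    for (const struct sk_chapter *C = W->first_chapter; C; C = C->next)
        for (const struct sk_section *S = C->first_section; S; S = S->next)
            for (const struct sk_source_line *L = S->first_line; L; L = L->next)
                n++;
    return n;
}
```

The three nested loops are the shape most later stages take: anything that needs every line of the web descends through these four levels in order.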

§5. The third stage is to call Parser::parse_web. This is where we check that the web is syntactically valid line-by-line, reporting any errors by calling Main::error_in_web. Each line is assigned a "category": for example, the category DEFINITIONS_LCAT is given to lines holding definitions made with @d or @e. See Line Categories for the complete roster. Running Inweb with the -scan switch lists out the lines parsed in this way; for example:

Scan of source lines for '0'
0000001  SECTION_HEADING.....  Main.
0000002  COMMENT_BODY........  
0000003  PURPOSE.............  Implied Purpose: This example of using inweb is a whole web in a single short file, to look for twin primes, a classic problem in number theory.
0000004  COMMENT_BODY........  
0000005  HEADING_START.......  @h The conjecture.
0000006  COMMENT_BODY........  It is widely believed that there are an infinite number of twin primes, that
0000007  COMMENT_BODY........  is, prime numbers occurring in pairs different by 2. Twins are known to exist
0000008  COMMENT_BODY........  at least as far out as $10^{388,342}$ (as of 2016), and there are infinitely
0000009  COMMENT_BODY........  many pairs of primes closer together than about 250 (Zhang, 2013; Tao, Maynard,
0000010  COMMENT_BODY........  and many others, 2014).
0000011  COMMENT_BODY........  
0000012  COMMENT_BODY........  This program finds a few small pairs of twins, by the simplest method possible,
0000013  COMMENT_BODY........  and should print output like so:
0000014  BEGIN_CODE..........  = (text)
0000015  TEXT_EXTRACT........   3 and 5
0000016  TEXT_EXTRACT........   5 and 7
0000017  TEXT_EXTRACT........   11 and 13
0000018  TEXT_EXTRACT........   ...
0000019  END_EXTRACT.........  =
0000020  COMMENT_BODY........  
0000021  BEGIN_DEFINITION....  @d RANGE 100 /* the upper limit to the numbers we will consider */
0000022  COMMENT_BODY........  
0000023  BEGIN_CODE..........  =
0000024  C_LIBRARY_INCLUDE...  #include <stdio.h>
0000025  CODE_BODY...........  
0000026  CODE_BODY...........  int main(int argc, char *argv[]) {
0000027  CODE_BODY...........   for (int i=1; i<RANGE; i++)
0000028  CODE_BODY...........       @<Test for twin prime at i@>;
0000029  CODE_BODY...........  }
0000030  CODE_BODY...........  
0000031  PARAGRAPH_START.....  @
0000032  MACRO_DEFINITION....  @<Test for twin prime at i@> =
0000033  CODE_BODY...........   if ((isprime(i)) && (isprime(i+2)))
0000034  CODE_BODY...........       printf("%d and %d\n", i, i+2);
0000035  CODE_BODY...........  
0000036  HEADING_START.......  @h Primality.
0000037  COMMENT_BODY........  This simple and slow test tries to divide by every whole number at least
0000038  COMMENT_BODY........  2 and up to the square root: if none divide exactly, the number is prime.
0000039  COMMENT_BODY........  A common error with this algorithm is to check where $m^2 < n$, rather
0000040  COMMENT_BODY........  than $m^2 \leq n$, thus wrongly considering 4, 9, 25, 49, ... as prime:
0000041  COMMENT_BODY........  Cambridge folklore has it that this bug occurred on the first computation
0000042  COMMENT_BODY........  of the EDSAC computer on 6 May 1949.
0000043  COMMENT_BODY........  
0000044  BEGIN_DEFINITION....  @d TRUE 1
0000045  BEGIN_DEFINITION....  @d FALSE 0
0000046  COMMENT_BODY........  
0000047  BEGIN_CODE..........  =
0000048  CODE_BODY...........  int isprime(int n) {
0000049  CODE_BODY...........   if (n <= 1) return FALSE;
0000050  CODE_BODY...........   for (int m = 2; m*m <= n; m++)
0000051  CODE_BODY...........       if (n % m == 0)
0000052  CODE_BODY...........           return FALSE;
0000053  CODE_BODY...........   return TRUE;
0000054  CODE_BODY...........  }
0000055  CODE_BODY...........  

§6. The parser also recognises headings and footnotes, but most importantly, it introduces an additional concept: the paragraph. Each numbered passage corresponds to one paragraph object; it may actually contain several paragraphs of prose in the everyday English sense, but has just one heading, usually a number like "2.3.1". Those numbers are assigned hierarchically, which is not a trivial algorithm: see Numbering::number_web.

It is the parser which finds all of the "paragraph macros", the term used in the source code for named stretches of code in @<...@> notation. A para_macro object is created for each one, and every section has its own collection, stored in a linked_list. Similarly, the parser finds all of the footnote texts, and works out their proper numbering; each becomes a footnote object.

At the end of the third stage, then, everything's ready to go, and in memory we now have something like this:

INWEB        web     ---->  chapter     ---->  section     ---->  paragraph  ----> source_line
              |                |                  |               para_macro
FOUNDATION   web_md  ---->  chapter_md  ---->  section_md
             module

§7. Programming languages. The contents page of a web usually mentions one or more programming languages. A line at the top like

    Language: C

results in the text "C" being stored in the bibliographic datum "Language", and if contents lines for chapters or sections specify other languages, the loader stores those in the relevant chapter_md or section_md objects. But to the loader, these are all just names.

The reader then loads in definitions of these programming languages by calling Analyser::find_by_name, and the parser does the same when it finds extract lines like

    = (text as ACME)

to say that a passage of text must be syntax-coloured like the ACME language.

Analyser::find_by_name is thus called at any time when Inweb finds need of a language; it looks for a language definition file (see documentation at Supporting Programming Languages), parses it one line at a time using Languages::read_definition_line, and returns a programming_language object. These correspond to their names: you cannot have two different PL objects with languages both called "Python", say.
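
The guarantee that one name yields one object is essentially a find-or-create lookup. The sketch below shows that idea only: the fixed-size table, the sk_ names and the NULL-on-failure policy are assumptions of this example, and the step where a definition file would be read is left as a comment.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_LANGUAGES 16

struct sk_language { char name[32]; };

static struct sk_language sk_registry[SK_MAX_LANGUAGES];
static int sk_registry_size = 0;

/* Return the one object for this name, creating it on first request.
   Returns NULL if the name is too long or the table is full. */
struct sk_language *sk_find_by_name(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < sk_registry_size; i++)
        if (strcmp(sk_registry[i].name, name) == 0) return &sk_registry[i];
    if (sk_registry_size == SK_MAX_LANGUAGES) return NULL;
    if (strlen(name) >= sizeof sk_registry[0].name) return NULL;
    strcpy(sk_registry[sk_registry_size].name, name);
    /* A fuller version would locate and parse the definition file here. */
    return &sk_registry[sk_registry_size++];
}
```

Because repeated requests for "Python" return the same pointer, callers can compare languages by pointer equality rather than by name.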

The practical effect is that a web can involve many languages, even though the main use case is to have just one throughout. web, chapter, section and even individual source_line objects all contain pointers to a programming_language.

§8. Weaving mode. Let's get back to Program Control, which has now set everything up and is about to take action. What it does depends on which of the four modes Inweb is in; we'll start with WEAVE_MODE, the most difficult.

Weaves are highly configurable, so they depend on several factors:

§9. Program Control begins by attempting to load the weave pattern, with Patterns::find; the syntax of weave pattern files can be found in Patterns::scan_pattern_line.

It then either calls Swarm::weave_subset (meaning a subset of the web, going into a single output file) or Swarm::weave, which in turn splits the web into subsets and sends each of those to Swarm::weave_subset.

Swarm::weave also causes an "index" to be made, though "index" here is Inweb jargon for something which is more likely a contents page listing the sections and linking to them.

Either way, each single weaving operation arrives at Swarm::weave_subset, which consolidates all the settings needed into a weave_order object: it says, in effect, "weave content X into file Y using pattern Z".

§10. And so we descend into The Weaver, where the function Weaver::weave is given the weave_order and told to get on with it.

Rather than directly converting the source to (say) an HTML representation, the Weaver first produces a "weave tree" which amounts to a format-neutral list of rendering instructions: it then hands the tree over to Formats::render. In this way, all specifics of individual output formats are kept at arm's length from the actual weaving algorithm.

The weave tree is a simple business, built in a single pass of a depth-first traverse of the web. The weaver keeps track of a modicum of "state" as it works, and these running details are stored in a weaver_state object, but this is thrown away as soon as the weaver finishes.

The trickiest point of building the weave tree is done by The Weaver of Text, which breaks up lines of commentary or code to identify uses of mathematical notation, footnote cues, function calls, and so on.

A convenience for testing the weave algorithm is to weave with -weave-as TestingInweb. TestingInweb is a weave pattern that outputs a textual representation of the weave tree. For example:

document weave order 0
  head banner <Weave of 'The Twin Primes Conjecture' generated by Inweb>
  body
    chapter <Sections>
      chapter header <Sections>
      section <Main>
        section header <Main>
        section purpose <This example of using inweb is a whole web in a single short file, to look for twin primes, a classic problem in number theory.>
        toc - <S/all>
          toc line - <S1, The conjecture> P1'The conjecture'
          toc line - <S2, Primality> P2'Primality'
        paragraph P1'The conjecture'
          material discussion
            commentary <It is widely believed that there are an infinite number of twin primes, that\n>
            commentary <is, prime numbers occurring in pairs different by 2. Twins are known to exist\n>
            commentary <at least as far out as >
            mathematics <10^{388,342}>
            commentary < (as of 2016), and there are infinitely\n>
            commentary <many pairs of primes closer together than about 250 (Zhang, 2013; Tao, Maynard,\n>
            commentary <and many others, 2014).\n>
            vskip (in comment)
            commentary <This program finds a few small pairs of twins, by the simplest method possible,\n>
            commentary <and should print output like so:\n>
          material code: C
            code line
              source_code <    3 and 5>
                          _ppppppppppp_
            code line
              source_code <    5 and 7>
                          _ppppppppppp_
            code line
              source_code <    11 and 13>
                          _ppppppppppppp_
            code line
              source_code <    ...>
                          _ppppppp_
          material definition
            code line
              defn <define>
              source_code <RANGE 100 >
                          _nnnnnpnnnp_
              commentary < the upper limit to the numbers we will consider> (code)
          material code: C
            code line
              source_code <#include <stdio.h>>
                          _piiiiiiippiiiiipip_
            vskip
            code line
              source_code <int main(int argc, char *argv[]) {>
                          _rrrpffffprrrpiiiipprrrrppiiiippppp_
            code line
              source_code <    for (int i=1; i<RANGE; i++)>
                          _pppprrrpprrrpippppipnnnnnppippp_
            code line
              source_code <        >
                          _pppppppp_
              pmac <Test for twin prime at i>
              source_code <;>
                          _p_
            code line
              source_code <}>
                          _p_
        paragraph P1.1
          material paragraph macro
            code line
              pmac <Test for twin prime at i> (definition)
          material code: C
            code line
              source_code <    if ((>
                          _pppprrppp_
              function usage <isprime>
              source_code <(i)) && (>
                          _pippppppp_
              function usage <isprime>
              source_code <(i+2)))>
                          _pippppp_
            code line
              source_code <        printf("%d and %d\n", i, i+2);>
                          _ppppppppiiiiiipsssssssssssssppippipppp_
          material endnotes
            endnote
              commentary <This code is >
              commentary <used in >
              locale P1'The conjecture'
              commentary <.>
        paragraph P2'Primality'
          material discussion
            commentary <This simple and slow test tries to divide by every whole number at least\n>
            commentary <2 and up to the square root: if none divide exactly, the number is prime.\n>
            commentary <A common error with this algorithm is to check where >
            mathematics <m^2 < n>
            commentary <, rather\n>
            commentary <than >
            mathematics <m^2 \leq n>
            commentary <, thus wrongly considering 4, 9, 25, 49, ... as prime:\n>
            commentary <Cambridge folklore has it that this bug occurred on the first computation\n>
            commentary <of the EDSAC computer on 6 May 1949.\n>
          material definition
            code line
              defn <define>
              source_code <TRUE 1>
                          _nnnnpn_
            code line
              defn <define>
              source_code <FALSE 0>
                          _nnnnnpn_
          material code: C
            code line
              source_code <int >
                          _rrrp_
              function defn <isprime>
                locale P1.1
              source_code <(int n) {>
                          _prrrpippp_
            code line
              source_code <    if (n <= 1) return FALSE;>
                          _pppprrppippppnpprrrrrrpnnnnnp_
            code line
              source_code <    for (int m = 2; m*m <= n; m++)>
                          _pppprrrpprrrpipppnppipippppippippp_
            code line
              source_code <        if (n % m == 0)>
                          _pppppppprrppipppippppnp_
            code line
              source_code <            return FALSE;>
                          _pppppppppppprrrrrrpnnnnnp_
            code line
              source_code <    return TRUE;>
                          _pppprrrrrrpnnnnp_
            code line
              source_code <}>
                          _p_
        section footer <Main>
      chapter footer <Sections>
  tail rennab <End of weave>

This is a "heterogeneous tree", in that its tree_node nodes are annotated by data structures of different types. For example, a node for a section heading is annotated with a weave_section_header_node structure. The necessary types and object constructors are laid tediously out in Weave Tree, a section which intentionally contains no non-trivial code.
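
A heterogeneous tree of this sort can be sketched with a type tag and an untyped content pointer. The names below (sk_tree_node and friends) are hypothetical and far simpler than the real tree_node machinery; the sketch shows only how a depth-first traverse can switch on the tag to reach differently typed annotations.

```c
#include <assert.h>
#include <stdio.h>

enum sk_node_type { SK_SECTION_HEADER_NODE, SK_COMMENTARY_NODE };

struct sk_section_header { const char *title; };
struct sk_commentary     { const char *text; };

struct sk_tree_node {
    enum sk_node_type type;      /* says which structure content points to */
    void *content;
    struct sk_tree_node *child;  /* first child, or NULL */
    struct sk_tree_node *next;   /* next sibling, or NULL */
};

/* Depth-first traverse, writing one indented line per node.
   Returns the number of nodes visited. */
int sk_render(const struct sk_tree_node *N, int depth, FILE *out) {
    int visited = 0;
    assert(out != NULL);
    for (; N; N = N->next) {
        for (int i = 0; i < depth; i++) fputs("  ", out);
        switch (N->type) {
            case SK_SECTION_HEADER_NODE:
                fprintf(out, "section header <%s>\n",
                    ((struct sk_section_header *) N->content)->title);
                break;
            case SK_COMMENTARY_NODE:
                fprintf(out, "commentary <%s>\n",
                    ((struct sk_commentary *) N->content)->text);
                break;
        }
        visited += 1 + sk_render(N->child, depth + 1, out);
    }
    return visited;
}
```

Given a section header node with one commentary child, sk_render writes the header line and then the child indented beneath it, much as the TestingInweb listing above does on a larger scale.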

§11. Syntax-colouring is worth further mention. Just as the Weaver tries not to get itself into fiddly details of formats, it also avoids specifics of programming languages. It does this by calling LanguageMethods::syntax_colour, which in turn calls the SYNTAX_COLOUR_WEA_MTID method for the relevant instance of programming_language. In effect the weaver sends a snippet of code and asks to be told how it's to be coloured: not in terms of green vs blue, but in terms of IDENTIFIER_COLOUR vs RESERVED_COLOUR and so on.
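
The rows of letters under each source_code line in the listing above suggest the shape of the answer: one colour code per character. Here is a toy colourer in that spirit. It is an assumption-laden sketch, not the ACME painter: the function name, the short reserved-word list, and the reading of 'r', 'i' and 'p' as reserved, identifier and plain are all choices made for this example, and it knows nothing of functions, constants or strings.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Fill colours[] with one letter per character of code[]:
   'r' reserved word, 'i' other identifier, 'p' plain.
   colours must have room for strlen(code) + 1 characters. */
void sk_colour(const char *code, char *colours) {
    static const char *reserved[] = { "int", "if", "for", "return", "char" };
    assert(code != NULL && colours != NULL);
    size_t i = 0, n = strlen(code);
    while (i < n) {
        if (isalpha((unsigned char) code[i]) || code[i] == '_') {
            /* Find the end of this identifier-like word. */
            size_t j = i;
            while (j < n && (isalnum((unsigned char) code[j]) || code[j] == '_')) j++;
            char c = 'i';
            for (size_t k = 0; k < sizeof reserved / sizeof reserved[0]; k++)
                if (strlen(reserved[k]) == j - i &&
                    strncmp(code + i, reserved[k], j - i) == 0)
                    c = 'r';
            memset(colours + i, c, j - i);
            i = j;
        } else {
            colours[i++] = 'p';
        }
    }
    colours[n] = '\0';
}
```

The caller receives semantic categories rather than actual colours, so the choice of green or blue can be left to the weave pattern's stylesheet.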

Thus, the object representing "the C programming language" can in principle choose any semantic colouring that it likes. In practice, if (as is usual) it assigns no particular code to this, what instead happens is that the generic handler function in ACME Support takes on the task. This runs the colouring program in the language's definition file. Colouring programs are, in effect, a mini-language of their own, which is compiled by Programming Languages (in foundation) and then run in a low-level interpreter by The Painter (in foundation).

§12. So, then, the weave tree is now made. Just as each programming language has an object representing it, so does each format, and at render time the method call RENDER_FOR_MTID is sent to it. This has to turn the tree into HTML, plain text, TeX source, or whatever may be. It's understood that not every rendering instruction in the weave tree can be fully followed in every format: for example, there's not much that plain text can do to render an image carousel.
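
One way to picture "sending a method call to a format object" in plain C is a struct holding a function pointer. This is only an analogy for the method mechanism: the sk_ names are hypothetical, a single commentary string stands in for the whole tree, and no HTML escaping is attempted.

```c
#include <assert.h>
#include <stdio.h>

struct sk_format {
    const char *name;
    /* The "method": how this format renders one piece of commentary. */
    int (*render_commentary)(const char *text, char *out, size_t len);
};

int sk_html_commentary(const char *text, char *out, size_t len) {
    return snprintf(out, len, "<p>%s</p>", text);
}

int sk_plain_commentary(const char *text, char *out, size_t len) {
    return snprintf(out, len, "%s", text);
}

/* The caller never inspects the format: it only calls through the pointer.
   Returns the length of the rendered text, as snprintf does. */
int sk_render_commentary(const struct sk_format *F, const char *text,
    char *out, size_t len) {
    assert(F != NULL && F->render_commentary != NULL);
    assert(text != NULL && out != NULL);
    return F->render_commentary(text, out, len);
}
```

Adding a format then means supplying another function and another struct, with no change to the code that walks the tree.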

Inweb currently contains four renderers:

Renderers should make requests for weave plugins or colour schemes if, and only if, the need arises: for example, the HTML renderer requests the plugin Carousel only if an image carousel is actually called for. Requests are made by calling Swarm::ensure_plugin or Swarm::ensure_colour_scheme, and see also the underlying code at Assets, Plugins and Colour Schemes. (We want our HTML to run as little JavaScript as necessary at load time, which is why we don't just give every weave every possible facility.)

The most complex issue for HTML rendering is working out the URLs for links: for example, when weaving the text you are currently reading, Inweb has to decide where to send text_stream. This is handled by a suite of useful functions in Colonies which coordinate URLs across websites so that one web's weave can safely link to another's. In particular, cross-references written in //this notation// are "resolved" by Colonies::resolve_reference_in_weave, and the function Colonies::reference_URL turns them into relative URLs from any given file. Within the main web being woven, Colonies::paragraph_URL can make a link to any paragraph of your choice.

§13. Finally on weaving, special mention should go to The Collater, a subsystem which amounts to a stream editor. Its role is to work through a "template" and substitute in material from outside — from the weave rendering, from the bibliographic data for a web, and so on — to produce a final file. For example, a simple use of the collater is to work through the template:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
        <head>
            <title>[[Booklet Title]]</title>
            [[Plugins]]
        </head>
        <body>
    [[Weave Content]]
        </body>
    </html>

and to collate material already generated by other parts of Inweb to fill the double-squared placeholders, such as [[Plugins]]. The Collater, in fact, is ultimately what generates all of the files made in a weave, even though other parts of Inweb did all of the real work.
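
The simplest part of that job, filling a [[Name]] placeholder, can be sketched as follows. This is a toy under stated assumptions: sk_lookup and its two hard-wired values are invented for the example, and none of the Collater's looping or recursive commands are modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: the value for a placeholder name, or NULL if unknown. */
const char *sk_lookup(const char *name) {
    if (strcmp(name, "Booklet Title") == 0) return "The Twin Primes Conjecture";
    if (strcmp(name, "Weave Content") == 0) return "<p>...</p>";
    return NULL;
}

/* Copy tmpl to out, replacing each [[Name]] by its value. Unknown or
   unterminated placeholders are copied through unchanged, and output
   is truncated (but always terminated) if it would exceed len. */
void sk_collate(const char *tmpl, char *out, size_t len) {
    assert(tmpl != NULL && out != NULL && len > 0);
    size_t used = 0;
    out[0] = '\0';
    while (*tmpl && used + 1 < len) {
        const char *close = NULL, *value = NULL;
        char name[64];
        if (tmpl[0] == '[' && tmpl[1] == '[' &&
            (close = strstr(tmpl + 2, "]]")) != NULL &&
            (size_t) (close - tmpl - 2) < sizeof name) {
            size_t n = (size_t) (close - tmpl - 2);
            memcpy(name, tmpl + 2, n);
            name[n] = '\0';
            value = sk_lookup(name);
        }
        if (value) {
            int w = snprintf(out + used, len - used, "%s", value);
            used += ((size_t) w < len - used) ? (size_t) w : len - used - 1;
            tmpl = close + 2;
        } else {
            out[used++] = *tmpl++;
            out[used] = '\0';
        }
    }
}
```

Applied to the title line of the template above, the sketch leaves the surrounding markup alone and swaps in the looked-up title.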

With that said, it's not a trivial algorithm, because it can also loop through chapters and sections, as it does when it generates an index page to accompany a swarm of individual section weaves. The contents pages for typical webs presented online are made this way. The Collater is also recursive, in that some collation commands call for further acts of collation to happen inside the original. See Advanced Weaving with Patterns for more on collation, and see Collater::collate for the machinery.

§14. Tangling mode. Alternatively, we're in TANGLE_MODE, which is more straightforward. Program Control simply works out what we want to tangle, selecting the appropriate tangle_target object, and calls Tangler::tangle. Most webs have just one "tangle target", meaning that the whole web makes a single program; in that case, the choice is obvious. However, the contents section can mark certain chapters or sections as being independent targets.

Tangler::tangle works hierarchically, calling down to Tangler::tangle_paragraph and finally Tangler::tangle_line on individual lines of code. Throughout the process, the Tangler makes method calls to the current programming language; see Language Methods. As with syntax-colouring, the default arrangement is that these methods are handled by the generic "ACME" language, following instructions from the language definition file.

Languages declaring themselves "C-like" have access to special tangling facilities, all implemented with non-ACME method calls: see C-Like Languages. In particular, for coping with how #ifdef affects #include see CLike::additional_early_matter; for predeclaration of functions and structs and typedefs, see CLike::additional_predeclarations.

The language calling itself "InC" gets even more: see InC Support, and in particular text_literal for text constants like I"banana" and preform_nonterminal for Preform grammar notation like <sentence-ending>.

§15. Analysis mode. Alternatively, we're in ANALYSE_MODE. There's not much to this: Program Control simply calls Analyser::catalogue_the_sections, or else makes use of the same functions as TRANSLATE_MODE would — but in the context of having read in a web. If it makes a .gitignore file, for example, it does so for that specific web, whereas if the same feature is used in TRANSLATE_MODE, it does so in the abstract and for no particular web.

§16. Translation mode. Or, finally, we're in TRANSLATE_MODE. We can:

And that is essentially it. Inweb winds up by returning exit code 1 if there were errors, or 0 if not, like a good Unix citizen.

§17. Adding to Inweb. Here's some miscellaneous advice for those who would like to add to Inweb:

1. To add a new command-line switch, declare at Configuration::read and add a field to inweb_instructions which holds the setting; don't act on it then and there, only in Program Control later. But we don't want these settings to proliferate: ask first if adding a feature to, say, Colonies or weave_pattern files would meet the same need.

2. To add new programming languages, try if possible to do everything you need with a new definition file alone: see Supporting Programming Languages. Failing that, see if making definition files more powerful would do it (for example, by making the ACME support more general-purpose). Failing even that, follow the model of C-Like Languages: that is, add logic to Languages::read_definition which adds method receiver functions to a language with a given name, or, preferably, some given declaration in the language definition file. On no account insert any language bias into The Weaver or The Tangler.

3. To add new forms of weave output, try if possible to make a new pattern: see Advanced Weaving with Patterns. But this won't always be good enough. For example, "an HTML website but done differently" should be a pattern based on HTML, but Markdown would require a genuinely new format. (Though you would still also create a new pattern in order to use it.) If you go down this road, make a new section in Chapter 5: Formats following the model of, say, Plain Text Format and then adding methods gradually. (But don't forget to call your new format's creator function from Formats::create_weave_formats.)

4. As with any program built on Foundation, if you are creating a new class of object, don't forget to declare it in Basics.