What are holons?

Literate programming is the art of breaking up a complicated program into fragments, explaining how they work, and placing them all in context.

The explanatory parts of a web are called commentary and the fragments of code are called holons, a word borrowed from the Hungarian thinker Arthur Koestler by the Belgian computer scientist Pierre-Arnoul de Marneffe. A holon is "a piece of the whole", and for us it is a piece of the program being expressed by the web.

As we've seen, when Inweb's Markdown notation is used, commentary and holons tend to alternate:

This is commentary text.

And so is this.

	thisIsAHolon = true
	print("A holon is part of the whole.")

Now back to commentary.

In short, the holons, or code fragments, are the runs of lines indented by one tab stop. The web above has just one holon, and it tangles to:

thisIsAHolon = true
print("A holon is part of the whole.")

However, tangling can do more. The real strength of literate programming lies in the ability to divide the code into holons which have names. Consider this:

The program divides neatly:

	print("Begin.")
	{{Phase one}}
	{{Phase two}}
	print("End.")

{{Phase one}} =

	print("This is phase one.")

{{Phase two}} =

	print("This is phase two.")

Now there are three holons. The first is nameless, while the second and third have the names Phase one and Phase two respectively. Now our web tangles to:

print("Begin.")
print("This is phase one.")
print("This is phase two.")
print("End.")

A literate program

Literate programming can only make a case for itself when used to explain something which does actually call for explanation. So the following is a more whole-heartedly literate rewriting of the same counting-sort program used as an example before. This time we divide its critical code into three phases, each of which becomes a named holon.

# Counting Sort by Harold H. Seward

_An implementation of the 1954 sort algorithm._

This algorithm was found in 1954 by [Harold H. Seward](https://en.wikipedia.org/wiki/Harold_H._Seward),
who also devised radix sort. They differ from most sorting techniques because they do
not make comparisons between items in the unsorted array. Indeed, there are no
comparisons in the following function, only iteration.

This function takes an array of non-negative integers, sorts it, and returns
the result. The test `if unsorted` means "if the unsorted array is not empty",
and is needed because Python would otherwise handle this case badly: of course,
if `unsorted` does equal `[]`, then so should `sorted`, and so the right answer
is returned.

	def countingSort(unsorted):
		sorted = []
		if unsorted:
			{{initialise the incidence counts to zero}}
			{{tally how many times each value occurs in the unsorted array}}
			{{construct the sorted array with the right number of each value}}
		return sorted

For example, suppose the array is initially `[4, 2, 2, 6, 3, 3, 1, 6, 5, 2, 3]`.
Then the maximum value is 6. Python arrays index from 0, so we need an incidence
counts array of size 7, and we create it as `[0, 0, 0, 0, 0, 0, 0]`.

{{initialise the incidence counts to zero}} =

	max_val = max(unsorted)
	counts = [0] * (max_val + 1)

In the unsorted array we will observe no 0s, one 1, three 2s, and so on. The
following produces the counts `[0, 1, 3, 3, 1, 1, 2]`.

{{tally how many times each value occurs in the unsorted array}} =

	for value in unsorted:
		counts[value] += 1

Unusually for a sorting algorithm, an entirely new sorted array is created,
using only the incidence counts as a sort of program. We fill `sorted`
with no `0`s, one `1`, three `2`s, and so on, producing `[1, 2, 2, 2, 3, 3, 3, 4, 5, 6, 6]`.[1]

[1] The unsorted array is no longer needed, in fact, and so we could easily make this
algorithm "sort in place" by simply rewriting `unsorted`, rather than making
a new array.

{{construct the sorted array with the right number of each value}} =

	for value, count in enumerate(counts):
		sorted.extend([value] * count)

And this code tests the function:

	A = [4, 2, 2, 6, 3, 3, 1, 6, 5, 2, 3]
	print("Unsorted:", A)
	print("Sorted:", countingSort(A))

So how did we do? What makes count sort interesting is that it is not doomed
to run at $O(n\log n)$ or worse speed, where $n$ is the size of the data set:
counting sort runs at $O(n+k)$, where $k$ is the size of the largest value in
the data (called `max_val` above). With most data, $k$ is either enormous or
at least unpredictable, so that speed and memory usage both spike. But for just
a few applications, where $k$ is known to be within tight bounds, count sort
is still used today and is exceptionally fast.

If literate programming works at all as an idea, then the above ought to be easier to understand (and to spot the flaws in) than the original. The woven form of this, converted for example into a web page, is still easier to read.

Our program is now divided into five holons. Two of these are nameless holons, the first of which (beginning def countingSort) incorporates, or we will say uses, the content of the three named holons. The final nameless holon (beginning A =) does some testing.

The rules about holon names and usage

There are of course some rules about all this, which Inweb enforces. Firstly, about the names:

And there are also common-sense restrictions on how holons are "used", that is, incorporated into other holons:

Naming every holon

Some LP purists may not approve of the way that Inweb normally reads a mix of nameless and named holons. Shouldn't every holon have a name? That was certainly the original scheme of Pierre-Arnoul de Marneffe.

Inweb does not adopt this convention by default because large programs containing a mass of global functions would just become cumbersome if all of those "and here's another one" holons had to have names. But Inweb does recognise that some programs will be better presented the old-fashioned way, so it provides the following feature in order to let authors decide:

Continuations

This ability wasn't needed by the example above, but it's also possible to continue a holon from earlier. Suppose we write the holon:

{{Print diagnostics}} +=

	print("Total memory usage was ", mem_usage)

Note the += here, rather than just =. That makes this a continuation of the holon {{Print diagnostics}}, which must already have been defined. A holon can have any number of continuations; if it does, then the code it represents is a concatenation of its own lines with the lines in its continuations, in order of definition. So for example:

{{Print diagnostics}} =

	print("Diagnostics:")

{{Print diagnostics}} +=

	print("Total time taken was ", time_in_cs)

{{Print diagnostics}} +=

	print("Total memory usage was ", mem_usage)

tangles {{Print diagnostics}} to:

print("Diagnostics:")
print("Total time taken was ", time_in_cs)
print("Total memory usage was ", mem_usage)

Continuations are best used sparingly, and only when they actually clarify what is going on, rather than obscuring it.

Early and late holons

Most programming languages need code to be written in a particular order to avoid errors, and this is occasionally a nuisance to the literate programmer, because that isn't the clearest order in which to explain it.

Inweb therefore allows holons to be marked as to be tangled either earlier or later than usual. For example, the following are all valid header lines:

{{Include header files}} (tangled very early) =

{{Initialisation}} (tangled early) =

{{Do miscellaneous things}} =

{{Closing down}} (tangled late) =

{{Check for undeclared symbols}} (tangled very late) =

Inweb will throw an error if code attempts to use an early or late holon. For example, this cannot work:

{{Do some business}} =
	print("Starting up")
	{{Initialisation}}

because the {{Initialisation}} holon was marked as tangled early. So it can appear only once, and only in the early phase, not as part of another holon.

This is probably a good time to say explicitly what tangling does:

And a holon can be top-level in one of three ways:

Top-level holons are so called because they are exactly the ones which are not used inside any others. Every part of the program is contained, directly or indirectly, in at least one top-level holon. To put this in computer-science terms, our program is a directed multigraph, in which the vertices are the holons, and an edge is drawn from vertex X to vertex Y whenever holon X uses holon Y. This graph contains no directed loops, and the top-level holons are exactly the vertices with no incoming edges.

Webwide holons

In a web with multiple sections, each section ordinarily has its own private set of holon names, as explained above. Early LP tools intended for smaller programs tended instead to have a common set of names used throughout the web, but experience has shown that for larger webs, this is generally not wise.

But it is occasionally useful to make exceptions for particular holons. Inweb therefore allows holons to be marked as webwide, which means that their names are visible from every section. (The difference between a webwide and a regular holon is much like that between a global and a local variable.)

The main holon is automatically webwide if it exists. Otherwise, holons are only webwide if they are explicitly declared that way. For example, suppose we declare a holon like this in one section:

{{Grab bag}} (webwide) =

We could then continue this in any subsequent section of the web, and not just in its own:

{{Grab bag}} +=

It's legal for a holon to be both webwide and early/late: for example,

{{Grab bag}} (webwide and tangled very early) =

Similarly, a webwide holon can be used from outside its own section. We might, for example, declare this in the opening section of a web:

{{Disclaimer}} (webwide) =

	print("This part of the program is not fully implemented yet.")

And then we can use {{Disclaimer}} anywhere needed.

Webwide holons are, again, best used only when there's a good reason for it.

Abbreviating holon names

Suppose a holon has a lengthy name which is difficult to type:

{{Fail with an error message, deallocate memory, and return 1}} =

This may make it annoying to refer to:

	if (text) {{Fail with an error message, deallocate memory, and return 1}}

So Inweb allows such references to be abbreviated:

	if (text) {{Fail...}}

The rule is that the text before ..., in the case Fail, must be a prefix of the name of exactly one holon defined in the current section; or, if it matches none in the current section, exactly one holon defined webwide.