An overview of the inflections module's role and abilities.


§1. Prerequisites. The inflections module is a part of the Inform compiler toolset. It is presented as a literate program or "web". Before diving in:

§2. Inflections. Inflections are modifications of words — usually word endings or beginnings — for different circumstances. English is often called an uninflected language, but this is an exaggeration. For example, we spell the word "tree" as "trees" when we refer to more than one of them. Inform sometimes needs to take text in one form and change it to another — for example, to turn a singular noun into a plural one — and ordinary Preform parsing isn't good enough to express this.

Inform uses a data structure called a "trie" as an efficient way to match prefix and/or suffix patterns in words, and then to modify them.

Tries are provided as basic data structures by Tries and Avinues (in foundation), and the code for initialising them from Preform grammar is provided by Preform Utilities (in words).

§3. Though tries are, as just mentioned, created from Preform grammar, they are parsed quite differently.

In trie grammar, an NT must be either a list of other tries, which are tested in sequence until one matches, or a list of inflection rules. The two cannot be mixed within the same NT.

§4. In a list of tries, each production consists only of a single nonterminal identifying the trie to make use of. One exception: the token ... before the trie's name makes it work on the end of a word instead of the beginning. For example:

    <fiddle-with-words> ::=
        <fiddle-with-exceptions> |
        ... <fiddle-with-irregular-endings> |
        ... <fiddle-with-regular-endings>

means try <fiddle-with-exceptions> first (on the whole text), then <fiddle-with-irregular-endings> (on the tail), and finally <fiddle-with-regular-endings> (also on the tail).
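As a rough illustration of this "try each trie in turn" behaviour, here is a minimal Python sketch. It is not how Inform implements tries: each trie is modelled as a plain function that returns a new word or None, and the three functions and their word lists are hypothetical stand-ins for the nonterminals above.

```python
from typing import Callable, List, Optional

# Assumption: a "trie" is modelled as a function giving a result or None.
Trie = Callable[[str], Optional[str]]

def fiddle_with_exceptions(word: str) -> Optional[str]:
    # Whole-word exceptions (hypothetical examples).
    return {"ox": "oxen", "child": "children"}.get(word)

def fiddle_with_irregular_endings(word: str) -> Optional[str]:
    # Works on the tail of the word, like a trie preceded by "...".
    if word.endswith("mouse"):
        return word[:-5] + "mice"
    return None

def fiddle_with_regular_endings(word: str) -> Optional[str]:
    # Catch-all rule on the tail: always matches.
    return word + "s"

def fiddle_with_words(word: str, tries: List[Trie]) -> str:
    # Test each trie in sequence; the first one to match wins.
    for trie in tries:
        result = trie(word)
        if result is not None:
            return result
    return word  # nothing matched: leave the text unchanged

TRIES = [fiddle_with_exceptions,
         fiddle_with_irregular_endings,
         fiddle_with_regular_endings]
```

With these stand-ins, "ox" is caught by the exceptions, "fieldmouse" by the irregular endings, and "tree" falls through to the regular ending.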

§5. In a list of inflection rules, each production consists of two tokens. The first token is what to match; the second gives instructions on what to turn it into. An asterisk is used to mean "any string of 0 or more letters"; a digit at the start of the replacement text means "truncate by this many letters and add...". The simplest possible instruction is 0 alone, which means "truncate 0 letters and add nothing", and therefore leaves the text unchanged.

Some examples:

    <pluralise> ::=
        lead 0 |
        codex codices |
        *mouse 5mice

This would pluralise "lead" as "lead", "codex" as "codices", "mouse" as "mice", and "fieldmouse" as "fieldmice".

The special character + after a digit means "double the last letter", so that, for example, 0+er turns "big" to "bigger". In other positions, + means "add another word", so for example 0+er+still turns "big" to "bigger still".
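The replacement instructions described so far can be sketched in a few lines of Python. This is only an illustration of the notation, not the module's own code: the function name apply_instruction is hypothetical, and it assumes a single leading digit and that the pattern has already been matched against the word.

```python
def apply_instruction(word: str, instruction: str) -> str:
    """Apply one replacement instruction to a word that already matched."""
    if instruction[:1].isdigit():
        # Leading digit: truncate that many letters from the end.
        n = int(instruction[0])
        stem = word[:max(0, len(word) - n)]
        rest = instruction[1:]
        if rest.startswith("+"):
            # "+" straight after the digit: double the last letter.
            if stem:
                stem += stem[-1]
            rest = rest[1:]
    else:
        # No digit: the replacement text stands in for the whole word.
        stem, rest = "", instruction
    # Any other "+" means "add another word".
    return stem + rest.replace("+", " ")
```

For instance, apply_instruction("fieldmouse", "5mice") gives "fieldmice", and apply_instruction("big", "0+er+still") gives "bigger still".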

Designing a list of inflection rules is not quite as easy as it looks, because these rules are not applied in succession: it's better to think of the rules as all being performed at once. In general, if you need one inflection rule to take precedence over another, put it in an earlier trie (in the list of tries which includes this one), rather than putting it earlier in the same trie.

For the implementation of these rules, see Tries and Inflections.

§6. Once we have that general inflection machinery, most of what we need to do becomes a simple matter of writing wrapper functions for tries, and these occupy the rest of Chapter 2: Simple Inflections.

§7. Declensions. Declensions are sets of inflected forms of a noun or adjective according to their grammatical case. A language should list its cases in a special nonterminal called <grammatical-case-names>, in which "nominative" or its equivalent should always come first. For example:

<grammatical-case-names> ::=
    nominative | vocative | accusative | dative | genitive | ablative

The function Declensions::no_cases returns a count of these for a given natural language. The actual names of cases are only needed by the function Declensions::writer, which prints out tables of declensions for debugging purposes.

§8. Declensions::of_noun and Declensions::of_article are functions to generate declensions, with one form for each case, from a given stem word. These are done with Preform NTs called <noun-declension> and <article-declension> respectively; these are currently the only two "declension NTs".

The rule for a "declension NT" is that it must provide a list of possibilities in the form either gender table or gender grouper table, where gender is:

In the two-token form gender table, the table is a nonterminal for irregular forms; in the three-token form gender grouper table, the grouper is a nonterminal which works out which "group" the word falls into (groups are numbered, so perhaps, e.g., the word "device" falls into group 1), and the table then provides declensions for the different groups needed.

§9. A simple example of using the irregular forms table is provided by the English language definition of <article-declension>:

<article-declension> ::=
    *    <en-article-declension>

<en-article-declension> ::=
    a    a    a
         some some |
    the  the  the
         the  the

Here the declension NT is <article-declension> and contains only one possibility, applying to all genders (hence the *). The table of irregular forms is then <en-article-declension>. Each production begins with the possibility against which the stem is matched: here, that has to be "a" or "the". There is then one form for each case (nominative and accusative) in each of the two numbers (singular and plural), making four forms in all. English, of course, is not very inflected: this would be more interesting for French:

<article-declension> ::=
    m  <fr-masculine-article-declension> |
    f  <fr-feminine-article-declension>

<fr-masculine-article-declension> ::=
    un   un    un
         des   des |
    le   le    le
         les   les

<fr-feminine-article-declension> ::=
    un   une   une
         des   des |
    le   la    la
         les   les
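To make the layout of an irregular forms table concrete, the French example can be modelled as a lookup keyed by gender and stem. This is a sketch only: the dictionary layout and the name decline_article are hypothetical, and it assumes the same two cases and two numbers as the English example, with "*" standing for "any gender".

```python
CASES = ["nominative", "accusative"]
NUMBERS = ["singular", "plural"]

# Forms are listed singular first, then plural, one per case,
# in the same order as the productions above.
FR_ARTICLES = {
    "m": {"un": ["un", "un", "des", "des"],
          "le": ["le", "le", "les", "les"]},
    "f": {"un": ["une", "une", "des", "des"],
          "le": ["la", "la", "les", "les"]},
}

def decline_article(stem, gender, table=FR_ARTICLES):
    # Use the table for this gender, falling back to "*" (all genders).
    rows = table.get(gender) or table.get("*")
    if rows is None or stem not in rows:
        return None  # no irregular form recorded for this stem
    forms = iter(rows[stem])
    return {(case, number): next(forms)
            for number in NUMBERS
            for case in CASES}
```

For example, decline_article("le", "f") maps ("nominative", "singular") to "la" and ("accusative", "plural") to "les".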

§10. So much for irregular forms. Grouped forms are useful for languages like German, which has about 12 groups of nouns, each with its own way of declining. For example, there's one group which goes something like:

    Kraft   Kraft   Kraft   Kraft
    Kräfte  Kräfte  Kräften Kräfte

and another which goes like:

    Kamera  Kamera  Kamera  Kamera
    Kameras Kameras Kameras Kameras

For German, we might then have

<noun-declension> ::=
    *  <de-noun-grouper> <de-noun-tables>

<de-noun-grouper> ::=
    kraft   1 |
    kamera  2

<de-noun-tables> ::=
    <de-noun-group1-table> |
    <de-noun-group2-table>

where for example:

<de-noun-group1-table> ::=
    0 | 0 | 0 | 0 |
    3äfte | 3äfte | 3äften | 3äfte

giving inflection rules for the four cases of German in singular and then in plural. In practice, of course, <de-noun-grouper> will need to sort out nouns rather better than this, and there are about 12 groups. Groups are numbered upwards from 1 to, in principle, 99. See Declensions::decline_from_groups.
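The two-step process (grouper first, then the table for that group) might be sketched as follows. This is an illustration rather than the logic of Declensions::decline_from_groups: the names are hypothetical, every rule is assumed to begin with a single digit, and the group 2 rules are inferred here from the Kamera forms shown above rather than taken from a real German definition.

```python
# Grouper: which numbered group does a noun belong to?
DE_NOUN_GROUPER = {"kraft": 1, "kamera": 2}

# One rule per case, singular forms first and then plural.
DE_NOUN_TABLES = {
    1: ["0", "0", "0", "0", "3äfte", "3äfte", "3äften", "3äfte"],
    2: ["0", "0", "0", "0", "0s", "0s", "0s", "0s"],  # assumed
}

def apply_rule(word, rule):
    # Leading digit: letters to truncate; the remainder is appended.
    n = int(rule[0])
    return word[:len(word) - n] + rule[1:]

def decline_noun(word):
    # Step 1: the grouper assigns a group number (matched in lower case).
    group = DE_NOUN_GROUPER.get(word.lower())
    if group is None:
        return None
    # Step 2: the table for that group supplies one rule per form.
    return [apply_rule(word, rule) for rule in DE_NOUN_TABLES[group]]
```

Here decline_noun("Kraft") yields four copies of "Kraft" followed by "Kräfte", "Kräfte", "Kräften", "Kräfte", matching the first table above.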

§11. Verb conjugations. This module supplies an extensive system for conjugating verbs. A full set of inflected forms for a verb, in all its tenses, voices and so on, is stored in a verb_conjugation object. Making these objects is a nontrivial task: see the function Conjugation::conjugate.

Like declensions, verb conjugations rely on a set of tables in special formats, which are stored in nonterminals of Preform grammar. There is a full description of the syntax used in these tables in the section English Inflections, which demonstrates a complete conjugation of English verbs.

§12. Naming conventions. Tries are highly language specific, and would need rewriting for every language. The tries for English are supplied in English Inflections, but that's just for convenience; other languages should supply them in the Inform source text of the relevant language extension, or in Syntax.preform files.

Except at the very top level, translators are free to create new tries and name them as they please, but the top-level tries must have the same names that they have here. For example, the Spanish implementation of

    <singular-noun-to-its-indefinite-article>

may look entirely unlike its English version, but at the top level it still has to have that name.

All lower-level tries used in the implementation should have names beginning with a language code: hence the "en-" prefix on the names used in English Inflections. There doesn't need to be any direct Spanish equivalent to <en-trie-plural-assimilated-classical-inflections>, for example.