Syntax Trees

To parse trees which decompose the meaning of excerpts of text, and which allow annotations to be made at each node.

§1. Optional headings tree.Within inform7 and inbuild, the code in this section is augmented by Headings (in supervisor), which gives every syntax tree an associated tree of source headings; but as far as syntax is concerned, that's all an optional extra.

Code using syntax in this way must define HEADING_TREE_SYNTAX_TYPE as the class of object to which our headings field will point, and callback functions NEW_HEADING_TREE_SYNTAX_CALLBACK and NEW_HEADING_SYNTAX_CALLBACK to initialise and add to it.

default HEADING_TREE_SYNTAX_TYPE void  in which case, never used

§2. Syntax trees.Each parse_node_tree object represents a different syntax tree; typically the entire source text being compiled by inform7, including any extensions it includes, will form a single parse_node_tree.

define MAX_BUD_STACK_SIZE 100  must be at least the number of heading levels plus 3

typedef struct parse_node_tree {
    struct parse_node *root_node;
    int bud_parent_sp;
    struct parse_node *bud_parent_stack[MAX_BUD_STACK_SIZE];
    struct parse_node *last_sentence;  cached position in tree
    int allow_last_sentence_cacheing;
    int contains_dialogue;
    int trace_sentences;
    HEADING_TREE_SYNTAX_TYPE *headings;
    CLASS_DEFINITION
} parse_node_tree;

parse_node_tree *SyntaxTree::new(void) {
    parse_node_tree *T = CREATE(parse_node_tree);
    T->root_node = Node::new(ROOT_NT);
    T->bud_parent_sp = 0;
    T->last_sentence = NULL;
    T->allow_last_sentence_cacheing = FALSE;
    T->trace_sentences = FALSE;
    T->contains_dialogue = FALSE;
    SyntaxTree::push_bud(T, T->root_node);
    #ifdef NEW_HEADING_TREE_SYNTAX_CALLBACK
    T->headings = NEW_HEADING_TREE_SYNTAX_CALLBACK(T);
    #endif
    return T;
}

The structure parse_node_tree is accessed in 2/tv, 3/snt and here.

§3. Buds and grafts."Buds" are positions in the tree to which new sentence subtrees can be grafted:

int SyntaxTree::push_bud(parse_node_tree *T, parse_node *to) {
    int l = T->bud_parent_sp;
    if (T->bud_parent_sp >= MAX_BUD_STACK_SIZE) internal_error("bud stack overflow");
    T->bud_parent_stack[T->bud_parent_sp++] = to;
    return l;
}

void SyntaxTree::pop_bud(parse_node_tree *T, int l) {
    T->bud_parent_sp = l;
}

§4. Sentences are grafted onto the bud position at the top of the stack.

void SyntaxTree::graft_sentence(parse_node_tree *T, parse_node *new) {
    if (T->bud_parent_sp == 0) internal_error("no attachment point");
    if (Node::get_type(new) == HEADING_NT) Adjust bud point for a heading4.1;
    parse_node *sentence_attachment_point = T->bud_parent_stack[T->bud_parent_sp-1];
    SyntaxTree::graft(T, new, sentence_attachment_point);
    if (Node::get_type(new) == HEADING_NT) SyntaxTree::push_bud(T, new);
}

§4.1. When what's attached is a heading node, that changes the stack. The idea here is that sentences graft beneath the heading to which they belong. If the current bud point is a heading called "Section 22", and then we reach the heading sentence "Part 2", this no longer belongs under Section 22, and we pop the bud stack until we're beneath a heading node superior to Parts.

Adjust bud point for a heading4.1 =

    int heading_level = Annotations::read_int(new, heading_level_ANNOT);
    if (heading_level > 0)
        for (int i = T->bud_parent_sp-1; i>=0; i--) {
            parse_node *P = T->bud_parent_stack[i];
            if ((Node::get_type(P) == HEADING_NT) &&
                (Annotations::read_int(P, heading_level_ANNOT) >= heading_level))
                T->bud_parent_sp = i;
        }

This code is used in §4.

§5. Syntax trees for Inform source text have a tendency to be wide. If a source text is basically a list of 5000 sentences, then there may be a node with 5000 children, even though the maximum depth of the tree might be as low as 10.

Because of that we must be wary of algorithms with quadratic running time in the width of the tree, and we get around that with a cache. Certain "tree surgery" operations, when the tree is being rearranged, would throw this, so the optimisation should be enabled only when needed.

void SyntaxTree::enable_last_sentence_cache(parse_node_tree *T) {
    T->last_sentence = NULL;  because this may have changed since last enabled
    T->allow_last_sentence_cacheing = TRUE;
}

void SyntaxTree::disable_last_sentence_cache(parse_node_tree *T) {
    T->allow_last_sentence_cacheing = FALSE;
}

§6. The function SyntaxTree::graft is named by analogy with gardening, where the rootstock of one plant is joined to a scion (or cutting) of another, so that a root chosen for strength can be combined with the fruits or blossom of the scion.

SyntaxTree::graft returns the node for which scion is the immediate sibling, that is, it returns the previously youngest child of the rootstock (or NULL if it previously had no children).

parse_node *SyntaxTree::graft(parse_node_tree *T, parse_node *scion, parse_node *rootstock) {
    parse_node *elder = NULL;
    if (scion == NULL) internal_error("scion is null");
    if (rootstock == NULL) internal_error("rootstock is null");
     is the new node to be the only child of the old?
    if (rootstock->down == NULL) { rootstock->down = scion; return NULL; }
     can last sentence cacheing save us a long search through many children of root?
    if ((rootstock == T->root_node) && (T->allow_last_sentence_cacheing)) {
        if (T->last_sentence) {
            elder = T->last_sentence;
            elder->next = scion;
            T->last_sentence = scion;
            return elder;
        }
         we don't know who's the youngest child now, but we know who soon will be:
        T->last_sentence = scion;
    }
     find youngest child of rootstock...
    for (elder = rootstock->down; elder->next; elder = elder->next) ;
     ...and make the new node its younger sibling
    elder->next = scion; return elder;
}

§7. No speed worries on the much smaller trees with alternative readings:

parse_node *SyntaxTree::graft_alternative(parse_node *scion, parse_node *rootstock) {
    if (scion == NULL) internal_error("scion is null");
    if (rootstock == NULL) internal_error("rootstock is null");
     is the new node to be the only child of the old?
    if (rootstock->down == NULL) { rootstock->down = scion; return NULL; }
     find youngest child of rootstock...
    parse_node *elder = NULL;
    for (elder = rootstock->down; elder->next_alternative; elder = elder->next_alternative) ;
     ...and make the new node its younger sibling
    elder->next_alternative = scion; return elder;
}

§8. And we can loop through these like so:

define LOOP_THROUGH_ALTERNATIVES(p, from)
    for (p = from; p; p = p->next_alternative)

§9. Where we currently are in the text.Inform makes many traverses through the big parse tree, often modifying as it goes, and keeps track of its position so that it can make any problem messages correctly refer to the location of the faulty text in the original source files.

During such traverses, current_sentence is always the subtree being looked at: it is always a child of the tree root, and is usually a SENTENCE_NT node, hence the name.

parse_node *current_sentence = NULL;

§10. General traversals.It's convenient to have a general system for traversing a syntax tree, visiting each node in the connected component of the tree root. C doesn't make this sort of thing easy, though, so there follows a welter of boringly similar functions with convoluted type signatures.

All these do basically the same thing. SyntaxTree::traverse calls a visitor function on each node; SyntaxTree::traverse_from the same, but from a chosen start position, rather than the root of the tree.

In this first version, the visitor function has type signature parse_node * -> void.

void SyntaxTree::traverse(parse_node_tree *T, void (*visitor)(parse_node *)) {
    SyntaxTree::traverse_from(T->root_node, visitor);
}
void SyntaxTree::traverse_from(parse_node *pn, void (*visitor)(parse_node *)) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn);
        }
        if (NodeType::is_top_level(pn->node_type)) SyntaxTree::traverse_from(pn->down, visitor);
    }
    current_sentence = SCS;
}
void SyntaxTree::traverse_run(parse_node *pn, void (*visitor)(parse_node *), node_type_t X) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (Node::get_type(pn) != X) break;
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn);
        }
        if (NodeType::is_top_level(pn->node_type)) SyntaxTree::traverse_from(pn->down, visitor);
    }
    current_sentence = SCS;
}

§11. Note that any node not "visitable" is omitted, where the following function is the sole arbiter. This depends only on its node type.

int SyntaxTree::visitable(node_type_t t) {
    if (NodeType::has_flag(t, DONT_VISIT_NFLAG)) return FALSE;
    return TRUE;
}

§12. And now the same thing, but where the visitor function has the type signature text_stream *, parse_node * -> void.

void SyntaxTree::traverse_text(parse_node_tree *T, text_stream *OUT,
    void (*visitor)(text_stream *, parse_node *)) {
    SyntaxTree::traverse_text_from(OUT, T->root_node, visitor);
}
void SyntaxTree::traverse_text_from(text_stream *OUT, parse_node *pn,
    void (*visitor)(text_stream *, parse_node *)) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(OUT, pn);
        }
        if (NodeType::is_top_level(pn->node_type))
            SyntaxTree::traverse_text_from(OUT, pn->down, visitor);
    }
    current_sentence = SCS;
}

§13. And now the same thing, but where the visitor function has the type signature parse_node *, int * -> void.

void SyntaxTree::traverse_intp(parse_node_tree *T,
    void (*visitor)(parse_node *, int *), int *X) {
    SyntaxTree::traverse_intp_from(T->root_node, visitor, X);
}
void SyntaxTree::traverse_intp_from(parse_node *pn,
    void (*visitor)(parse_node *, int *), int *X) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn, X);
        }
        if (NodeType::is_top_level(pn->node_type))
            SyntaxTree::traverse_intp_from(pn->down, visitor, X);
    }
    current_sentence = SCS;
}

§14. And the same thing, but where the visitor function has the type signature parse_node *, int *, int * -> void.

void SyntaxTree::traverse_intp_intp(parse_node_tree *T,
    void (*visitor)(parse_node *, int *, int *), int *X, int *Y) {
    SyntaxTree::traverse_intp_intp_from(T->root_node, visitor, X, Y);
}
void SyntaxTree::traverse_intp_intp_from(parse_node *pn,
    void (*visitor)(parse_node *, int *, int *), int *X, int *Y) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn, X, Y);
        }
        if (NodeType::is_top_level(pn->node_type))
            SyntaxTree::traverse_intp_intp_from(pn->down, visitor, X, Y);
    }
    current_sentence = SCS;
}

§15. And now for parse_node *, parse_node ** -> void.

void SyntaxTree::traverse_nodep(parse_node_tree *T,
    void (*visitor)(parse_node *, parse_node **), parse_node **X) {
    SyntaxTree::traverse_nodep_from(T->root_node, visitor, X);
}
void SyntaxTree::traverse_nodep_from(parse_node *pn,
    void (*visitor)(parse_node *, parse_node **), parse_node **X) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn, X);
        }
        if (NodeType::is_top_level(pn->node_type))
            SyntaxTree::traverse_nodep_from(pn->down, visitor, X);
    }
    current_sentence = SCS;
}

§16. This is a tricksier sort of traverse, which tells the visitor function the heading node it belongs to. The visitor function now has type signature parse_node_tree *, parse_node *, parse_node *, int * -> void, where the two nodes are the one being visited and its heading, respectively.

void SyntaxTree::traverse_headingwise(parse_node_tree *T,
    void (*visitor)(parse_node_tree *, parse_node *, parse_node *, int *), int *N) {
    SyntaxTree::traverse_headingwise_from(T, T->root_node, visitor, NULL, N);
}
void SyntaxTree::traverse_headingwise_from(parse_node_tree *T, parse_node *pn,
    void (*visitor)(parse_node_tree *, parse_node *, parse_node *, int *),
    parse_node *last_h0, int *N) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(T, pn, last_h0, N);
        }
        if (NodeType::is_top_level(pn->node_type)) {
            parse_node *H0 = last_h0;
            if ((Node::is(pn, HEADING_NT)) &&
                (Annotations::read_int(pn, heading_level_ANNOT) == 0))
                H0 = pn;
            SyntaxTree::traverse_headingwise_from(T, pn->down, visitor, H0, N);
        }
    }
    current_sentence = SCS;
}

§17. And this is another variation: a traverse to find a node with a particular property. The process halts as soon as the visitor function, which has signature parse_node *, parse_node *, parse_node ** -> int, returns TRUE, and the idea is that the visitor will store its result in the parse_node * pointed to by its last argument.

Note that this one doesn't record its position in current_sentence. The fuss over top-level nodes is to ensure recursion even though top-level nodes are not visitable; otherwise the function would never find anything because no visitable nodes would ever be reached.

int SyntaxTree::traverse_to_find(parse_node_tree *T,
    int (*visitor)(parse_node *, parse_node *, parse_node **),
    parse_node **X) {
    return SyntaxTree::traverse_to_find_from(T->root_node, visitor, NULL, X);
}
int SyntaxTree::traverse_to_find_from(parse_node *pn,
    int (*visitor)(parse_node *, parse_node *, parse_node **),
    parse_node *from, parse_node **X) {
    for (; pn; pn = pn->next) {
        if (SyntaxTree::visitable(pn->node_type))
            if ((*visitor)(pn, from, X))
                return TRUE;
        if (NodeType::is_top_level(pn->node_type))
            if (SyntaxTree::traverse_to_find_from(pn->down, visitor, pn, X))
                return TRUE;
    }
    return FALSE;
}

§18. And still another. This one traverses only up to a given stop position, and the visitor has signature parse_node *, void ** -> void, the idea being that the final void ** is a pointer to a general object pointer. This is the sort of thing which brings C into disrepute, but we don't use it very much.

void SyntaxTree::traverse_up_to_ip(parse_node_tree *T, parse_node *end,
    void (*visitor)(parse_node *, void **), void **X) {
    SyntaxTree::traverse_from_up_to_ip(end, T->root_node, visitor, X);
}
int SyntaxTree::traverse_from_up_to_ip(parse_node *end, parse_node *pn,
    void (*visitor)(parse_node *, void **), void **X) {
    parse_node *SCS = current_sentence;
    for (; pn; pn = pn->next) {
        if (pn == end) { current_sentence = SCS; return TRUE; }
        if (SyntaxTree::visitable(pn->node_type)) {
            if (NodeType::is_sentence(pn->node_type)) current_sentence = pn;
            (*visitor)(pn, X);
        }
        if (NodeType::is_top_level(pn->node_type)) {
            if (SyntaxTree::traverse_from_up_to_ip(end, pn->down, visitor, X)) {
                current_sentence = SCS; return TRUE;
            }
        }
    }
    current_sentence = SCS;
    return FALSE;
}

§19. Cautious traverses.When logging or verifying the tree, we cannot use the carefree functions above: the tree might be malformed. As a way to detect cycles, we call for a new "traverse token" — just a unique integer value — and mark all nodes visited with that value.

int pn_log_token = 0;

int SyntaxTree::new_traverse_token(void) {
    return ++pn_log_token;
}

§20. Toggling log output.Various modules conventionally use this global setting to toggle debugging log output:

int SyntaxTree::is_trace_set(parse_node_tree *T) {
    return T->trace_sentences;
}

void SyntaxTree::set_trace(parse_node_tree *T) {
    T->trace_sentences = TRUE;
}

void SyntaxTree::clear_trace(parse_node_tree *T) {
    T->trace_sentences = FALSE;
}

void SyntaxTree::toggle_trace(parse_node_tree *T) {
    T->trace_sentences = (T->trace_sentences)?FALSE:TRUE;
}

§21. Ambiguity subtrees.The following function adds a new reading to an existing interpretation of some wording W, and return the node now representing. For example, suppose the text "orange" can be read as a noun for fruit, a noun for colour, or an adjective, resulting in nodes fruit_node and colour_node and adj_node. Then:

(a) SyntaxTree::add_reading(NULL, fruit_node, W) returns noun_node,
(b) but SyntaxTree::add_reading(fruit_node, colour_node, W) returns this subtree:

    AMBIGUITY_NT A
        fruit_node
        colour_node

    AMBIGUITY_NT A
        fruit_node
        colour_node
        adj_node

Thus it accumulates possible readings of a given text.

A complication is that the following callback function is offered the chance to amend this process in individual cases; it's called whenever reading is about to become one of the alternatives to some existing E. If it returns TRUE, we assume it has done something of its own already, and do nothing further.

(inform7 uses this to rearrange ambiguous phrase invocations to be sorted out in type-checking: see Dash (in values).)

parse_node *SyntaxTree::add_reading(parse_node *existing, parse_node *reading, wording W) {
    if (existing == NULL) return reading;
    if (Node::is(reading, UNKNOWN_NT)) return existing;
    if (Node::is(reading, AMBIGUITY_NT)) reading = reading->down;
    if (Node::is(existing, AMBIGUITY_NT)) {
        #ifdef AMBIGUITY_JOIN_SYNTAX_CALLBACK
        for (parse_node *E = existing->down; E; E = E->next_alternative)
            if (AMBIGUITY_JOIN_SYNTAX_CALLBACK(E, reading))
                return existing;
        #endif
        parse_node *L = existing->down;
        while ((L) && (L->next_alternative)) L = L->next_alternative;
        L->next_alternative = reading;
        return existing;
    }

    #ifdef AMBIGUITY_JOIN_SYNTAX_CALLBACK
    if (AMBIGUITY_JOIN_SYNTAX_CALLBACK(existing, reading)) return existing;
    #endif

    parse_node *A = Node::new_with_words(AMBIGUITY_NT, W);
    A->down = existing;
    A->down->next_alternative = reading;
    return A;
}