Constructing a suitable ctags file for a web.

§1. On every tangle, Inweb writes a simple Universal Ctags file to the root directory of the web: this can then be used by text editors (such as BBEdit on macOS) to provide code completion and jump-to-definition features when editing the sections in the web.

A ctags file is essentially just a list of identifiers called "tagnames", which are usually names of functions or data types, along with details of where they are defined in a program. Ctags files are almost never written by hand, but are instead generated by a tool such as the eponymous ctags. Here, though, Inweb is going to do the generation, because it can make sense of the web structure of source code in a way which the ctags parser cannot.

The original ctags was devised by Ken Arnold for BSD Unix, and dates to around 1979. This was much extended as Exuberant Ctags, by Darren Hiebert, which was then forked and re-maintained as Universal Ctags by Reza Jelveh and others. The result is nearly standard now, though as with a lot of early Unix infrastructure (compare make, for example), that standard design feels very antique: white space is significant, filename extensions are not standard practice, and so on. See Universal Ctags for more.

§2. As mentioned above, ctags goes back to an age before filenames necessarily had extensions, and just as the default make file is makefile and not makefile.mk, so the default ctags file is called tags and not tags.ctag.

void Ctags::write(web *W, filename *F) {
    text_stream ctags_file;
    pathname *P = NULL;
    if (F) {
        P = Filenames::up(F);
    } else {
        P = W->md->path_to_web;
        F = Filenames::in(P, I"tags");
    }
    text_stream *OUT = &ctags_file;
    if (STREAM_OPEN_TO_FILE(OUT, F, UTF8_ENC) == FALSE)
        Errors::fatal_with_file("unable to write ctags file", F);
    ⟨Write header 2.1⟩;
    ⟨List defined constants 2.2⟩;
    ⟨List structures 2.3⟩;
    ⟨List functions 2.4⟩;
    STREAM_CLOSE(OUT);
}

§2.1. Unless you really want to monkey with identifiers or filenames containing line break characters or tabs, a ctags file has a simple format to read or write: there's one tag on each line, and each line has three or more fields divided by tab characters. If we write -> for a tab, a line looks like:

    tagname -> filename -> /find/;" -> more

The stranded double-quote there is not a misprint. For example:

    Frogs::spawn -> pond/Chapter 1/Amphibians.w -> /^void Frogs::spawn(species *S) {$/;" -> f

Here the tagname is Frogs::spawn. The filename pond/Chapter 1/Amphibians.w is the file defining this function. The find field is an EX-format command for finding the line in question: see below. Finally, the more field is actually a run of optional extra information, presented in a free-form sort of way, but we will use it only in the simplest of ways. In this example it is just f, meaning "I am a function declaration".
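To make the layout concrete, here is a minimal sketch in plain C of assembling one such line. The function format_tag_line is hypothetical and is not part of Inweb, which writes the fields piecemeal with WRITE instead. The sketch assumes the find text has already been escaped, and it simply refuses tagnames or filenames containing tabs or line breaks rather than trying to encode them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Inweb: formats one tag line into buf as
   tagname TAB filename TAB /find/;" TAB more NEWLINE.
   The find text is assumed to be escaped already. Returns the number of
   characters written, or -1 if a field is unsuitable or buf is too small. */
int format_tag_line(char *buf, size_t size, const char *tagname,
                    const char *filename, const char *find, const char *more) {
    assert(buf != NULL && size > 0);
    /* Tabs and line breaks would be mistaken for field or line separators. */
    if (strpbrk(tagname, "\t\n") != NULL) return -1;
    if (strpbrk(filename, "\t\n") != NULL) return -1;
    int n = snprintf(buf, size, "%s\t%s\t/%s/;\"\t%s\n",
                     tagname, filename, find, more);
    if (n < 0 || (size_t) n >= size) return -1;
    return n;
}
```

Called with the four fields of the Frogs::spawn example, this would produce exactly the line shown above, with each -> replaced by a real tab character.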

The opening lines of the file, however, are usually metadata, i.e., describing the file itself and where it came from. In those lines, tagnames begin with !_ and are called "pseudotags". The filename field is instead a value, while the find field is instead an optional comment.

The first two keys here are essential: the other three seem just to be good practice. These are the five keys which Universal Ctags writes by default, so we'll follow suit.

⟨Write header 2.1⟩ =

    WRITE("!_TAG_FILE_FORMAT\t2\t/extended format; --format=1 will not append ;\" to lines/\n");
    WRITE("!_TAG_FILE_SORTED\t0\t/0=unsorted, 1=sorted, 2=foldcase/\n");
    WRITE("!_TAG_PROGRAM_AUTHOR\tGraham Nelson\t/graham.nelson@mod-langs.ox.ac.uk/\n");
    WRITE("!_TAG_PROGRAM_NAME\t[[Title]]\t\n");
    WRITE("!_TAG_PROGRAM_VERSION\t[[Semantic Version Number]]\t/built [[Build Date]]/\n");

§2.2. Having prudently opted to give the tags in an unsorted way, we're free to list them in any order convenient to us, and here goes.

The more field d says that a tagname is a defined constant:

⟨List defined constants 2.2⟩ =

    defined_constant *str;
    LOOP_OVER(str, defined_constant)
        if (str->at->owning_section->owning_web == W) {
            WRITE("%S\t", str->name);
            Ctags::write_line_ref(OUT, str->at, P);
            WRITE(";\"\t");
            WRITE("d");
            WRITE("\n");
        }

§2.3. The more field t says that a tagname is a type, and we add a clarifying detail to say that it results from a typedef struct. (Note that typeref here, with an "r", is not a mistake. This is what Universal Ctags calls it.)

⟨List structures 2.3⟩ =

    language_type *str;
    LOOP_OVER(str, language_type)
        if (str->structure_header_at->owning_section->owning_web == W) {
            WRITE("%S\t", str->structure_name);
            Ctags::write_line_ref(OUT, str->structure_header_at, P);
            WRITE(";\"\t");
            WRITE("t\ttyperef:struct:%S", str->structure_name);
            WRITE("\n");
        }

§2.4. The more field f says that a tagname is a function:

⟨List functions 2.4⟩ =

    language_function *fn;
    LOOP_OVER(fn, language_function)
        if (fn->function_header_at->owning_section->owning_web == W) {
            WRITE("%S\t", fn->function_name);
            Ctags::write_line_ref(OUT, fn->function_header_at, P);
            WRITE(";\"\t");
            WRITE("f");
            WRITE("\n");
        }

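§3. So, then, here we write the filename and find fields for a given source line L in our web. The filename is written relative to P, the directory holding the ctags file, unless it already begins with a folder separator, in which case it is left as it is. The find field is the text of the line wrapped as /^...$/, in which every / is escaped with a backslash, a ^ is escaped only if it is the first character, and a $ is escaped unless it is the last character.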

void Ctags::write_line_ref(OUTPUT_STREAM, source_line *L, pathname *P) {
    TEMPORARY_TEXT(fn)
    WRITE_TO(fn, "%f", L->owning_section->md->source_file_for_section);
    if (Platform::is_folder_separator(Str::get_first_char(fn)) == FALSE) {
        Str::clear(fn);
        Filenames::to_text_relative(fn, L->owning_section->md->source_file_for_section, P);
    }
    WRITE("%S\t/^", fn);
    DISCARD_TEXT(fn)
    for (int i = 0; i < Str::len(L->text); i++) {
        inchar32_t c = Str::get_at(L->text, i);
        switch (c) {
            case '/': PUT('\\'); PUT(c); break;
            case '^': if (i == 0) PUT('\\'); PUT(c); break;
            case '$': if (i < Str::len(L->text) - 1) PUT('\\'); PUT(c); break;
            default: PUT(c); break;
        }
    }
    WRITE("$/");
}
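The escaping rules in that loop are easy to misread, so here is a stand-alone sketch of the same logic on plain C strings. The function escape_find_field is hypothetical: it mirrors the switch above but uses a caller-supplied buffer in place of an Inweb text stream, and it handles bytes rather than Unicode characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-alone version of the escaping loop, for illustration.
   Writes /^...$/ into out, escaping as Ctags::write_line_ref does.
   Returns the length written, or -1 if out is too small. */
int escape_find_field(char *out, size_t size, const char *line) {
    assert(out != NULL && line != NULL);
    size_t len = strlen(line), j = 0;
    /* Worst case: every character gains a backslash, plus /^ and $/ and NUL. */
    if (size < 2 * len + 5) return -1;
    out[j++] = '/';
    out[j++] = '^';
    for (size_t i = 0; i < len; i++) {
        char c = line[i];
        int escape = 0;
        if (c == '/') escape = 1;                     /* always escaped */
        else if (c == '^' && i == 0) escape = 1;      /* only as first character */
        else if (c == '$' && i + 1 < len) escape = 1; /* unless last character */
        if (escape) out[j++] = '\\';
        out[j++] = c;
    }
    out[j++] = '$';
    out[j++] = '/';
    out[j] = '\0';
    return (int) j;
}
```

For instance, the line `int x = a / b;` would come out as `/^int x = a \/ b;$/`, with only the division sign escaped.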

§4. To make the above work, we need to keep a list of defined constant names. We could laboriously extract that from the hash table of reserved words (see The Analyser), but this is one of those times when life is short and memory is cheap. It's easier to keep a duplicate list.

typedef struct defined_constant {
    struct text_stream *name;
    struct source_line *at;
    CLASS_DEFINITION
} defined_constant;

§5. This is called for any @d or @e constant name, then:

void Ctags::note_defined_constant(source_line *L, text_stream *name) {
    defined_constant *dc = CREATE(defined_constant);
    dc->name = Str::duplicate(name);
    dc->at = L;
}
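In Inweb itself, CREATE both allocates the structure and makes it visible to LOOP_OVER, so nothing more is needed. For readers without that machinery to hand, the following is a rough sketch of the same "duplicate list" idea in plain C. All of the names here (constant_note, note_constant, first_note) are hypothetical, and a line number stands in for the source_line pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-C analogue of defined_constant: a singly linked list
   kept in creation order, so it can be looped over later. */
typedef struct constant_note {
    char *name;                 /* private copy of the constant's name */
    int line_number;            /* stand-in for Inweb's source_line pointer */
    struct constant_note *next;
} constant_note;

static constant_note *first_note = NULL, *last_note = NULL;

/* Records a constant name, duplicating it so that later changes to the
   caller's buffer do not affect the list. Returns NULL if out of memory. */
constant_note *note_constant(const char *name, int line_number) {
    assert(name != NULL);
    constant_note *cn = malloc(sizeof *cn);
    if (cn == NULL) return NULL;
    size_t n = strlen(name) + 1;
    cn->name = malloc(n);
    if (cn->name == NULL) { free(cn); return NULL; }
    memcpy(cn->name, name, n);
    cn->line_number = line_number;
    cn->next = NULL;
    if (last_note) last_note->next = cn; else first_note = cn;
    last_note = cn;
    return cn;
}
```

Duplicating the name, as Str::duplicate does above, matters because the caller's text may well be a temporary which is reused or discarded after the call.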