Constructing a suitable ctags file for a web.
§1. On every tangle, we can also write a simple Universal Ctags file to the root directory of the web: this can then be used by text editors (such as BBEdit on MacOS) to provide code completion and jump-to-definition features when editing the sections in the web.
A ctags file is essentially just a list of identifiers called "tagnames",
which are usually names of functions or data types, along with details of
where they are defined in a program. Ctags files are almost never written by
hand, but are instead generated by a tool such as the eponymous ctags. Here,
though, we are going to do the generation, because we can make sense of the
web structure of source code in a way which the ctags parser cannot.
The original ctags dates to 1992, and was devised by Ken Arnold. This was
much extended as Exuberant Ctags, by Darren Hiebert, which was then forked and
maintained as Universal Ctags by Reza Jelveh and others. The result is now
close to a standard, though as with a lot of early Unix infrastructure (compare
make, for example), its design feels very antique: white space is
significant, filename extensions are not standard practice, and so on. See
Universal Ctags for more.
§2. As mentioned above, the ctags format goes back to an age before filenames necessarily had
extensions, and just as the default make file is makefile and not makefile.mk,
so the default ctags file is called tags and not tags.ctag.
void Ctags::write(ls_web *W, filename *F) {
    text_stream ctags_file;
    pathname *P = NULL;
    if (F) {
        P = Filenames::up(F);
    } else {
        P = W->path_to_web;
        F = Filenames::in(P, I"tags");
    }
    text_stream *OUT = &ctags_file;
    if (STREAM_OPEN_TO_FILE(OUT, F, UTF8_ENC) == FALSE)
        Errors::fatal_with_file("unable to write ctags file", F);
    ⟨Write header 2.1⟩;
    ⟨List defined constants 2.2⟩;
    ⟨List structures 2.3⟩;
    ⟨List functions 2.4⟩;
    STREAM_CLOSE(OUT);
}
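The choice of output file made by Ctags::write can be sketched in plain C. This is a minimal sketch, not inweb code: the function name choose_tags_path is hypothetical, paths are plain char strings rather than inweb's filename and pathname objects, and '/' is assumed to be the folder separator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of inweb. It mirrors the choice made in
   Ctags::write: if the caller names a tags file explicitly, use it as-is;
   otherwise default to a file called "tags" (no extension) in the web's
   directory. Assumes '/' as the folder separator.
   Returns 0 on success, -1 if the result would not fit in out. */
int choose_tags_path(const char *explicit_file, const char *web_dir,
                     char *out, size_t out_size) {
    int n;
    if (explicit_file != NULL) {
        n = snprintf(out, out_size, "%s", explicit_file);
    } else {
        size_t len = strlen(web_dir);
        /* Avoid doubling the separator if web_dir already ends in '/'. */
        const char *sep = (len > 0 && web_dir[len - 1] == '/') ? "" : "/";
        n = snprintf(out, out_size, "%s%stags", web_dir, sep);
    }
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

In Ctags::write, the directory P is what later filenames are made relative to, so a tags file written to an explicitly named location gets filenames relative to that location rather than to the web.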
§2.1. Unless you really want to monkey with identifiers or filenames containing
line break characters or tabs, a ctags file has a simple format to read or
write: there's one tag on each line, and each line has three or more fields
divided by tab characters. If we write -> for a tab, a line looks like:
tagname -> filename -> /find/;" -> more
The stranded double-quote there is not a misprint. For example:
Frogs::spawn -> pond/Chapter 1/Amphibians.w -> /^void Frogs::spawn(species *S) {$/;" -> f
Here the tagname is Frogs::spawn. The filename pond/Chapter 1/Amphibians.w
is the file defining this function. The find field is an ex-format command for
finding the line in question: see below. Finally, the more field is actually
a run of optional extra information, presented in a free-form sort of way, but
we will use it in only the simplest of ways. In this example it is just f,
meaning "I am a function declaration".
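To make the layout concrete, here is a hedged sketch in plain C of writing one ordinary tag line. The helper format_tag_line is hypothetical, not inweb's API, and it assumes the find field has already been escaped; given the four fields of the Frogs::spawn example, it reproduces that line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not inweb's API): writes one ordinary tag line,
   tagname TAB filename TAB /find/;" TAB more, ending with a newline.
   The find pattern is passed in already escaped; the ;" after it tells
   readers that extended fields follow. Returns the length written,
   or -1 if the buffer is too small. */
int format_tag_line(char *out, size_t out_size, const char *tagname,
                    const char *filename, const char *find,
                    const char *more) {
    int n = snprintf(out, out_size, "%s\t%s\t%s;\"\t%s\n",
                     tagname, filename, find, more);
    return (n < 0 || (size_t) n >= out_size) ? -1 : n;
}
```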
The opening lines of the file, however, are usually metadata, describing the
file itself and where it came from. In those lines, tagnames begin with !_ and are
called "pseudotags"; the filename field holds a value instead, and the find
field an optional comment.
The first two keys here are essential: the other three seem just to be good practice.
These are the five keys which Universal Ctags writes by default, so we'll follow
suit.
⟨Write header 2.1⟩ =
    WRITE("!_TAG_FILE_FORMAT\t2\t/extended format; --format=1 will not append ;\" to lines/\n");
    WRITE("!_TAG_FILE_SORTED\t0\t/0=unsorted, 1=sorted, 2=foldcase/\n");
    WRITE("!_TAG_PROGRAM_AUTHOR\tGraham Nelson\t/graham.nelson@mod-langs.ox.ac.uk/\n");
    WRITE("!_TAG_PROGRAM_NAME\toutput from tangler command 'Title'\t//\n");
    if (Time::fixed())
        WRITE("!_TAG_PROGRAM_VERSION\toutput from tangler command 'Version Number'\t/built output from tangler command '28 March 2016'/\n");
    else
        WRITE("!_TAG_PROGRAM_VERSION\toutput from tangler command 'Version Number'\t/built output from tangler command 'Build Date'/\n");
- This code is used in §2.
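Because the format is just tab-separated fields, reading a line back is about as simple as writing one. The sketch below is hypothetical (split_tag_line and is_pseudotag are not inweb functions) and assumes that no field contains a tab. That assumption does not hold in general: as §4 notes, tabs in the find field are not escaped, so a robust reader would need to locate the end of the /.../ pattern rather than split blindly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader, a sketch only: splits one tags-file line in place
   at tab characters, storing up to max_fields pointers in fields[]. Empty
   fields are kept (unlike strtok), and once max_fields is reached the last
   field keeps the rest of the line. A trailing newline is stripped first.
   Returns the number of fields found. */
size_t split_tag_line(char *line, char *fields[], size_t max_fields) {
    size_t count = 0;
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') line[len - 1] = '\0';
    if (max_fields == 0) return 0;
    fields[count++] = line;
    for (char *p = line; *p != '\0' && count < max_fields; p++) {
        if (*p == '\t') {
            *p = '\0';              /* terminate the previous field */
            fields[count++] = p + 1; /* next field starts after the tab */
        }
    }
    return count;
}

/* Pseudotags, the metadata lines at the top of the file, have tagnames
   beginning with "!_". */
int is_pseudotag(const char *tagname) {
    return strncmp(tagname, "!_", 2) == 0;
}
```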
§2.2. Having prudently opted to give the tags in an unsorted way, we're free to list them in any order convenient to us, and here goes.
The more field d says that a tagname is a defined constant:
⟨List defined constants 2.2⟩ =
    defined_constant *str;
    LOOP_OVER_LINKED_LIST(str, defined_constant, CodeAnalysis::defined_constants_list(W)) {
        WRITE("%S\t", str->name);
        Ctags::write_line_ref(OUT, str->at, P);
        WRITE(";\"\t");
        WRITE("d");
        WRITE("\n");
    }
- This code is used in §2.
§2.3. The more field t says that a tagname is a type, and we add a clarifying
detail to say that it results from a typedef struct. (Note that typeref
here, with an "r", is not a mistake. This is what Universal Ctags calls it.)
⟨List structures 2.3⟩ =
    language_type *str;
    LOOP_OVER_LINKED_LIST(str, language_type, CodeAnalysis::language_types_list(W)) {
        WRITE("%S\t", str->structure_name);
        Ctags::write_line_ref(OUT, str->structure_header_at, P);
        WRITE(";\"\t");
        WRITE("t\ttyperef:struct:%S", str->structure_name);
        WRITE("\n");
    }
- This code is used in §2.
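The kinds of more field written in this section (d for a defined constant, t with its typeref: detail for a structure, and f for a function) can be summarised by a small formatting function. This is a sketch with hypothetical names (tag_kind, format_more_field), not inweb's code; note that the typeref detail is separated from t by a tab, as in the WRITE call above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the three kinds of tag written here. */
typedef enum { TAG_DEFINED_CONSTANT, TAG_STRUCT_TYPE, TAG_FUNCTION } tag_kind;

/* Writes the "more" field for a tag into out: "d", "t<TAB>typeref:struct:NAME",
   or "f". The name is only used (and must be non-NULL) for struct types.
   Returns the length written, or -1 on overflow or an unknown kind. */
int format_more_field(char *out, size_t out_size, tag_kind kind,
                      const char *name) {
    int n;
    switch (kind) {
        case TAG_DEFINED_CONSTANT:
            n = snprintf(out, out_size, "d");
            break;
        case TAG_STRUCT_TYPE:
            n = snprintf(out, out_size, "t\ttyperef:struct:%s", name);
            break;
        case TAG_FUNCTION:
            n = snprintf(out, out_size, "f");
            break;
        default:
            return -1;
    }
    return (n < 0 || (size_t) n >= out_size) ? -1 : n;
}
```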
§2.4. Finally, the more field f says that a tagname is a function:
⟨List functions 2.4⟩ =
    language_function *fn;
    LOOP_OVER_LINKED_LIST(fn, language_function, CodeAnalysis::language_functions_list(W)) {
        WRITE("%S\t", fn->function_name);
        Ctags::write_line_ref(OUT, fn->function_header_at, P);
        WRITE(";\"\t");
        WRITE("f");
        WRITE("\n");
    }
- This code is used in §2.
§3.
int Ctags::useful_tags_exist(ls_web *W) {
    if (LinkedLists::len(CodeAnalysis::defined_constants_list(W)) > 0) return TRUE;
    if (LinkedLists::len(CodeAnalysis::language_types_list(W)) > 0) return TRUE;
    if (LinkedLists::len(CodeAnalysis::language_functions_list(W)) > 0) return TRUE;
    return FALSE;
}
§4. So, then, here we write the filename and find fields for a given
source line lst in our web. Note that:
- The filename must be given relative to the directory containing the tags file, so by default that will be the home directory of the web.
- The find field looks like a regular expression but is not one, despite the suggestive positional markers ^ and $. Note in particular that round brackets and asterisk characters are not escaped, as they would be in a regex. The Ctags documentation is vague here but does note that ^ and $ should be escaped only where they occur in the first or last positions. Tabs do not need to be escaped.
void Ctags::write_line_ref(OUTPUT_STREAM, ls_line *lst, pathname *P) {
    ls_section *S = LiterateSource::section_of_line(lst);
    TEMPORARY_TEXT(fn)
    WRITE_TO(fn, "%f", S->source_file_for_section);
    if (Platform::is_folder_separator(Str::get_first_char(fn)) == FALSE) {
        Str::clear(fn);
        Filenames::to_text_relative(fn, S->source_file_for_section, P);
    }
    WRITE("%S\t/^", fn);
    DISCARD_TEXT(fn)
    for (int i = 0; i < Str::len(lst->text); i++) {
        inchar32_t c = Str::get_at(lst->text, i);
        switch (c) {
            case '/': PUT('\\'); PUT(c); break;
            case '^': if (i == 0) PUT('\\'); PUT(c); break;
            case '$': if (i < Str::len(lst->text) - 1) PUT('\\'); PUT(c); break;
            default: PUT(c); break;
        }
    }
    WRITE("$/");
}
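The escaping loop in Ctags::write_line_ref can be tried out in isolation. The following plain-C sketch is hypothetical (write_find_pattern is not an inweb function, and char strings stand in for inweb's text streams); it follows the loop above exactly, including its rule for $, which is escaped everywhere except as the last character of the line.

```c
#include <string.h>

/* A standalone sketch of the escaping loop in Ctags::write_line_ref:
   wraps a source line as /^...$/, always escaping '/', escaping '^' only
   when it is the first character of the line, and escaping '$' except
   when it is the last. Other characters, including '(', '*' and tab, are
   copied unchanged. Returns 0 on success, -1 if out is too small. */
int write_find_pattern(const char *line, char *out, size_t out_size) {
    size_t len = strlen(line), j = 0;
    /* Worst case: every character escaped, plus "/^", "$/" and the NUL. */
    if (out_size < 2 * len + 5) return -1;
    out[j++] = '/';
    out[j++] = '^';
    for (size_t i = 0; i < len; i++) {
        char c = line[i];
        if (c == '/' || (c == '^' && i == 0) || (c == '$' && i + 1 < len))
            out[j++] = '\\';
        out[j++] = c;
    }
    out[j++] = '$';
    out[j++] = '/';
    out[j] = '\0';
    return 0;
}
```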
§5. To make the above work, we need to keep a list of defined constant names. We could laboriously extract that from the hash table of reserved words, but this is one of those times when life is short and memory is cheap. It's easier to keep a duplicate list.
typedef struct defined_constant {
    struct text_stream *name;
    struct ls_line *at;
    CLASS_DEFINITION
} defined_constant;
- The structure defined_constant is accessed in 1/wcl, 1/cln, 1/ws, 1/wcp, 2/wn, 2/hs, 2/wi, 3/pl, 3/tp, 4/tt2, 5/ts, 5/ptt, 5/apacs, 5/wt, 5/hf, 6/rw and here.
void Ctags::note_defined_constant(ls_line *lst, text_stream *name, ls_web *W) {
    defined_constant *dc = CREATE(defined_constant);
    dc->name = Str::duplicate(name);
    dc->at = lst;
    ADD_TO_LINKED_LIST(dc, defined_constant, CodeAnalysis::defined_constants_list(W));
}
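For readers unfamiliar with inweb's CREATE and linked-list macros, the same duplicate-list idea can be sketched in standard C. Everything here is hypothetical (noted_constant, constant_list, note_defined_constant, free_constant_list), and a line number stands in for the ls_line pointer; the point is only that each name is copied, as Str::duplicate does above, so the list stays valid whatever later happens to the caller's text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-C stand-in for defined_constant: a copied name, the
   source line number (instead of an ls_line pointer), and a next link. */
typedef struct noted_constant {
    char *name;
    int line;
    struct noted_constant *next;
} noted_constant;

typedef struct {
    noted_constant *first, *last;
    size_t count;
} constant_list;

/* Appends a copy of name to the end of the list, preserving the order in
   which constants were found. Returns 0 on success, -1 if out of memory. */
int note_defined_constant(constant_list *L, const char *name, int line) {
    noted_constant *dc = malloc(sizeof *dc);
    if (dc == NULL) return -1;
    size_t n = strlen(name) + 1;
    dc->name = malloc(n);
    if (dc->name == NULL) { free(dc); return -1; }
    memcpy(dc->name, name, n);  /* the duplicate: independent of the caller */
    dc->line = line;
    dc->next = NULL;
    if (L->last) L->last->next = dc; else L->first = dc;
    L->last = dc;
    L->count++;
    return 0;
}

/* Frees every entry, including the duplicated names, and empties the list. */
void free_constant_list(constant_list *L) {
    noted_constant *dc = L->first;
    while (dc) {
        noted_constant *next = dc->next;
        free(dc->name);
        free(dc);
        dc = next;
    }
    L->first = L->last = NULL;
    L->count = 0;
}
```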