Some basic linguistic constants are defined.
§1. Constants. First, we support three genders:
define NO_KNOWN_GENDERS 3
define NEUTER_GENDER 1        /* or can be used as Scandinavian "common gender" */
define MASCULINE_GENDER 2
define FEMININE_GENDER 3
§2. There are three "persons"; combined with the two numbers, they give six forms. The sequence corresponds to the defined constants in the English Language extension, which we assume will be followed by other languages.
define NO_KNOWN_PERSONS 3
define FIRST_PERSON 0
define SECOND_PERSON 1
define THIRD_PERSON 2

define NO_KNOWN_NUMBERS 2
define SINGULAR_NUMBER 0
define PLURAL_NUMBER 1

define NO_KNOWN_VOICES 2
define ACTIVE_VOICE 0
define PASSIVE_VOICE 1

define NO_KNOWN_SENSES 2
define POSITIVE_SENSE 0
define NEGATIVE_SENSE 1
§6. 25 cases sounds like plenty, but some languages are pretty scary this way: Hungarian has 18. We only require two cases to exist, the nominative and accusative, which are required to be cases 0 and 1.
define MAX_GRAMMATICAL_CASES 25
define NOMINATIVE_CASE 0
define ACCUSATIVE_CASE 1
§7. There are at least five tenses, the first four of which are used by Inform in English. Some languages can use optional extras; French, for example, uses tense 5 for the past historic.
define NO_KNOWN_TENSES 7
define IS_TENSE 0             /* Present */
define WAS_TENSE 1            /* Past */
define HASBEEN_TENSE 2        /* Present perfect */
define HADBEEN_TENSE 3        /* Past perfect */
define WILLBE_TENSE 4         /* Future (not used in assertions or conditions) */
define CUSTOM1_TENSE 5
define CUSTOM2_TENSE 6
§8. Packed references. The following enables even a 32-bit integer to hold an ID reference in the range 0 to 128K, together with any combination of gender, person, number, voice, case, tense, and sense. This could be optimised further, exploiting for example that no grammatical concept ever simultaneously has voice and gender, but it seems unlikely that there's any need.
If the 128K limit on references ever becomes problematic, which seems very unlikely, we might compromise on the number of cases; or we might simply change lcon_ti to a wider integer type. (It needs to have value copy semantics.) If so, though, Preform results will also need to be widened, because numerous Preform nonterminals in linguistics return lcon_ti values, and at present Preform return values are int.
define lcon_ti int
§9. And here's how we pack everything in:
            <-- lsb     32 bits      msb -->
gender      xx..............................
person      ..xx............................
number      ....x...........................
voice       .....x..........................
case        ......xxxxx.....................
tense       ...........xxx..................
sense       ..............x.................
id          ...............xxxxxxxxxxxxxxxxx
define GENDER_LCBASE 0x00000001
define GENDER_LCMASK 0x00000003
define PERSON_LCBASE 0x00000004
define PERSON_LCMASK 0x0000000C
define NUMBER_LCBASE 0x00000010
define NUMBER_LCMASK 0x00000010
define VOICE_LCBASE  0x00000020
define VOICE_LCMASK  0x00000020
define CASE_LCBASE   0x00000040
define CASE_LCMASK   0x000007C0
define TENSE_LCBASE  0x00000800
define TENSE_LCMASK  0x00003800
define SENSE_LCBASE  0x00004000
define SENSE_LCMASK  0x00004000
define ID_LCBASE     0x00008000
define ID_LCUNMASK   0x00007FFF
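As a cross-check on the table above, each base and mask can be derived from a field's starting bit and width. The following is only a sketch in plain C: the helper names demo_field_base and demo_field_mask are hypothetical and do not appear in the module.

```c
#include <assert.h>

/* Hypothetical helper: the base of a field is its lowest bit, 2^shift. */
unsigned int demo_field_base(int shift) {
    assert(shift >= 0 && shift < 32);
    return 1u << shift;
}

/* Hypothetical helper: a mask of `width` set bits starting at bit `shift`.
   Widths of 32 are excluded so that the shift below is always defined. */
unsigned int demo_field_mask(int shift, int width) {
    assert(width > 0 && width < 32 && shift >= 0 && shift + width <= 32);
    return ((1u << width) - 1u) << shift;
}
```

For instance, the case field starts at bit 6 and is 5 bits wide, so demo_field_mask(6, 5) gives 0x000007C0, which agrees with CASE_LCMASK; and demo_field_mask(0, 15) gives 0x00007FFF, which agrees with ID_LCUNMASK.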
lcon_ti Lcon::base(void) { return (lcon_ti) 0; }
lcon_ti Lcon::of_id(int id) { return (lcon_ti) id*ID_LCBASE; }

int Lcon::get_id(lcon_ti l) { return (int) l/ID_LCBASE; }
int Lcon::get_gender(lcon_ti l) { return (int) (l & GENDER_LCMASK) / GENDER_LCBASE; }
int Lcon::get_person(lcon_ti l) { return (int) (l & PERSON_LCMASK) / PERSON_LCBASE; }
int Lcon::get_number(lcon_ti l) { return (int) (l & NUMBER_LCMASK) / NUMBER_LCBASE; }
int Lcon::get_voice(lcon_ti l) { return (int) (l & VOICE_LCMASK) / VOICE_LCBASE; }
int Lcon::get_case(lcon_ti l) { return (int) (l & CASE_LCMASK) / CASE_LCBASE; }
int Lcon::get_tense(lcon_ti l) { return (int) (l & TENSE_LCMASK) / TENSE_LCBASE; }
int Lcon::get_sense(lcon_ti l) { return (int) (l & SENSE_LCMASK) / SENSE_LCBASE; }

lcon_ti Lcon::set_id(lcon_ti l, int id) { return (l & ID_LCUNMASK) + id*ID_LCBASE; }
lcon_ti Lcon::set_gender(lcon_ti l, int x) { return (l & (~GENDER_LCMASK)) + x*GENDER_LCBASE; }
lcon_ti Lcon::set_person(lcon_ti l, int x) { return (l & (~PERSON_LCMASK)) + x*PERSON_LCBASE; }
lcon_ti Lcon::set_number(lcon_ti l, int x) { return (l & (~NUMBER_LCMASK)) + x*NUMBER_LCBASE; }
lcon_ti Lcon::set_voice(lcon_ti l, int x) { return (l & (~VOICE_LCMASK)) + x*VOICE_LCBASE; }
lcon_ti Lcon::set_case(lcon_ti l, int x) { return (l & (~CASE_LCMASK)) + x*CASE_LCBASE; }
lcon_ti Lcon::set_tense(lcon_ti l, int x) { return (l & (~TENSE_LCMASK)) + x*TENSE_LCBASE; }
lcon_ti Lcon::set_sense(lcon_ti l, int x) { return (l & (~SENSE_LCMASK)) + x*SENSE_LCBASE; }

void Lcon::write_person(OUTPUT_STREAM, int p) {
    switch (p) {
        case FIRST_PERSON: WRITE("1p"); break;
        case SECOND_PERSON: WRITE("2p"); break;
        case THIRD_PERSON: WRITE("3p"); break;
    }
}

void Lcon::write_number(OUTPUT_STREAM, int n) {
    switch (n) {
        case SINGULAR_NUMBER: WRITE("s"); break;
        case PLURAL_NUMBER: WRITE("p"); break;
    }
}

void Lcon::write_gender(OUTPUT_STREAM, int g) {
    switch (g) {
        case NEUTER_GENDER: WRITE("n"); break;
        case MASCULINE_GENDER: WRITE("m"); break;
        case FEMININE_GENDER: WRITE("f"); break;
    }
}

void Lcon::write_sense(OUTPUT_STREAM, int s) {
    if (s == NEGATIVE_SENSE) WRITE("-ve");
    if (s == POSITIVE_SENSE) WRITE("+ve");
}

void Lcon::write_voice(OUTPUT_STREAM, int m) {
    if (m == ACTIVE_VOICE) WRITE("act");
    if (m == PASSIVE_VOICE) WRITE("pass");
}

void Lcon::write_tense(OUTPUT_STREAM, int t) {
    switch (t) {
        case IS_TENSE: WRITE("IS_TENSE"); break;
        case WAS_TENSE: WRITE("WAS_TENSE"); break;
        case HASBEEN_TENSE: WRITE("HASBEEN_TENSE"); break;
        case HADBEEN_TENSE: WRITE("HADBEEN_TENSE"); break;
        case WILLBE_TENSE: WRITE("WILLBE_TENSE"); break;
        case CUSTOM1_TENSE: WRITE("CUSTOM1_TENSE"); break;
        case CUSTOM2_TENSE: WRITE("CUSTOM2_TENSE"); break;
        default: WRITE("<invalid-tense>"); break;
    }
}

void Lcon::write_case(OUTPUT_STREAM, int c) {
    switch (c) {
        case NOMINATIVE_CASE: WRITE("nom"); break;
        case ACCUSATIVE_CASE: WRITE("acc"); break;
        default: WRITE("case%d", c); break;
    }
}
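To make the packing concrete, here is a small round trip written in plain C rather than in the module's Lcon:: notation. It is only a sketch: the demo_ function names are hypothetical stand-ins for the accessors above, and the ID 1234 is an arbitrary illustrative value.

```c
typedef int lcon_ti;

#define CASE_LCBASE  0x00000040
#define CASE_LCMASK  0x000007C0
#define TENSE_LCBASE 0x00000800
#define TENSE_LCMASK 0x00003800
#define ID_LCBASE    0x00008000

/* Hypothetical plain-C copies of the corresponding Lcon:: accessors. */
lcon_ti demo_of_id(int id) { return (lcon_ti) id*ID_LCBASE; }
int demo_get_id(lcon_ti l) { return (int) l/ID_LCBASE; }
int demo_get_case(lcon_ti l) { return (int) (l & CASE_LCMASK) / CASE_LCBASE; }
int demo_get_tense(lcon_ti l) { return (int) (l & TENSE_LCMASK) / TENSE_LCBASE; }
lcon_ti demo_set_case(lcon_ti l, int x) { return (l & (~CASE_LCMASK)) + x*CASE_LCBASE; }
lcon_ti demo_set_tense(lcon_ti l, int x) { return (l & (~TENSE_LCMASK)) + x*TENSE_LCBASE; }

/* Pack ID 1234 together with case 1 (accusative) and tense 3 (past perfect). */
lcon_ti demo_example(void) {
    lcon_ti l = demo_of_id(1234);
    l = demo_set_case(l, 1);
    l = demo_set_tense(l, 3);
    return l;
}
```

Because each setter first clears its own field, calling demo_set_case again on the result replaces the case and leaves the ID and tense as they were.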
int Lcon::same_but_for_gender(lcon_ti A, lcon_ti B) {
    if ((A - (A & GENDER_LCMASK)) == (B - (B & GENDER_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_person(lcon_ti A, lcon_ti B) {
    if ((A - (A & PERSON_LCMASK)) == (B - (B & PERSON_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_number(lcon_ti A, lcon_ti B) {
    if ((A - (A & NUMBER_LCMASK)) == (B - (B & NUMBER_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_voice(lcon_ti A, lcon_ti B) {
    if ((A - (A & VOICE_LCMASK)) == (B - (B & VOICE_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_case(lcon_ti A, lcon_ti B) {
    if ((A - (A & CASE_LCMASK)) == (B - (B & CASE_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_tense(lcon_ti A, lcon_ti B) {
    if ((A - (A & TENSE_LCMASK)) == (B - (B & TENSE_LCMASK))) return TRUE;
    return FALSE;
}

int Lcon::same_but_for_sense(lcon_ti A, lcon_ti B) {
    if ((A - (A & SENSE_LCMASK)) == (B - (B & SENSE_LCMASK))) return TRUE;
    return FALSE;
}
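Each of these comparisons blanks out one field in both values and then compares what is left. Subtracting the masked bits, as above, has the same effect as clearing them with a complemented mask. A minimal plain-C sketch for the number field follows; the function name demo_same_but_for_number is hypothetical.

```c
typedef int lcon_ti;

#define PERSON_LCBASE 0x00000004
#define NUMBER_LCBASE 0x00000010
#define NUMBER_LCMASK 0x00000010

/* Hypothetical equivalent of the number comparison: clear the number bit
   in both values, then test whether everything else agrees. */
int demo_same_but_for_number(lcon_ti A, lcon_ti B) {
    return ((A & ~NUMBER_LCMASK) == (B & ~NUMBER_LCMASK)) ? 1 : 0;
}
```

So third person singular (2*PERSON_LCBASE) and third person plural (2*PERSON_LCBASE + NUMBER_LCBASE) count as the same but for number, whereas third person singular and second person plural do not.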
§11. Axes. We can think of a combination of the seven grammatical attributes above as being like a position in seven-dimensional space, with each being a coordinate on one of these seven axes.
In practice, we're often interested in only a few of the seven. Nouns, for instance, do not have tenses; verbs do not have cases. It's convenient to represent the seven axes by the following constants, so that an arbitrary sum of these can represent a set of things we're interested in:
define GENDER_LCW 1
define PERSON_LCW 2
define NUMBER_LCW 4
define VOICE_LCW 8
define CASE_LCW 16
define TENSE_LCW 32
define SENSE_LCW 64
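Since each constant is a distinct power of 2, a sum of them behaves as a bitmap of chosen axes. As a rough illustration in plain C (the function name demo_axis_count is hypothetical), this is how such a set can be walked through one axis at a time:

```c
#define GENDER_LCW 1
#define PERSON_LCW 2
#define NUMBER_LCW 4
#define VOICE_LCW 8
#define CASE_LCW 16
#define TENSE_LCW 32
#define SENSE_LCW 64

/* Hypothetical helper: count how many of the seven axes are in the set. */
int demo_axis_count(int desiderata) {
    int count = 0;
    for (int axis = 1; axis < 128; axis = axis*2)
        if (desiderata & axis) count++;
    return count;
}
```

A verb form interested only in person and number would use PERSON_LCW + NUMBER_LCW, for which demo_axis_count returns 2.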
§12. And desiderata in the following function is exactly that sort of set.
void Lcon::write(OUTPUT_STREAM, lcon_ti l, int desiderata) {
    for (int axis=1; axis<128; axis=axis*2)
        if (desiderata & axis) {
            WRITE(" ");
            Lcon::write_value_on_axis(OUT, axis, Lcon::get_value_on_axis(axis, l));
        }
}
§13. The parameter axis in the following must, on the other hand, be a pure power of 2, that is, it must be a single *_LCW value.
void Lcon::write_value_on_axis(OUTPUT_STREAM, int axis, int v) {
    switch (axis) {
        case GENDER_LCW: Lcon::write_gender(OUT, v); break;
        case PERSON_LCW: Lcon::write_person(OUT, v); break;
        case NUMBER_LCW: Lcon::write_number(OUT, v); break;
        case VOICE_LCW: Lcon::write_voice(OUT, v); break;
        case CASE_LCW: Lcon::write_case(OUT, v); break;
        case TENSE_LCW: Lcon::write_tense(OUT, v); break;
        case SENSE_LCW: Lcon::write_sense(OUT, v); break;
        default: internal_error("bad axis");
    }
}

int Lcon::get_value_on_axis(int axis, lcon_ti A) {
    switch (axis) {
        case GENDER_LCW: return Lcon::get_gender(A);
        case PERSON_LCW: return Lcon::get_person(A);
        case NUMBER_LCW: return Lcon::get_number(A);
        case VOICE_LCW: return Lcon::get_voice(A);
        case CASE_LCW: return Lcon::get_case(A);
        case TENSE_LCW: return Lcon::get_tense(A);
        case SENSE_LCW: return Lcon::get_sense(A);
        default: internal_error("bad axis");
    }
    return 0;
}

int Lcon::same_but_for_value_on_axis(int axis, lcon_ti A, lcon_ti B) {
    switch (axis) {
        case GENDER_LCW: return Lcon::same_but_for_gender(A, B);
        case PERSON_LCW: return Lcon::same_but_for_person(A, B);
        case NUMBER_LCW: return Lcon::same_but_for_number(A, B);
        case VOICE_LCW: return Lcon::same_but_for_voice(A, B);
        case CASE_LCW: return Lcon::same_but_for_case(A, B);
        case TENSE_LCW: return Lcon::same_but_for_tense(A, B);
        case SENSE_LCW: return Lcon::same_but_for_sense(A, B);
        default: internal_error("bad axis");
    }
    return 0;
}
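The three functions above rely on the caller to pass a single axis, and throw an internal error otherwise. If a caller wanted to test that condition in advance, one possible check, sketched in plain C under the hypothetical name demo_is_single_axis, is the usual power-of-2 test restricted to the seven axis values:

```c
/* Hypothetical check: is `axis` exactly one of the seven *_LCW values,
   i.e. a power of 2 in the range 1 to 64? A power of 2 has a single set
   bit, so clearing its lowest set bit with axis & (axis - 1) leaves 0. */
int demo_is_single_axis(int axis) {
    if ((axis < 1) || (axis >= 128)) return 0;
    return ((axis & (axis - 1)) == 0) ? 1 : 0;
}
```

With this, 16 (the value of CASE_LCW) passes, while 6 (the sum PERSON_LCW + NUMBER_LCW) is rejected as a set of two axes.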
§14. Writing sets. Suppose we have a list of lcon_ti constants and want to print out their grammatical attributes. If we do that in the obvious way, by calling Lcon::write on each of the constants in turn, we tend to get a list of tiresome length. We want to abbreviate so that, e.g.,
1p s + 1p p + 2p s + 2p p + 3p s + 3p p
becomes just 1p/2p/3p s/p.
Doing this is surprisingly non-trivial: an optimal solution means finding the minimal number of disjoint 7-dimensional cuboids whose union is the set of coordinates in the list. "Cuboid" here really means "Cartesian product of seven sets"; the above case is a benign one because the set in question is a single cuboid — $$ \lbrace (1p, s), (2p, s), (3p, s), (1p, p), (2p, p), (3p, p) \rbrace = \lbrace 1p, 2p, 3p \rbrace\times\lbrace s, p\rbrace. $$
We will aim for an adequately good answer, not an optimal one. The following code is really only needed for printing tidy debugging and test logs, so it's probably not worth any further effort.
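To illustrate what "a single cuboid" means in the two-axis example above, here is a rough plain-C sketch. It is not part of the module: the function names are hypothetical, and it assumes the list contains no repeated pairs. Under that assumption, the list is a full Cartesian product exactly when its size equals the number of distinct persons times the number of distinct numbers.

```c
/* Hypothetical check for the two-axis case: do n distinct (person, number)
   pairs fill out a complete Cartesian product? Assumes no pair is repeated,
   that persons lie in 0..2 and that numbers lie in 0..1. */
int demo_is_single_cuboid(const int persons[], const int numbers[], int n) {
    int seen_person[3] = { 0, 0, 0 }, seen_number[2] = { 0, 0 };
    int distinct_persons = 0, distinct_numbers = 0;
    for (int i = 0; i < n; i++) {
        if (!seen_person[persons[i]]) { seen_person[persons[i]] = 1; distinct_persons++; }
        if (!seen_number[numbers[i]]) { seen_number[numbers[i]] = 1; distinct_numbers++; }
    }
    return (distinct_persons * distinct_numbers == n) ? 1 : 0;
}

/* All six forms: 1p s, 1p p, 2p s, 2p p, 3p s, 3p p. */
int demo_all_six_forms(void) {
    const int persons[6] = { 0, 0, 1, 1, 2, 2 };
    const int numbers[6] = { 0, 1, 0, 1, 0, 1 };
    return demo_is_single_cuboid(persons, numbers, 6);
}

/* The same list with 3p p removed, which is no longer a single product. */
int demo_five_forms(void) {
    const int persons[5] = { 0, 0, 1, 1, 2 };
    const int numbers[5] = { 0, 1, 0, 1, 0 };
    return demo_is_single_cuboid(persons, numbers, 5);
}
```

The six-form list passes, so it can be abbreviated to 1p/2p/3p s/p; the five-form list fails, and would need at least two cuboids.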
§15. To avoid the C extension for variable-length arrays, and to avoid memory allocation, we're simply going to make our working arrays quite large. But this is fine — the function is for printing, so it's not used much.
define MAX_LCON_SET_SIZE NO_KNOWN_GENDERS*NO_KNOWN_PERSONS*NO_KNOWN_NUMBERS*NO_KNOWN_VOICES*
    NO_KNOWN_SENSES*MAX_GRAMMATICAL_CASES*NO_KNOWN_TENSES
§16. We are going to aggregate items in the list into numbered cuboids. The strategy is simple: start with the first item; make the largest-volume cuboid inside our set which contains that item; then take the next item not already included, and continue.
void Lcon::write_set(OUTPUT_STREAM, lcon_ti *set, int set_size, int desiderata) {
    if (set_size > MAX_LCON_SET_SIZE) internal_error("lcon set too large");
    int cuboid_number[MAX_LCON_SET_SIZE];
    for (int i=0; i<set_size; i++) cuboid_number[i] = -1;
    for (int i=0, cuboid=0; i<set_size; i++)
        if (cuboid_number[i] == -1) {
            if (cuboid++ > 0) WRITE(" +");
            ⟨Find the most volumetric cuboid containing this form §16.1⟩;
        }
}
§16.1. Note that there is always at least one cuboid containing the item \(i\) — the \(1\times 1\times 1\times 1\times 1\times 1\times 1\) cuboid containing just that one point. So the following certainly finds something. The elongated_sides value accumulates the set of axis directions in which the cuboid is longer than 1.
⟨Find the most volumetric cuboid containing this form §16.1⟩ =
    cuboid_number[i] = cuboid;
    int elongated_sides = 0;
    ⟨Repeatedly elongate in the axis which maximises the volume growth §16.1.1⟩;
    ⟨Write the resulting cuboid out §16.1.2⟩;
- This code is used in §16.
§16.1.1. So now we are at item \(i\). We repeatedly do the following: try to expand the cuboid into each of the seven axis directions, then choose the one which expands it the most. We stop when no further expansion is possible.
⟨Repeatedly elongate in the axis which maximises the volume growth §16.1.1⟩ =
    int max_elongation = 0;
    do {
        int best_d = 0;
        max_elongation = 0;
        int enlarged[MAX_LCON_SET_SIZE];
        for (int d = 1; d < 128; d = d*2)
            if (d & desiderata) {
                int elongation = 0;
                ⟨Enlarge the cuboid in axis direction d §16.1.1.1⟩;
                if (max_elongation < elongation) {
                    max_elongation = elongation;
                    best_d = d;
                }
            }
        if (best_d) {
            elongated_sides = elongated_sides | best_d;
            int d = best_d, elongation = 0;
            ⟨Enlarge the cuboid in axis direction d §16.1.1.1⟩;
            for (int j=0; j<set_size; j++) cuboid_number[j] = enlarged[j];
        }
    } while (max_elongation > 0);
- This code is used in §16.1.
§16.1.1.1. We start with the current cuboid. The enlarged array will be the same as the cuboid_number array except that some additional points x for which cuboid_number[x] is \(-1\) — i.e., points not yet placed in any cuboid — will have enlarged[x] set to cuboid — i.e., will be placed in the current cuboid. In effect, enlarged is a speculative next version of cuboid_number.
We first find the "variations" in the \(d\) direction: that is, \(d\) coordinates of points which are either \(i\) itself or are unplaced points whose other coordinates are the same as those for \(i\).
⟨Enlarge the cuboid in axis direction d §16.1.1.1⟩ =
    for (int j=0; j<set_size; j++) enlarged[j] = cuboid_number[j];
    int variations[MAX_LCON_SET_SIZE], no_vars;
    ⟨Find all the variations on axis d from position i §16.1.1.1.1⟩;
    int allow = TRUE;
    ⟨Check every position has the same variations, and elongate by them §16.1.1.1.2⟩;
    if (allow == FALSE) elongation = 0;
- This code is used in §16.1.1 (twice).
§16.1.1.1.1. For example, if \(i = (2, 1, 0, 0, 0, 0, 0)\) and \(d\) is the second axis, then one variation would be 1 (the \(d\) coordinate of \(i\) itself) and if, say, \((2, 7, 0, 0, 0, 0, 0)\) were an unplaced point then 7 would also be a variation.
⟨Find all the variations on axis d from position i §16.1.1.1.1⟩ =
    no_vars = 0;
    for (int j=0; j<set_size; j++)
        if ((cuboid_number[j] < 0) || (j == i)) {
            lcon_ti A = set[i], B = set[j];
            if (Lcon::same_but_for_value_on_axis(d, A, B))
                variations[no_vars++] = Lcon::get_value_on_axis(d, B);
        }
- This code is used in §16.1.1.1.
§16.1.1.1.2. Now suppose our variation set is indeed \(\lbrace 1, 7\rbrace\), as in the above example. The idea is that we will use this set as the new side for the cuboid. We know that we can vary \(i\) by these values; that's how they were found. But we must also check that we can vary every other point currently in the cuboid in the same way. If we can't, the attempt fails.
⟨Check every position has the same variations, and elongate by them §16.1.1.1.2⟩ =
    for (int k=0; k<set_size; k++)
        if (cuboid_number[k] == cuboid) {
            for (int vc=0; vc<no_vars; vc++) {
                int v = variations[vc], found = FALSE;
                for (int j=0; j<set_size; j++)
                    if ((cuboid_number[j] < 0) || (j == k)) {
                        lcon_ti A = set[k], B = set[j];
                        if ((Lcon::same_but_for_value_on_axis(d, A, B)) &&
                            (v == Lcon::get_value_on_axis(d, B))) {
                            if (enlarged[j] == -1) {
                                enlarged[j] = cuboid;
                                elongation++;
                            }
                            found = TRUE;
                        }
                    }
                if (found == FALSE) allow = FALSE;
            }
        }
- This code is used in §16.1.1.1.
§16.1.2. And finally, but also not quite trivially, printing out the cuboid. We handle the elongated sides differently from the unelongated ones, which are relegated to the Lcon::write call at the end. Note that this call prints nothing if unelongated_sides is zero.
⟨Write the resulting cuboid out §16.1.2⟩ =
    int unelongated_sides = desiderata;
    for (int d=1; d<128; d=d*2) {
        if (elongated_sides & d) {
            unelongated_sides = unelongated_sides - d;
            WRITE(" ");
            int values[MAX_LCON_SET_SIZE];
            for (int j=0, vc=0, terms=0; j<set_size; j++)
                if (cuboid_number[j] == cuboid) {
                    int v = Lcon::get_value_on_axis(d, set[j]);
                    int already_listed = FALSE;
                    for (int x=0; x<vc; x++)
                        if (v == values[x])
                            already_listed = TRUE;
                    if (already_listed == FALSE) {
                        if (terms++ > 0) WRITE("/");
                        Lcon::write_value_on_axis(OUT, d, v);
                        values[vc++] = v;
                    }
                }
        }
    }
    Lcon::write(OUT, set[i], unelongated_sides);
- This code is used in §16.1.