Inform 7 Home Page / Documentation
§5.11. Unicode characters
As we have seen, Inform allows us to type a wide range of characters into the source text, although the more exotic ones may only appear inside quotation marks. But they become more and more difficult to type as they become more obscure. Inform therefore allows us to describe a letter using a text substitution rather than typing it directly.
Unicode characters can be named (or numbered) directly in text. For example:
"[unicode 321]odz Churchyard"
produces a Polish slashed L. If the Unicode Character Names or Unicode Full Character Names extensions are included, characters can also be named as well as numbered:
"[unicode Latin capital letter L with stroke]odz Churchyard"
The Unicode standard assigns character numbers to essentially every marking used in text from any human language: its full range is enormous. (Note that Inform writes these numbers in decimal: many reference charts show them in hexadecimal, or base 16, which can cause confusion.) Inform can only handle codes [unicode 32] up to [unicode 65535], so it is not quite so catholic, but the range is still enormous enough that code numbers are unfamiliar to the eye. Inform therefore allows us to use the official Unicode 4.1 names for characters, instead of their decimal numbers, provided we have Included the necessary extension like so:
Include Unicode Character Names by Graham Nelson.
This extension provides names for some 2900 of the most commonly used characters. It means, for instance, that we can write text such as:
"Dr Zarkov unveils the new [unicode Hebrew letter alef] Nought drive."
"Omar plays 4[unicode black spade suit] with an air of triumph."
Admittedly, these can get a little verbose:
"[unicode Greek small letter omega with psili and perispomeni and ypogegrammeni]"
But before getting carried away, we should remember the hazards: Inform allows us to type, say, "[unicode Saturn]" (an astrological sign) but it appears only as a black square if the resulting story is played by an interpreter using a font which lacks the relevant sign. For instance, Zoom for OS X uses the Lucida Grande and Apple Symbol fonts by default, and this combination does contain the Saturn sign: but Windows Frotz tends to use the Tahoma font by default, which does not. (Another issue is that the fixed letter spacing font, such as used in the status line, may not contain all the characters that the font of the main text contains.) To write something with truly outré characters is therefore a little chancy: users would have to be told quite carefully what interpreter and font to use to play it.
The "Unicode Character Names" extension, which is pre-installed in the standard distribution of Inform, defines names for the Latin, Greek, Cyrillic, Hebrew and Braille alphabets, together with currency and miscellaneous other symbols, including some for drawing boxes and arrows. It is only optionally installed because even this is quite large: but in case it should still prove inadequate, an alternative can be used:
Include Unicode Full Character Names by Graham Nelson.
This includes all 12,997 named characters in the 16-bit range of the Unicode 4.1 standard: it is the size of a small novel and its inclusion will slow Inform down. But if you want to experiment with Arabic, ecclesiastical Georgian, Cherokee, Tibetan, Syriac, the International Phonetic Alphabet, hexagrams or the unified Canadian aboriginal syllabics, "Unicode Full Character Names" (again built into Inform) is the extension for you.
The Über-complète clavier
The following example puts Inform's support for exotic lettering through its paces. It was useful in testing Inform but is not a very instructive read: still, it does provide a test story file for interpreters, so we are including the source here as an example.
The story headline is "Pushing the Limits of Unicode in IF". The story description is "This is a demanding test for Unicode compliance by Z-machine interpreters."
The Château Bibliothèque Français is east of the Deutsche Universität Bücherei. "From this Borgesian construction, doorways lead into anterooms in each of the four cardinal directions." South of the Bibliothèque is the Miscellany Mañana. North of the Bibliothèque is the Íslendingabók. East of the Bibliothèque is Alphabet Soup.
A framed photograph of Icelandic Prime Minister Halldór Ásgrímsson, a ruler measuring Ångströms, a Bokmål-Lëtzebuergesch Lëtzebuergesch-Bokmål dictionary and a ticket to Tromsø via Østfold are in the Íslendingabók.
A paper by Karl Weierstraß, a general feeling of Ärger, an old Österreich passport and the Bach Clavier-Übung open at the fugue à 4 are in the Bücherei.
The painting of École normale superiéure students singing Ça ira, the frankly lesser-known journal of Niccolò Polo, Così fan tutte on CD, an extract of Herodotus concerning Artaÿctes and the exit sign reading À BIENTÔT are in the Bibliothèque.
A wicker basket marked CHLOË is in the Bibliothèque. A ginger cat is in the basket.
A guide to Æsop for naïve æsthetes, Lönnrot's Kalevala, a creed according to the Bahá'ís, FALARÃO magazine, an Estonian poem by Tõnu Trubetsky, a Portuguese-Italian recipe for macarrão, a stripy hanging CANDY PIÑATA bag, a ¿¡Punctuation Turned Upside Down¿¡ pamphlet, an Italian brewers' anti-violence poster declaring BÓTTE NON BÒTTE, a map of È and a dusty book titled The Parnasum of Luís Vaz bearing CAMÕES on its spine are in Miscellany Mañana.
The description of the map is "È is a province in the People's Republic of China."
In Mañana is something called ÂÊÎÔÛ - The Official Journal of the Society for Vowels bearing Circumflexes.
In Mañana is something called âêîôû comic - the youth edition.
The description of Alphabet Soup is "A bewildering place of glyphs, sigils and signs. The Library proper leads back west: steps lead upwards to an Observatory, or downwards into what seems to be a dangerous area. A gaming lounge lies to the south."
The Greek Alphabet, the Cyrillic Alphabet, the Hebrew alphabet, and the embossed plaque are in Alphabet Soup. The description of the Greek alphabet is "αβγδεζηθικλμνξοπρςστυφχψω.". The description of the Hebrew alphabet is "אבגדהוזחטיךכלםמןנסעףפץצקרשת.". The description of the Cyrillic alphabet is "абвгдежзийклмнопрстуфхцчшщъыьэюя.".
Instead of examining the plaque:
say "It seems to be a sign in Braille: ";
say unicode Braille pattern dots-24, " (I), ",
unicode Braille pattern dots-1345, " (N), ",
unicode Braille pattern dots-124, " (F), ",
unicode Braille pattern dots-135, " (O), ",
unicode Braille pattern dots-1235, " (R), ",
unicode Braille pattern dots-134, " (M)."
The Gaming Lounge is south of Alphabet Soup. The chess position and the book of puzzle canons are in the Gaming Lounge.
The Georges de la Tour painting Le Tricheur is in the Gaming Lounge. "Hanging on one wall is Georges de la Tour's masterpiece Le Tricheur (the card-sharp). Visible are 8[unicode black diamond suit], 9[unicode black diamond suit], A[unicode black diamond suit], A[unicode black spade suit], 6[unicode black club suit] but not one of them has a [unicode black heart suit]."
The description of Le Tricheur is "If they'd been dice-players instead, they might have thrown [unicode die face-1], [unicode die face-2], [unicode die face-3], [unicode die face-4], [unicode die face-5] or [unicode die face-6], but as it is they stick to cards."
The description of the book of canons is "A typical fugue is no. 13 (Tovey: [unicode eighth note] = 110) in F[unicode music sharp sign] minor, but you can also make out keys like A[unicode music flat sign] and G[unicode music natural sign]."
The empty square text is text that varies. To say empty: say the empty square text.
To display the board:
say empty, empty, empty, empty, empty, empty, unicode black chess king, empty, line break;
say empty, empty, empty, unicode black chess queen, empty, empty, unicode black chess pawn, empty, line break;
say unicode black chess pawn, empty, empty, unicode black chess bishop, unicode black chess pawn, empty, empty, unicode black chess pawn, line break;
say empty, empty, empty, unicode black chess pawn, empty, unicode black chess rook, empty, empty, line break;
say empty, unicode black chess pawn, empty, unicode white chess pawn, unicode black chess pawn, empty, empty, empty, line break;
say empty, empty, empty, unicode black chess bishop, unicode white chess queen, empty, unicode white chess pawn, unicode white chess pawn, line break;
say unicode white chess pawn, unicode white chess pawn, empty, unicode white chess bishop, empty, unicode black chess rook, unicode white chess bishop, empty, line break;
say empty, unicode white chess knight, empty, empty, unicode white chess rook, empty, unicode white chess rook, unicode white chess king, line break.
Instead of examining the chess position:
say "Fritz Saemisch - Aron Nimzowitsch, Copenhagen 1923: the Immortal Zugzwang Game. Nimzowitsch (black), observing that white will very soon have to play a terrible move, has just advanced his h pawn for no reason other than to wait. So it is white to play...";
say "[fixed letter spacing]......k. [line break]...q..p. [line break]p..bp..p [line break]...p.r.. [line break].p.Pp... [line break]...bQ.PP [line break]PP.B.rB. [line break].N..R.RK [variable letter spacing][line break]";
say "'White must, willy-nilly, eventually throw himself upon the sword', in Nimzowitsch's commentary. ";
say "We will now try to display the same position using chess-piece symbols in a Unicode font.";
say fixed letter spacing;
now the empty square text is " ";
display the board;
say variable letter spacing.
The planets are in the Observatory. "Diagrams of the planets are scattered across the dome: Sun [unicode Sun], Mercury [unicode Mercury], Venus [unicode Female Sign], Earth [unicode Earth], Moon [unicode First Quarter Moon] and [unicode Last Quarter Moon], Mars [unicode Male Sign], Jupiter [unicode Jupiter], Saturn [unicode Saturn], Uranus [unicode Uranus], Neptune [unicode Neptune], Pluto [unicode Pluto] and one or two comets [unicode Comet]. Fainter, but all around, you see stars black [unicode black star] and white [unicode white star]."
The constellations are in the Observatory. "Ringing the dome are the constellations: Aries [unicode Aries], Taurus [unicode Taurus], Gemini [unicode Gemini], Cancer [unicode Cancer], Leo [unicode Leo], Virgo [unicode Virgo], Libra [unicode Libra], Scorpius [unicode Scorpius], Sagittarius [unicode Sagittarius], Capricorn [unicode Capricorn], Aquarius [unicode Aquarius], Pisces [unicode Pisces]."
The weather almanac is in the Observatory. The description of the almanac is "Here nightly observers scrawl in hasty abbreviations for the current weather conditions: clear weather [unicode Black Sun with Rays], cloudy [unicode cloud], rain [unicode umbrella], snow [unicode snowman], lightning [unicode lightning], thunderstorm [unicode thunderstorm]."
The Danger Zone is below Alphabet Soup. The printed name of the Danger Zone is "[unicode skull and crossbones] Danger Zone [unicode skull and crossbones]".
The warning signs are in the Danger Zone. "A variety of international-standard warning standards suggest that this may not be the safest place: [unicode skull and crossbones], [unicode caution sign], [unicode radioactive sign], [unicode biohazard sign]."
This example text was used to produce a story file which has been tried against both Zoom for Mac OS X and Windows Frotz. The Latin, Greek, Cyrillic and Hebrew text all functioned perfectly on both, but a point of difference showed when writing the Hebrew alphabet: Zoom wrote this right-to-left, Windows Frotz left-to-right. The exotic symbols displayed on Zoom (though others not mentioned above, such as "[unicode staff of hermes]", did not): but most appeared only as black squares on Windows Frotz, exceptions being the astrological signs for Venus and Mars and the musical note.