Inform 7 Home Page / Documentation


§5.11. Unicode characters

As we have seen, Inform allows us to type a wide range of characters into the source text, although the more exotic ones may only appear inside quotation marks. But they become more and more difficult to type as they become more obscure. Inform therefore allows us to describe a letter using a text substitution rather than typing it directly.

Unicode characters can be named (or numbered) directly in text. For example:

"[unicode 321]odz Churchyard"

produces a Polish slashed L. If the Unicode Character Names or Unicode Full Character Names extensions are included, characters can also be named as well as numbered:

"[unicode Latin capital letter L with stroke]odz Churchyard"

The Unicode standard assigns character numbers to essentially every marking used in text from any human language: its full range is enormous. (Note that Inform writes these numbers in decimal: many reference charts show them in hexadecimal, or base 16, which can cause confusion.) Inform can only handle codes [unicode 32] up to [unicode 65535], so it is not quite so catholic, but the range is still enormous enough that code numbers are unfamiliar to the eye. Inform therefore allows us to use the official Unicode 4.1 names for characters, instead of their decimal numbers, provided we have Included the necessary extension like so:

paste.png Include Unicode Character Names by Graham Nelson.

This extension provides names for some 2900 of the most commonly used characters. It means, for instance, that we can write text such as:

"Dr Zarkov unveils the new [unicode Hebrew letter alef] Nought drive."
"Omar plays 4[unicode black spade suit] with an air of triumph."

Admittedly, these can get a little verbose:

"[unicode Greek small letter omega with psili and perispomeni and ypogegrammeni]"

But before getting carried away, we should remember the hazards: Inform allows us to type, say, "[unicode Saturn]" (an astrological sign) but it appears only as a black square if the resulting story is played by an interpreter using a font which lacks the relevant sign. For instance, Zoom for OS X uses the Lucida Grande and Apple Symbol fonts by default, and this combination does contain the Saturn sign: but Windows Frotz tends to use the Tahoma font by default, which does not. (Another issue is that the fixed letter spacing font, such as used in the status line, may not contain all the characters that the font of the main text contains.) To write something with truly outré characters is therefore a little chancy: users would have to be told quite carefully what interpreter and font to use to play it.

The "Unicode Character Names" extension, which is pre-installed in the standard distribution of Inform, defines names for the Latin, Greek, Cyrillic, Hebrew and Braille alphabets, together with currency and miscellaneous other symbols, including some for drawing boxes and arrows. It is only optionally installed because even this is quite large: but in case it should still prove inadequate, an alternative can be used:

paste.png Include Unicode Full Character Names by Graham Nelson.

This includes all 12,997 named characters in the 16-bit range of the Unicode 4.1 standard: it is the size of a small novel and its inclusion will slow Inform down. But if you want to experiment with Arabic, ecclesiastical Georgian, Cherokee, Tibetan, Syriac, the International Phonetic Alphabet, hexagrams or the unified Canadian aboriginal syllabics, "Unicode Full Character Names" (again built into Inform) is the extension for you.


arrow-up.png Start of Chapter 5: Text
arrow-left.png Back to §5.10. Accented letters
arrow-right.png Onward to §5.12. Displaying quotations

***ExampleThe Über-complète clavier
This example provides a fairly stringent test of exotic lettering.